Repository for the invasive species biocontrol quantification project, part of the field component of the AI & Ecology Course in Hawaii. The data associated with this project is available on Hugging Face: Imageomics/invasive_plants_hawaii.
Invasive species are organisms that are not native to an ecosystem and that cause economic, environmental, and/or human harm. Soapbush (Clidemia hirta) is one such invasive species: it forms dense thickets that smother native plant life in Hawaii. Several biocontrol agents were released to control the spread of soapbush, but quantifying the effect of such a release is difficult because it relies on time-consuming manual inspection. To reduce this effort, we collected a dataset of individual leaves from Clidemia hirta bushes, which were then photographed and labeled according to their damage type. We implemented baselines for automatic damage classification, detection, and segmentation on individual leaves.
Install uv if you have not already, then run the following command:
uv sync
The classification baseline results can be found in src/inv_plts/multilabel_classification.
To train a classification baseline model, run the following command:
uv run python -m inv_plts.multilabel_classification.train --model resnet50
More information on command line arguments can be found in src/inv_plts/multilabel_classification/config.py.
More examples of these commands can be found in scripts/train_classifiers.sh.
To evaluate a classification baseline model, run the following command:
uv run python -m inv_plts.multilabel_classification.basic_evaluate --model resnet50
More information on command line arguments can be found in src/inv_plts/multilabel_classification/basic_evaluate.py.
More examples of these commands can be found in scripts/evaluate_classifiers.sh.
We report classification results for the following models: ViT, ResNet, ConvNeXt, CvT, and MaxViT.
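For readers new to multi-label classification, the sketch below shows how one logit per damage type can be turned into independent yes/no decisions and scored. It is an illustration only: the label names, the 0.5 threshold, and the choice of micro-averaged F1 are assumptions made here, not necessarily what the training and evaluation modules in this repository use.

```python
import math

# Hypothetical label names; the real damage types are defined by the dataset.
LABELS = ["damage_type_a", "damage_type_b", "damage_type_c"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Make an independent 0/1 decision for each label of one leaf image."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (image, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    preds = predict_labels([2.0, -1.0, 0.0])
    print(dict(zip(LABELS, preds)))
```

Unlike single-label classification, a leaf can carry several damage types at once, which is why each label gets its own sigmoid rather than sharing a softmax.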
The Faster R-CNN baseline can be run with the following notebook: notebooks/rcnn_baseline/FasterRCNN_pipeline.ipynb. The training and test splits are given in data/rcnn_cropped_annotations_[test,train].csv. The archived images used to produce these results are available in the images folder of our Hugging Face repository (filename: rcnn_images.zip).
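One possible way to fetch and unpack an image archive from the Hugging Face repository is sketched below. It assumes the huggingface_hub package is installed, that the archive sits at images/<filename> inside the dataset repository, and that extracting into data/images is acceptable; the helper names and the output directory are hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path

# Repository id taken from the dataset citation URL in this README.
REPO_ID = "imageomics/invasive_plants_hawaii"


def archive_path_in_repo(archive_name: str) -> str:
    """Assumption: archives are stored directly inside the repo's images/ folder."""
    return f"images/{archive_name}"


def fetch_and_extract(archive_name: str, out_dir: str = "data/images") -> Path:
    """Download one archive from the dataset repo and unzip it locally."""
    # Imported lazily so the helper above works without the third-party package.
    from huggingface_hub import hf_hub_download

    local_zip = hf_hub_download(
        repo_id=REPO_ID,
        filename=archive_path_in_repo(archive_name),
        repo_type="dataset",
    )
    # Hypothetical layout: one sub-folder per archive under out_dir.
    target = Path(out_dir) / Path(archive_name).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(target)
    return target
```

Calling fetch_and_extract("rcnn_images.zip") would then download the archive into the local Hugging Face cache and extract it under data/images/rcnn_images, provided the path assumption above holds.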
The code for running Molmo and SAM2 is in the corresponding notebooks in this directory: notebooks/zero_shot_segmentation_pipeline. Some example image names used for the qualitative analysis are provided in data/sample_images.csv. First, run the Molmo notebook notebooks/zero_shot_segmentation_pipeline/run_molmo.ipynb to generate points. Then, run the SAM2 notebook notebooks/zero_shot_segmentation_pipeline/run_sam2_and_visualize.ipynb to generate segmentation masks and visualize them on the images.
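The hand-off between the two notebooks can be pictured roughly as follows. This is a minimal sketch, assuming Molmo returns its points as XML-like tags with coordinates expressed as percentages of the image size; the function names are hypothetical, and the notebooks may parse the output differently.

```python
import re

# Matches x="..." y="..." as well as numbered pairs such as x1="..." y1="...".
_PAIR = re.compile(r'\bx(\d*)="([\d.]+)"\s+y\1="([\d.]+)"')


def parse_points(molmo_text: str, width: int, height: int):
    """Convert percentage coordinates in Molmo-style text to pixel (x, y) pairs."""
    points = []
    for _, x_pct, y_pct in _PAIR.findall(molmo_text):
        points.append((float(x_pct) * width / 100, float(y_pct) * height / 100))
    return points


def to_point_prompts(points):
    """Pair each point with label 1 (foreground), the usual positive-click convention."""
    coords = [list(p) for p in points]
    labels = [1] * len(points)
    return coords, labels


if __name__ == "__main__":
    text = '<points x1="10.0" y1="20.0" x2="30.0" y2="40.0" alt="damage">damage</points>'
    coords, labels = to_point_prompts(parse_points(text, width=1000, height=500))
    print(coords, labels)
```

The resulting coordinate and label lists are in the shape a point-prompted segmenter typically expects (for example, converted to arrays and passed as point coordinates and point labels to the SAM2 image predictor).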
The code for our Leaf Reconstruction baseline is available in scripts/leaf_reconstruction. It contains the code to reproduce the results shown in our report (scripts/leaf_reconstruction/get_absolute_error.py), as well as the code to compute the ground truth damage ratio (scripts/leaf_reconstruction/damage_calculation_ground_truth.py). The code for testing HerbiEstim on a set of images can be found in the authors' original repository. The archived images used to produce our results are available in the images folder of our Hugging Face repository (filename: Dataset_Leaf_Reconstruction.zip).
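To make the two quantities above concrete, here is a small sketch of a damage ratio and an absolute error computed from binary masks. It assumes the damage ratio is the fraction of the intact (reconstructed) leaf area that is missing in the damaged leaf; the actual definitions live in the scripts named above and may differ, and the function names here are hypothetical.

```python
def damage_ratio(damaged_mask, intact_mask):
    """Fraction of the intact leaf area that is missing in the damaged leaf.

    Both masks are equally sized 2D lists of 0/1 values (1 = leaf pixel).
    """
    intact_area = sum(sum(row) for row in intact_mask)
    if intact_area == 0:
        raise ValueError("intact mask contains no leaf pixels")
    # Count pixels that are leaf in both masks, i.e. tissue that is still present.
    remaining = sum(
        1
        for d_row, i_row in zip(damaged_mask, intact_mask)
        for d, i in zip(d_row, i_row)
        if d == 1 and i == 1
    )
    return (intact_area - remaining) / intact_area


def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between predicted and true damage ratios."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


if __name__ == "__main__":
    intact = [[1, 1], [1, 1]]
    damaged = [[1, 1], [1, 0]]
    print(damage_ratio(damaged, intact))
```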
This work was supported by both the Imageomics Institute and the AI and Biodiversity Change (ABC) Global Center. The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Center is funded by the US National Science Foundation under Award No. 2330423 and Natural Sciences and Engineering Research Council of Canada under Award No. 585136. This code draws on research funded by the Social Sciences and Humanities Research Council. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the Natural Sciences and Engineering Research Council of Canada, or the Social Sciences and Humanities Research Council.
We thank the U.S. Forest Service, and more specifically the researchers associated with the Pacific Southwest Research Station (Institute of Pacific Islands Forestry), for their support and collaboration in this research. Data collection and protocol development were carried out in close collaboration with Dr. Ellyn Bitume, a research entomologist for the U.S. Forest Service. We also acknowledge the support of the National Ecological Observatory Network (NEON), a program sponsored by the U.S. National Science Foundation (NSF) and operated under cooperative agreement by Battelle. We especially thank the team associated with the Pu'u Maka'ala Natural Area Reserve for helping us with logistics and equipment.
Our code (this repository):
@software{invasive_species_code,
author = {David Carlyn and Catherine Villeneuve and Kazi Sajeed Mehrab and Leonardo Viotti},
title = {Invasives Species Biocontrol Quantification},
version = {v1.0.0},
year = {2025}
}
The dataset:
@dataset{invasive_plants_hawaii_dataset,
author = {David Edward Carlyn and Catherine Villeneuve and Leonardo Viotti and Kazi Sajeed Mehrab
and Ellyn Bitume and Chuck Stewart and Leanna House},
title = {Hawaii Leaf Damage Dataset},
year = {2025},
url = {https://huggingface.co/datasets/imageomics/invasive_plants_hawaii},
publisher = {Hugging Face}
}