Segmentation Study Documentation

1. Overview

This repository contains a complete comparative study of modern image segmentation models on field images of three cereal crops: wheat spikes, sorghum panicles, and corn ears. The aim is to quantify and compare model performance across:

Different crops
Multiple genotypes within each crop
Multiple collection dates
Cross-date and cross-genotype generalization

The study is implemented with a combination of Python (for model training and evaluation) and R (for statistical analysis of results).

2. Objectives

The main objective is to compare the outputs and performance of several state-of-the-art segmentation models (UNet, SegNet, DeepLabV3+, SegFormer, MaskFormer, and related backbones) on corn ears, sorghum panicles, and wheat spikes.

Concretely, the study addresses the following questions:

How does each model perform across genotypes and collection dates within each crop?
How do metrics change when models are trained on one collection date and evaluated on other dates (cross-date generalization)?
How do metrics change when models are trained on some genotypes and evaluated on held-out genotypes (cross-genotype generalization)?

3. Repository Structure

analysis/
- R Markdown workflows for aggregating, cleaning and analyzing model evaluation results.
data/
- images_metadata.csv: metadata for images (crop, genotype, collection date, etc.).
- corn/, sorghum/, wheat/: crop-specific image/mask datasets, each split into train/test and used by the Python training notebooks.
docs/
- LiteratureReview.md: summary of segmentation architectures used in related crop & plant studies.
- MetricsReview.md: summary of segmentation evaluation metrics.
models/
- Pretrained weights for each crop and model variant (e.g., SegFormer, UNet, SegNet, DeepLabV3+, MaskFormer) including per-date, per-genotype, reduced-data, and transfer-learning variants.
notebooks/
- Jupyter notebooks implementing data preparation, model training, and evaluation for different experimental settings (all data, per-date, per-genotype, reduced data, transfer learning).
results/
- CSVs and plots with quantitative results (e.g., model_evaluation_results.csv, summary_statistics_*.csv) generated by notebooks and R analyses.
utils/
- Python utility modules for model definitions and crop-specific helpers.

4. Data

4.1. Raw Images and Masks

The data/ folder is organized by crop:

data/corn/
data/sorghum/
data/wheat/

Each crop subfolder is expected to contain train/ and test/ subfolders with images/ and masks/, for example:

data/corn/train/images
data/corn/train/masks
data/corn/test/images
data/corn/test/masks

Images are RGB field images, and masks are single-channel (grayscale) segmentation masks where positive pixels correspond to the target organ (ear, panicle, spike) and background pixels are zero.
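As a minimal sketch of how this layout can be traversed, the following pairs each image with its mask. The helper name list_pairs is hypothetical, and the rule that an image and its mask share a file stem is an assumption, not something the repository confirms.

```python
from pathlib import Path


def list_pairs(data_root, crop, split):
    """Return sorted (image_path, mask_path) pairs for one crop and split.

    Assumption: a mask has the same file stem as its image
    (e.g. images/a.jpg <-> masks/a.png); adjust if your naming differs.
    """
    base = Path(data_root) / crop / split
    # Index masks by stem so each image lookup is a dictionary access.
    masks = {p.stem: p for p in (base / "masks").iterdir() if p.is_file()}
    pairs = []
    for image in sorted((base / "images").iterdir()):
        # Images without a matching mask are skipped rather than raising.
        if image.is_file() and image.stem in masks:
            pairs.append((image, masks[image.stem]))
    return pairs
```

For example, list_pairs("data", "corn", "train") would return the matched files under data/corn/train/images and data/corn/train/masks.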

4.2. Metadata

The file data/images_metadata.csv holds metadata per image. Typical fields include:

crop (wheat, sorghum, corn)
genotype
collectiondate (or similar date/stage identifier)
individual ID
file names or paths

This metadata was used to assess data balance ahead of augmentation and, as described in Section 6.3, to define genotype-based splits.
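A balance check of this kind can be sketched with the standard library alone. The function name count_images is hypothetical, and the column names crop, genotype, and collectiondate are assumed from the field list above; the real header of images_metadata.csv may differ.

```python
import csv
from collections import Counter


def count_images(metadata_csv, keys=("crop", "genotype", "collectiondate")):
    """Count images per combination of the given metadata columns.

    Assumption: the CSV has one row per image and a header containing
    every column named in `keys`.
    """
    with open(metadata_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(tuple(row[k] for k in keys) for row in reader)
```

Calling count_images("data/images_metadata.csv").most_common() lists the groups from most to least represented, which makes under-sampled genotype and date combinations easy to spot.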

5. Models

The study compares a suite of segmentation architectures frequently used in plant and crop phenotyping:

UNet
SegNet
DeepLabV3+
SegFormer
MaskFormer

Literature context, original publications, and architectural descriptions are given in docs/LiteratureReview.md.

Trained model weights are expected in models/ (the weight files themselves are not included in the repository), with filenames that encode:

crop (wheat, sorghum, corn)
architecture (e.g., U-NET, SegNet, DeepLabV3Plus, SegFormer, MaskFormer)
experimental condition (date-specific, genotype-specific, data-reduced, transfer-learning, etc.).

Examples:

models/wheat_U-NET_seg.pt
models/sorghum_SegFormer_20less_seg.pt
models/corn_SegFormer_date3_seg.pt

These models are loaded in the evaluation notebooks to generate predictions and metrics.
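The naming scheme can be decoded with a small parser. This is a sketch only: the pattern {crop}_{architecture}[_{condition}]_seg.pt is inferred from the three example filenames and is not confirmed by the repository, and parse_weight_name is a hypothetical helper.

```python
from pathlib import Path

SUFFIX = "_seg.pt"


def parse_weight_name(path):
    """Split a weights filename into crop, architecture, and condition.

    Assumed pattern: {crop}_{architecture}[_{condition}]_seg.pt
    """
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected weights filename: {name}")
    parts = name[: -len(SUFFIX)].split("_")
    if len(parts) < 2:
        raise ValueError(f"missing crop or architecture in: {name}")
    # Anything between the architecture and the suffix is the condition;
    # baseline (all-data) models have none.
    condition = "_".join(parts[2:]) or None
    return {"crop": parts[0], "architecture": parts[1], "condition": condition}
```

Under this assumed pattern, models/sorghum_SegFormer_20less_seg.pt yields the crop sorghum, the architecture SegFormer, and the condition 20less.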

6. Experimental Design

The study is structured as a set of complementary experiments, each implemented as one or more notebooks in notebooks/ and corresponding R analysis scripts in analysis/.

6.1. Baseline: All Data

Notebook: notebooks/trainingModels_allData.ipynb
Data: all available images for each crop.
Goal: train each model using the full dataset for a given crop and quantify baseline performance (IoU, Precision, Recall, F1, etc.).

In this setting, the SegmentationDataset class loads images and masks from the appropriate train/ and test/ folders, normalizes images, and binarizes masks.

6.2. Collection Date Experiments

Notebook: notebooks/trainingModels_collectionDates.ipynb
Goal: train and evaluate models separately per collection date, and study how performance varies with stage and environmental conditions.

Corresponding trained weights are stored as per-date model files in models/ (e.g., *date1_seg.pt, *date2_seg.pt, ...).

6.3. Genotype Experiments

Notebook: notebooks/trainingModels_genotypes.ipynb
Goal: evaluate cross-genotype generalization by training on a subset of genotypes and testing on held-out genotypes.

Metadata in data/images_metadata.csv is used to define genotype-based splits, and evaluation results are later aggregated in analysis/dataAnalysis_crossGenotype.rmd.
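A genotype-based split of this kind can be sketched as follows. The function split_by_genotype is hypothetical, rows are assumed to be dictionaries with a genotype key (as csv.DictReader would produce), and how the notebook actually chooses the held-out genotypes is not specified here.

```python
def split_by_genotype(rows, held_out):
    """Split metadata rows so that held-out genotypes appear only in test.

    `rows` is an iterable of dicts with a "genotype" key (an assumption
    about the metadata layout); `held_out` is an iterable of genotype names.
    """
    held_out = set(held_out)
    train, test = [], []
    for row in rows:
        # Route each image by its genotype so the two sets never overlap.
        (test if row["genotype"] in held_out else train).append(row)
    return train, test
```

Because the split is made on genotype rather than on individual images, no genotype contributes to both sets, which is the property a cross-genotype evaluation relies on.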

6.4. Data-Reducing Experiments

Notebook: notebooks/trainingModels_dataReducing.ipynb
Goal: quantify how performance degrades as training data is progressively reduced (e.g., 50%, 40%, 30%, 20%, 10% of the full dataset), and evaluate the effect of data augmentation.

Model weights reflecting these regimes are saved with suffixes like 10less, 20less, 30less, 40less, and 50less.
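One way to build reduced training sets reproducibly is sketched below. It assumes simple random subsampling with a fixed seed; the notebook may instead sample per genotype or per date, and the helper name subsample is hypothetical.

```python
import random


def subsample(items, fraction, seed=0):
    """Return a reproducible subset holding `fraction` of the items.

    The items are shuffled once with a fixed seed and a prefix is taken,
    so for the same seed a smaller subset is contained in any larger one.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    shuffled = sorted(items)  # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)
    keep = max(1, round(len(shuffled) * fraction))
    return shuffled[:keep]
```

Taking nested subsets is a design choice: it means a model trained on the smaller set never sees images that the larger-set model did not, so differences in metrics are attributable to the amount of data rather than to which images were drawn.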

6.5. Transfer Learning Across Dates

Notebook: notebooks/trainingModels_transferLearning.ipynb
Goal: investigate whether initializing from a model trained on an earlier collection date and fine-tuning on a later date improves performance versus training from scratch.

Models such as sorghum_SegFormer_transfer_seg.pt store the resulting transfer-learning weights.

6.6. Evaluation

Notebook: notebooks/modelsEvaluation.ipynb

This notebook:

Loads the trained weights from models/ for each crop and architecture.
Uses a dedicated evaluation dataset class (SegEvalDataset) to iterate over test images and masks.
Computes segmentation metrics per image using scikit-learn and custom utilities:
- IoU
- Precision
- Recall
- F1 score
Aggregates results into a unified CSV (results/model_evaluation_results.csv), including columns for:
- crop
- genotype
- collection date
- model name
- metrics (IoU, Precision, Recall, F1)

These results serve as the input for the downstream R-based statistical analyses.
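The per-image metrics can be illustrated with a dependency-free sketch. The notebook itself uses scikit-learn and custom utilities, so this is not the study's implementation; binary_metrics is a hypothetical name, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(pred, target):
    """Compute IoU, Precision, Recall, and F1 for one pair of binary masks.

    `pred` and `target` are flat sequences of 0/1 pixel labels of equal
    length, where 1 marks the target organ and 0 the background.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "IoU": ratio(tp, tp + fp + fn),
        "Precision": ratio(tp, tp + fp),
        "Recall": ratio(tp, tp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Each returned dictionary corresponds to the metric columns of one row in the results CSV, alongside the crop, genotype, collection date, and model name.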

7. Metrics

The main metrics used in this study are:

Intersection over Union (IoU)
Precision
Recall
F1 Score

Additional metrics such as Pixel Accuracy, Dice Coefficient, MAE, and Hausdorff Distance are described in docs/MetricsReview.md, and may be used for robustness checks or specific analyses.
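For reference, the four main metrics can be written in terms of per-image pixel counts, where TP, FP, and FN denote true-positive, false-positive, and false-negative pixels for the target organ. These are the standard binary definitions; docs/MetricsReview.md remains the authoritative description of what the study uses.

```latex
\begin{align}
\mathrm{IoU} &= \frac{TP}{TP + FP + FN}, &
\mathrm{Precision} &= \frac{TP}{TP + FP}, \\
\mathrm{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2\,TP}{2\,TP + FP + FN}.
\end{align}
```

For binary masks, F1 coincides with the Dice coefficient, and the two overlap measures are related by Dice = 2 IoU / (1 + IoU), so they always rank a set of predictions in the same order.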

8. Statistical Analysis

Statistical and exploratory analyses are implemented in R Markdown files in analysis/.

8.1. Overall Performance (All Experiments)

File: analysis/dataAnalysis_all.Rmd

Workflow:

Load results/model_evaluation_results.csv.
Define the subsets of interest:
- crops: wheat, sorghum, corn
- models: DeepLabV3Plus, U-NET, SegNet, SegFormer, MaskFormer
- metrics: IoU, Precision, Recall, F1
Remove outliers per (crop, model, metric) using an IQR-based rule.
Summarize performance per crop and model:
- Mean, max, min, variance, and coefficient of variation (CV) for each metric.
Export summary tables to results/:
- summary_statistics_by_crop_model_all.csv
- summary_stats_short_by_crop_model.csv

These outputs provide a compact view of how each model performs across crops.
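The analysis itself is written in R; the following Python sketch only illustrates the logic of the outlier and summary steps for one (crop, model, metric) group. The 1.5 x IQR fences and the quartile interpolation method are assumptions, since the exact rule used in dataAnalysis_all.Rmd is not stated here, and both function names are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: k = 1.5 (Tukey's fences) with linearly interpolated
    quartiles; the R analysis may use a different rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def summarize(values):
    """Return mean, max, min, sample variance, and CV for one group."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance (n - 1)
    # CV is the standard deviation divided by the mean, as a ratio.
    cv = variance ** 0.5 / mean if mean else float("nan")
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "variance": variance,
        "cv": cv,
    }
```

A typical use is summarize(remove_outliers_iqr(iou_values)) for the IoU scores of one crop and model, repeated for every group and metric.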

8.2. Cross-Genotype

File: analysis/dataAnalysis_crossGenotype.rmd

Focus:

Compare model performance when genotypes are held out at evaluation time.
Evaluate stability of IoU, Precision, Recall, and F1 across genotypes.

8.3. Cross-Date

File: analysis/dataAnalysis_crossDate.Rmd

Focus:

Evaluate models trained on one collection date and tested on other dates.
Quantify degradation or improvement across phenological stages.

8.4. Data Reduction

File: analysis/dataAnalysis_dataReducing.Rmd

Focus:

Relate training set size (e.g., 100%, 50%, 40%, 30%, 20%, 10%) to performance.
Quantify how metrics drop as data is removed and how augmentation mitigates this drop.

8.5. Transfer Learning

File: analysis/dataAnalysis_transfer.Rmd

Focus:

Compare performance of models trained from scratch versus transfer learning across dates.
Evaluate whether fine-tuning from an earlier date improves metrics on later dates.

9. Reproducibility

9.1. Environment

Python:

PyTorch and torchvision for model implementation and training.
numpy, pandas, scikit-image, scikit-learn for preprocessing and metrics.
tqdm for progress bars.

R:

tidyverse for data wrangling.
patchwork for combining plots.

9.2. Typical Workflow

To reproduce the study or adapt it to new data:

Prepare Data
- Organize your images and masks under data/{crop}/train and data/{crop}/test, following the same folder structure.
- Update or generate data/images_metadata.csv with crop, genotype, collection date, and file names.
Train Models
- Run the appropriate training notebooks in notebooks/:
  - trainingModels_allData.ipynb
  - trainingModels_collectionDates.ipynb
  - trainingModels_genotypes.ipynb
  - trainingModels_dataReducing.ipynb
  - trainingModels_transferLearning.ipynb
- Inspect and, if needed, adjust hyperparameters (learning rate, batch size, epochs, image size, etc.) and the CROP setting in each notebook.
Evaluate Models
- Run notebooks/modelsEvaluation.ipynb to:
  - Load the best model checkpoints from models/.
  - Generate segmentation predictions on test data.
  - Compute metrics and save results/model_evaluation_results.csv.
Analyze Results in R
- Knit or run the R Markdown scripts in analysis/ (e.g., dataAnalysis_all.Rmd) to:
  - Filter and remove outliers.
  - Compute summary statistics and plots.
  - Export summary tables in results/.

10. Extending the Study

To extend this study to new crops, organs, or models:

Add new image/mask datasets under data/{new_crop}/ with the same structure.
Update data/images_metadata.csv with the new crop, genotypes, and dates.
Implement or import the new model architecture under utils/models/.
Create new training/evaluation notebooks reusing the existing Dataset and evaluation pipelines.
Add new analysis scripts in analysis/ or extend existing ones to incorporate the additional factors.

11. References

See docs/LiteratureReview.md for a curated list of segmentation architectures and plant science applications relevant to this study.
See docs/MetricsReview.md for definitions and interpretations of segmentation metrics used throughout the analyses.

Ciampitti-Lab/SegmentationStudy