Add data preprocessing utilities for image normalization and augmentation #211

Open
kamalahasiniburra wants to merge 1 commit into ML4SCI:main from kamalahasiniburra:add-data-preprocessing-utils
@kamalahasiniburra

Summary

This PR adds a reusable data preprocessing utilities module for the DeepLense project with comprehensive tests.

What is included

preprocessing_utils.py

  • normalize_images(): Image normalization with minmax, zscore, and robust methods
  • augment_image(): Augmentation transforms including flip, rotate, and Gaussian noise
  • stratified_split(): Stratified train/val/test splitting to maintain class balance
  • compute_class_weights(): Class weight computation for imbalanced datasets
  • create_image_patches(): Extract overlapping patches for patch-based training
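
For readers who want a concrete picture of the three normalization modes and the class weighting, here is a minimal sketch. It is not the code from this PR: the `_sketch` function names, the `eps` guard, the whole-array (rather than per-image) statistics, and the "balanced" weighting formula `n_samples / (n_classes * count)` are all assumptions made for illustration.

```python
import numpy as np


def normalize_images_sketch(images, method="minmax", eps=1e-8):
    """Hypothetical stand-in for normalize_images(); not the PR's code."""
    x = np.asarray(images, dtype=np.float64)
    if method == "minmax":
        # Rescale so the smallest value maps to 0 and the largest to ~1.
        lo, hi = x.min(), x.max()
        return (x - lo) / (hi - lo + eps)
    if method == "zscore":
        # Subtract the mean and divide by the standard deviation.
        return (x - x.mean()) / (x.std() + eps)
    if method == "robust":
        # Center on the median and scale by the interquartile range,
        # which is less sensitive to outlier pixels than mean/std.
        median = np.median(x)
        q1, q3 = np.percentile(x, [25, 75])
        return (x - median) / (q3 - q1 + eps)
    raise ValueError(f"unknown method: {method!r}")


def compute_class_weights_sketch(labels):
    """Hypothetical stand-in for compute_class_weights().

    Assumes the common 'balanced' heuristic:
    weight = n_samples / (n_classes * class_count).
    """
    classes, counts = np.unique(np.asarray(labels), return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

Under these assumptions, labels `[0, 0, 0, 1]` give a weight of about 0.67 for class 0 and 2.0 for class 1, so the rarer class contributes more to a weighted loss.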

test_preprocessing_utils.py

  • 12 comprehensive tests covering all preprocessing functions
  • Tests for normalization correctness, augmentation behavior, split validity, and edge cases
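
As an illustration of what a "split validity" test could check, here is a pytest-style sketch. It does not reproduce the tests in this PR: `stratified_split_sketch`, its `fractions` and `seed` parameters, and both test functions are hypothetical and only show the kind of property such a test might assert.

```python
import numpy as np


def stratified_split_sketch(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Hypothetical stand-in for stratified_split(); returns index arrays."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them by fraction
        # so each subset keeps roughly the same class proportions.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)


def test_split_is_disjoint_and_complete():
    labels = np.array([0] * 80 + [1] * 20)
    train, val, test = stratified_split_sketch(labels)
    combined = np.concatenate([train, val, test])
    # Every sample appears exactly once across the three subsets.
    assert len(combined) == len(labels)
    assert len(set(combined.tolist())) == len(labels)


def test_split_preserves_class_balance():
    labels = np.array([0] * 80 + [1] * 20)
    for subset in stratified_split_sketch(labels):
        # The positive-class fraction should stay close to the global 20%.
        assert abs(labels[subset].mean() - 0.2) < 0.05
```

Checking disjointness and completeness catches index leakage between subsets, while the balance check covers the stratification itself.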

Why this is useful

Data preprocessing is fundamental to every DeepLense sub-project but is often re-implemented. This shared module standardizes preprocessing across sub-projects, ensuring consistent data handling and reproducibility. All 12 tests pass locally.
