Conversation
- Replaced luminance_loss with brightness_loss and added new brightness_loss function.
- Added device handling for brightness loss calculation.
- Updated brightness_loss function to maintain dimensions.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes the foundational structure for a machine learning project focused on stain normalization of histopathological images. It covers the complete development lifecycle, from environment setup and code-quality enforcement to data preprocessing, model implementation, training, and result analysis. The changes enable robust experimentation with stain normalization techniques, providing tools for both tile-level and whole-slide image processing.
Code Review
This pull request introduces a comprehensive new project for histopathological image stain normalization. It includes the full pipeline from data loading and preprocessing, a U-Net based model, a PyTorch Lightning training setup, and extensive analysis and visualization callbacks. The overall structure is well-organized and leverages modern tools like Hydra, PDM, and Ruff.
My review has identified several issues. Most critically, there are bugs in the data augmentation pipeline where transforms expect float images but receive uint8, which could lead to incorrect behavior or crashes. I've also found some high-severity issues in preprocessing scripts, such as a redefined variable and leftover debugging code. Additionally, there are several medium-severity suggestions to improve maintainability, correctness, and adherence to best practices, including adding missing type hints, removing unreachable code, and fixing typos. Addressing these points will significantly improve the robustness and quality of the new project.
```python
    Returns:
        Modified RGB image as a float32 NumPy array with values in [0.0, 1.0].
    """
    hed_image = separate_stains(img, hed_from_rgb)
```
The skimage.color.separate_stains function expects the input image to be of type float with values in the range [0, 1]. However, the input img from the dataset is a uint8 array. This will lead to incorrect results or a crash. You should convert the image to float and normalize it at the beginning of the apply method.
```diff
-hed_image = separate_stains(img, hed_from_rgb)
+img_float = img.astype(np.float32) / 255.0
+hed_image = separate_stains(img_float, hed_from_rgb)
```
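Since the same dtype bug recurs in several transforms below, the conversion could be centralized in one helper. A minimal sketch with NumPy only — the helper name `to_float01` is mine, not from the PR:

```python
import numpy as np

def to_float01(img: np.ndarray) -> np.ndarray:
    """Return the image as float32 scaled to [0, 1].

    uint8 inputs are divided by 255; float inputs are assumed to already
    be in [0, 1] and are only cast to float32.
    """
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    return img.astype(np.float32)
```

Each `apply` method could then call `to_float01(img)` once at the top instead of repeating the cast inline.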
```python
saturation_scale = np.random.uniform(*self.saturation_range)
value_scale = np.random.uniform(*self.value_range)

hsv_image = rgb2hsv(img)
```
The skimage.color.rgb2hsv function expects the input image to be of type float. The img passed here is uint8, which will lead to incorrect color space conversion. You need to convert the image to float and normalize it to the [0, 1] range first.
```diff
-hsv_image = rgb2hsv(img)
+img_float = img.astype(np.float32) / 255.0
+hsv_image = rgb2hsv(img_float)
```
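For intuition, the saturation/value jitter can be reproduced on a single float pixel with the standard library's `colorsys`; a hedged sketch only (the actual transform works on whole arrays via `rgb2hsv`, and the range defaults here are illustrative):

```python
import colorsys
import random

def jitter_pixel(r, g, b, sat_range=(0.9, 1.1), val_range=(0.9, 1.1)):
    """Scale saturation and value of one RGB pixel (floats in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * random.uniform(*sat_range))
    v = min(1.0, v * random.uniform(*val_range))
    return colorsys.hsv_to_rgb(h, s, v)
```

Note that a gray pixel (zero saturation) is only affected by the value scale, which is a quick sanity check for the transform.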
```python
h_factor = np.random.uniform(*self.h_range)
e_factor = np.random.uniform(*self.e_range)

hed_image = separate_stains(img, hed_from_rgb)
```
The skimage.color.separate_stains function expects a float image in the range [0, 1], but it receives a uint8 image from the dataset. This is a critical bug that will cause incorrect behavior. Please convert the input image to float before processing.
```diff
-hed_image = separate_stains(img, hed_from_rgb)
+img_float = img.astype(np.float32) / 255.0
+hed_image = separate_stains(img_float, hed_from_rgb)
```
```python
#main()
slides = [("/home/jovyan/staining/demo_data/P-2016_0077-08-1_hed_h0.6_e1.5.tiff")]
train_slides_df, train_tiles_df = tiling(slides=slides, handler=handler)

mlflow.set_experiment(experiment_name="Stain-Normalization")
with mlflow.start_run(run_name="P-2016_0077-08-1_hed all tissue tiles") as _:
    save_mlflow_dataset(
        slides=train_slides_df,
        tiles=train_tiles_df,
        dataset_name="P-2016_0077-08-1_hed",
    )
```
This block of code seems to be leftover from debugging or testing. It's outside of any function and will execute upon module import. It also contains a hardcoded path and comments out the main() function call. This should be cleaned up or moved into a dedicated test script.
```diff
-#main()
-slides = [("/home/jovyan/staining/demo_data/P-2016_0077-08-1_hed_h0.6_e1.5.tiff")]
-train_slides_df, train_tiles_df = tiling(slides=slides, handler=handler)
-mlflow.set_experiment(experiment_name="Stain-Normalization")
-with mlflow.start_run(run_name="P-2016_0077-08-1_hed all tissue tiles") as _:
-    save_mlflow_dataset(
-        slides=train_slides_df,
-        tiles=train_tiles_df,
-        dataset_name="P-2016_0077-08-1_hed",
-    )
+if __name__ == "__main__":
+    main()
```
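The point of the suggested guard is that module-level code runs on every import, while code under the guard runs only when the file is executed as a script. A minimal self-contained sketch (the `main` body here is a hypothetical stand-in for the real tiling/MLflow pipeline):

```python
RAN: list[bool] = []

def main() -> None:
    # The real script would run tiling and MLflow logging here.
    RAN.append(True)

if __name__ == "__main__":
    # Executes only when the file is run directly,
    # never when it is imported as a module.
    main()
```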
```python
from stain_normalization.data.modification.combiend_modification import (
    CombinedModifications,
)
```
There is a typo in the filename combiend_modification. It should be combined_modification. Please rename the file and update the import statement accordingly.
```diff
-from stain_normalization.data.modification.combiend_modification import (
-    CombinedModifications,
-)
+from stain_normalization.data.modification.combined_modification import (
+    CombinedModifications,
+)
```
```python
outputs: list[torch.Tensor],
batch: tuple[torch.Tensor, list],
```
The type hints for `outputs` and `batch` are incorrect or too generic.

- `outputs` is a single `torch.Tensor` returned from `test_step`, not a `list[torch.Tensor]`.
- The `list` in the `batch` tuple is a list of dictionaries. A more specific type hint would improve readability.
```diff
-outputs: list[torch.Tensor],
-batch: tuple[torch.Tensor, list],
+outputs: torch.Tensor,
+batch: tuple[torch.Tensor, list[dict]],
```
```python
if analyzer is None:
    return
```
This check for analyzer is None appears to be unreachable. The preceding logic ensures that either run_reference_mode or run_paired_mode is called, both of which return a StainAnalyzer instance. If neither condition is met, parser.error() is called, which terminates the script. This block can be safely removed.
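The unreachability hinges on `argparse`'s documented contract: `ArgumentParser.error()` prints a usage message and exits the process with status 2, so nothing after it runs. A quick self-contained check (the `--mode` option and `pick_mode` function are illustrative, not from the PR):

```python
import argparse
from typing import Optional

parser = argparse.ArgumentParser()
parser.add_argument("--mode")

def pick_mode(mode: Optional[str]) -> str:
    if mode in ("reference", "paired"):
        return mode
    # error() raises SystemExit(2); the line below is unreachable.
    parser.error("either reference or paired mode is required")
    return "unreachable"

try:
    pick_mode(None)
    reached = True
except SystemExit as exc:
    reached = False
    exit_code = exc.code
```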
```python
    return np.array(Image.open(path).convert("RGB"))


def iterate_tiles(slides, tiles):
```
For better code clarity and maintainability, please add type hints to the function arguments (slides, tiles) and the return value. Based on the usage, slides and tiles appear to be pandas DataFrames, and the function returns an iterator.
```diff
-def iterate_tiles(slides, tiles):
+def iterate_tiles(slides: pd.DataFrame, tiles: pd.DataFrame) -> Iterator[tuple[str, np.ndarray, str]]:
```
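The suggested annotation is the standard `typing.Iterator` pattern for generators. A stdlib-only sketch of the same shape, with plain dict records standing in for the DataFrame rows (field names hypothetical):

```python
from typing import Iterator

def iterate_tiles(
    slides: list[dict], tiles: list[dict]
) -> Iterator[tuple[str, dict]]:
    """Yield (slide_path, tile_record) pairs, matching each tile to its slide."""
    paths = {s["slide_id"]: s["path"] for s in slides}
    for tile in tiles:
        yield paths[tile["slide_id"]], tile
```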
| """ | ||
| # Calculate RGB from optical density by reversing the process in estimate_stain_vectors. | ||
| # default i0=240 (transmitted light intensity) | ||
| rgb = np.clip(240 * np.exp(-od_vector), 0, 255) / 255.0 |
The value 240 is a magic number representing the transmitted light intensity (i0) used in optical density calculations. It's better to define this as a named constant at the module level (e.g., TRANSMITTED_LIGHT_INTENSITY = 240) to improve readability and make it easier to change if needed.
```diff
-rgb = np.clip(240 * np.exp(-od_vector), 0, 255) / 255.0
+rgb = np.clip(TRANSMITTED_LIGHT_INTENSITY * np.exp(-od_vector), 0, 255) / 255.0
```
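For reference, the constant parameterizes the Beer–Lambert round trip: transmitted intensity is I = I0 · exp(−OD), so zero optical density maps back to the background intensity I0/255. A sketch with the proposed named constant (the function wrapper is illustrative):

```python
import numpy as np

# i0: background (no-stain) transmitted light intensity, per the PR's comment
TRANSMITTED_LIGHT_INTENSITY = 240

def od_to_rgb(od_vector: np.ndarray) -> np.ndarray:
    """Invert optical density back to RGB in [0, 1] via Beer-Lambert."""
    return np.clip(TRANSMITTED_LIGHT_INTENSITY * np.exp(-od_vector), 0, 255) / 255.0
```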
```python
from math import exp


class L1SSIMLoss(nn.Module):
    def __init__(self, lambda_dssim: float = 0.6, lambda_l1: float = 0.2, lambda_lum: float = 0.2, lambda_gdl: float = 0.1):
```
The weights for the different loss components (lambda_dssim, lambda_l1, lambda_lum, lambda_gdl) sum to 1.1. While not strictly required, it's common practice for these weights to sum to 1.0, as it makes their relative contributions clearer. Consider normalizing the weights or clarifying in a comment why they don't sum to 1.
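If normalized contributions are preferred, the weights can be rescaled once at construction time while preserving their ratios. A hedged sketch in plain Python (no torch), using the PR's default values; the helper name is mine:

```python
def normalize_weights(**weights: float) -> dict[str, float]:
    """Rescale loss weights so they sum to 1, preserving their ratios."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Defaults from the PR sum to 1.1; normalized they keep the same ratios.
defaults = normalize_weights(
    lambda_dssim=0.6, lambda_l1=0.2, lambda_lum=0.2, lambda_gdl=0.1
)
```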
Make multiple smaller PRs instead of one large one. It is impossible to review as-is.
@matejpekar Will it be enough if I separate it like this:
Would that be enough modularization, or did you imagine some other way? Also, should I do it by creating new branches and opening separate merge requests to main, or chain the requests together (branch 1 to main, branch 2 to branch 1, and so on)?