Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
Summary of Changes
Hello @Adames4, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a new preprocessing report feature to enhance the visibility and quality assurance of data processing. It involves adding a dedicated reporting library, setting up its configuration across various environments and data sources, and defining how masks and metrics are retrieved and saved. The changes aim to provide structured and detailed insights into the preprocessing pipeline for different datasets.
Code Review
This pull request introduces a new reporting feature, adding configuration files for generating preprocessing reports. The changes include new dependencies for reporting and a set of YAML files to configure the report generation for different datasets (ftn, ikem, knl_patos). The configuration files are well-structured, but there are opportunities to reduce duplication and improve maintainability by using YAML features like anchors and aliases. I've also pointed out some minor formatting issues.
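The anchor/alias suggestion could look roughly like the sketch below. It assumes the mask retrievers in one file share the same _target_ and run_id and differ only in dir_name. The anchor name retriever_base and the exact list layout are illustrative and not taken from the PR's files; the key names, the target class, and the dir_name values come from the snippets under review. Merge keys (<<) are a YAML 1.1 feature, so it is worth verifying that the loader used by Hydra in this project accepts them before adopting this.

```yaml
# Hypothetical layout: the anchor name (retriever_base) is illustrative.
mask_retrievers:
  # Define the shared fields once and anchor them.
  - &retriever_base
    _target_: report.masks.RunIDMlflowMaskRetriever
    run_id: TODO  # placeholder kept from the PR; needs a real MLflow run ID
    dir_name: tile_masks/artifacts_tiles
  # Reuse the anchored mapping and override only what differs.
  - <<: *retriever_base
    dir_name: tile_masks/clean_tiles
  - <<: *retriever_base
    dir_name: artifacts_per_pixel
```

Keys written explicitly next to a merge key take precedence over the merged ones, so each entry keeps the shared _target_ and run_id while supplying its own dir_name.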
Pull request overview
This pull request adds preprocessing report functionality to the project, building on the tile masks feature from PR #6. It introduces a new dependency on the report library and creates Hydra configuration files for generating preprocessing reports with MLflow integration.
Changes:
- Added report library dependency with supporting packages (orjson, polars, types-pyyaml)
- Created three preprocessing report configurations for different datasets (KNL Patos, IKEM, FTN)
- Established Hydra configuration structure for report generation, saving, and MLflow tracking
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| pyproject.toml | Added 'report' dependency group and Git source configuration |
| uv.lock | Locked dependencies for report, orjson, polars, polars-runtime-32, and types-pyyaml |
| .gitignore | Added /outputs directory to ignore list |
| reporting/default.yaml | Base configuration template with metadata and reporter structure |
| reporting/hydra/default.yaml | Hydra job configuration for report generation |
| reporting/mlflow/*.yaml | MLflow tracking URI configurations for different environments |
| reporting/reporter/save/*.yaml | Report saving configurations for MLflow and local storage |
| reporting/reporter/background/*.yaml | Background mask retriever configuration templates |
| reporting/reporter/preprocessing/knl_patos.yaml | Complete preprocessing report config for KNL Patos dataset |
| reporting/reporter/preprocessing/ikem.yaml | Preprocessing report config for IKEM dataset (with TODO placeholders) |
| reporting/reporter/preprocessing/ftn.yaml | Preprocessing report config for FTN dataset (with TODO placeholders) |
- _target_: report.masks.RunIDMlflowMaskRetriever
  run_id: TODO
  dir_name: artifacts_tiles
Inconsistent directory path compared to knl_patos.yaml which uses "tile_masks/artifacts_tiles" for the same layer. This inconsistency may cause the report to fail if the actual artifact structure matches knl_patos.yaml.
Suggested change:
- dir_name: artifacts_tiles
+ dir_name: tile_masks/artifacts_tiles
- _target_: report.masks.RunIDMlflowMaskRetriever
  run_id: TODO
  dir_name: clean_tiles
Inconsistent directory path compared to knl_patos.yaml which uses "tile_masks/clean_tiles" for the same layer. This inconsistency may cause the report to fail if the actual artifact structure matches knl_patos.yaml.
Suggested change:
- dir_name: clean_tiles
+ dir_name: tile_masks/clean_tiles
- background: drive
- save: mlflow_standalone_run

title: Ulcerative Colitis Preprocessing - ftn
Inconsistent capitalization in title. "ftn" should be capitalized (e.g., "FTN") to match the pattern used in other configuration files where dataset sources are uppercase (e.g., "IKEM", "KNL Patos").
- _target_: report.masks.RunIDMlflowMaskRetriever
  run_id: TODO
  dir_name: artifacts_per_pixel
Trailing whitespace detected at the end of the line. This should be removed for code cleanliness.
Suggested change:
- dir_name: artifacts_per_pixel
+ dir_name: artifacts_per_pixel
mask_retrievers:
  - _target_: report.masks.RunIDMlflowMaskRetriever
    run_id: TODO
The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.
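One way to make the missing value explicit is OmegaConf's mandatory-value marker ??? in place of the TODO string. The fragment below is a sketch only: the surrounding keys are copied from the reviewed snippet, the dir_name value is illustrative, and whether a fail-fast marker fits the report library's workflow is an assumption.

```yaml
mask_retrievers:
  - _target_: report.masks.RunIDMlflowMaskRetriever
    # "???" marks the value as mandatory but missing, so reading it raises
    # an error instead of passing the literal string "TODO" to MLflow.
    run_id: ???
    dir_name: tile_masks/artifacts_tiles  # illustrative value
```

With this marker, the config fails when run_id is read rather than later, when MLflow tries to resolve a run named "TODO".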
  visible: False
- _target_: report.masks.RunIDMlflowMaskRetriever
  run_id: TODO
The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.
  visible: False
- _target_: report.masks.RunIDMlflowMaskRetriever
  run_id: TODO
The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.
Tile masks script.
Closes IBD-22
Blocked by #6
[Files inside reporting/ don't have to be checked, except reporting/reporter/preprocessing/]
Dependency graph: