feat: preprocessing report by Adames4 · Pull Request #7 · RationAI/ulcerative-colitis

Adames4 · 2026-02-12T20:47:05Z

Tile masks script.

Blocked by #6

[Files inside reporting/ don't have to be checked, except reporting/reporter/preprocessing/]

Dependency graph:

                         +--------------+
                  -------| tissue-masks |<------+           +------------+      +----------------------+
                 /       +--------------+       |       +---| tile-masks |<-----| preprocessing-report |
                /                               |       |   +------------+      +----------------------+
+---------+    /                            +--------+  |
| dataset | <-+                             | tiling |<-+
+---------+    \                            +--------+  |
                \                               |       |   +------------+
                 \       +-----------------+    |       +---| embeddings |
                  -------| quality-control |<---+           +------------+
                         +-----------------+
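The dependency graph above can also be read as a run order. Below is a minimal Python sketch. It assumes each arrow points from a stage to the stage it depends on, so preprocessing-report needs tile-masks, which needs tiling, and so on. The stage names come from the diagram. The `DEPENDENCIES` mapping and both helpers are hypothetical illustrations, not code from the repository. The sketch uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which the stages could run.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping read off the diagram: stage -> stages it depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "dataset": set(),
    "tissue-masks": {"dataset"},
    "quality-control": {"dataset"},
    "tiling": {"tissue-masks", "quality-control"},
    "tile-masks": {"tiling"},
    "embeddings": {"tiling"},
    "preprocessing-report": {"tile-masks"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return all stages so that each one comes after its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def upstream(stage: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every stage that must finish before `stage` can run."""
    seen: set[str] = set()
    stack = list(deps[stage])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
    print(sorted(upstream("preprocessing-report", DEPENDENCIES)))
```

Under this reading, embeddings shares tiling with tile-masks but is not upstream of preprocessing-report, so the report would not have to wait for it.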


coderabbitai · 2026-02-12T20:47:13Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


gemini-code-assist · 2026-02-12T20:47:35Z

Summary of Changes


This pull request implements a new preprocessing report feature to enhance the visibility and quality assurance of data processing. It involves adding a dedicated reporting library, setting up its configuration across various environments and data sources, and defining how masks and metrics are retrieved and saved. The changes aim to provide structured and detailed insights into the preprocessing pipeline for different datasets.

Highlights

New Preprocessing Report Feature: Introduced functionality to generate detailed preprocessing reports, enabling better quality control and analysis of data processing steps.
New report Dependency Integration: Integrated a new external report tool as a dependency, along with its associated packages like orjson, polars, and types-pyyaml, to power the reporting capabilities.
Comprehensive Reporting Configuration: Added extensive YAML configurations for the new reporting system, covering default settings, Hydra integration, MLflow tracking, various mask retrieval methods, and report saving options.
Dataset-Specific Preprocessing Reports: Included specific preprocessing report configurations for different datasets (ftn, ikem, knl_patos), demonstrating tailored reporting for diverse data sources.

Changelog

.gitignore
- Added '/outputs' directory to the ignore list.
pyproject.toml
- Added 'report' to the 'dependency-groups' section.
- Included the 'report' Git repository as a new source under 'tool.uv.sources'.
reporting/default.yaml
- Added a new default configuration file for the reporting system, including settings for Hydra, MLflow, user, reporter, and metadata.
reporting/hydra/default.yaml
- Added Hydra-specific default configuration for job naming and application name within the reporting context.
reporting/mlflow/base.yaml
- Added a base MLflow configuration defining a generic tracking URI.
reporting/mlflow/kubas_cluster.yaml
- Added MLflow configuration for a Kubas cluster tracking URI.
reporting/mlflow/kubas_external.yaml
- Added MLflow configuration for an external Kubas tracking URI.
reporting/mlflow/local.yaml
- Added MLflow configuration for local file-based tracking.
reporting/reporter/background/drive.yaml
- Added configuration for a DriveMasksRetriever, specifying directory and layer names.
reporting/reporter/background/mlflow_artifact.yaml
- Added configuration for an ArtifactURIMasksRetriever, specifying artifact URI and layer name.
reporting/reporter/background/mlflow_run_id.yaml
- Added configuration for a RunIDMlflowMaskRetriever, specifying directory, run ID, and layer names.
reporting/reporter/background/mlflow_tyler.yaml
- Added configuration for a TylerMLFlowMaskRetriever, including artifact URIs, item name column, background masks directory, ground truth column, and layer name.
reporting/reporter/preprocessing/ftn.yaml
- Added a specific preprocessing report configuration for the 'ftn' dataset, detailing background, mask retrievers with shader configurations, and MLflow metadata.
reporting/reporter/preprocessing/ikem.yaml
- Added a specific preprocessing report configuration for the 'ikem' dataset, detailing background, mask retrievers with shader configurations, and MLflow metadata.
reporting/reporter/preprocessing/knl_patos.yaml
- Added a specific preprocessing report configuration for the 'knl_patos' dataset, detailing background, mask retrievers with shader configurations, and MLflow metadata, including concrete run IDs.
reporting/reporter/save/local.yaml
- Added configuration for saving reports locally to an HTML file.
reporting/reporter/save/mlflow_attach_to_run.yaml
- Added configuration for attaching reports to an existing MLflow run.
reporting/reporter/save/mlflow_standalone_run.yaml
- Added configuration for saving reports as a standalone MLflow run artifact.
uv.lock
- Added 'orjson' package with its version and distribution details.
- Added 'polars' package with its version and distribution details, including its dependency 'polars-runtime-32'.
- Added 'polars-runtime-32' package with its version and distribution details.
- Added 'report' package with its version, Git source, and dependencies.
- Added 'types-pyyaml' package with its version and distribution details.
- Updated the 'dev' dependency group to include 'report'.
- Updated the 'job' dependency group to include 'report' with its Git source.

Activity

The pull request introduces a new feature for preprocessing reports, as indicated by the 'feat' prefix in the title.
This PR addresses and closes issue IBD-22, which is related to 'Preprocessing-report-MR'.
The implementation is blocked by pull request #6 (feat: tile masks), on which it depends.


gemini-code-assist

Code Review

This pull request introduces a new reporting feature, adding configuration files for generating preprocessing reports. The changes include new dependencies for reporting and a set of YAML files to configure the report generation for different datasets (ftn, ikem, knl_patos). The configuration files are well-structured, but there are opportunities to reduce duplication and improve maintainability by using YAML features like anchors and aliases. I've also pointed out some minor formatting issues.

reporting/reporter/preprocessing/ftn.yaml

reporting/reporter/preprocessing/ikem.yaml

reporting/reporter/preprocessing/knl_patos.yaml

Copilot

Pull request overview

This pull request adds preprocessing report functionality to the project, building on the tile masks feature from PR #6. It introduces a new dependency on the report library and creates Hydra configuration files for generating preprocessing reports with MLflow integration.

Changes:

Added report library dependency with supporting packages (orjson, polars, types-pyyaml)
Created three preprocessing report configurations for different datasets (KNL Patos, IKEM, FTN)
Established Hydra configuration structure for report generation, saving, and MLflow tracking

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 23 comments.


File	Description
pyproject.toml	Added 'report' dependency group and Git source configuration
uv.lock	Locked dependencies for report, orjson, polars, polars-runtime-32, and types-pyyaml
.gitignore	Added /outputs directory to ignore list
reporting/default.yaml	Base configuration template with metadata and reporter structure
reporting/hydra/default.yaml	Hydra job configuration for report generation
reporting/mlflow/*.yaml	MLflow tracking URI configurations for different environments
reporting/reporter/save/*.yaml	Report saving configurations for MLflow and local storage
reporting/reporter/background/*.yaml	Background mask retriever configuration templates
reporting/reporter/preprocessing/knl_patos.yaml	Complete preprocessing report config for KNL Patos dataset
reporting/reporter/preprocessing/ikem.yaml	Preprocessing report config for IKEM dataset (with TODO placeholders)
reporting/reporter/preprocessing/ftn.yaml	Preprocessing report config for FTN dataset (with TODO placeholders)


reporting/reporter/preprocessing/ftn.yaml

Copilot · 2026-02-12T20:53:53Z

reporting/reporter/preprocessing/ikem.yaml

+
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO
+    dir_name: artifacts_tiles


Inconsistent directory path compared to knl_patos.yaml, which uses "tile_masks/artifacts_tiles" for the same layer. This inconsistency may cause the report to fail if the actual artifact structure matches knl_patos.yaml.

Suggested change

- dir_name: artifacts_tiles
+ dir_name: tile_masks/artifacts_tiles

Copilot · 2026-02-12T20:53:53Z

reporting/reporter/preprocessing/ftn.yaml

+
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO
+    dir_name: clean_tiles


Inconsistent directory path compared to knl_patos.yaml, which uses "tile_masks/clean_tiles" for the same layer. This inconsistency may cause the report to fail if the actual artifact structure matches knl_patos.yaml.

Suggested change

- dir_name: clean_tiles
+ dir_name: tile_masks/clean_tiles
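The two comments above flag the same class of problem: one layer is stored under different directory paths in different dataset configs. Below is a minimal sketch of a check that would catch this. It assumes the dir_name values have already been extracted from the three YAML files. The `DIR_NAMES` excerpt contains only the values quoted in these review comments, not the full configs. `inconsistent_dir_names` is a hypothetical helper and is not part of the report library. The check also assumes that two paths ending in the same directory name refer to the same layer.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical excerpt: only the dir_name values quoted in the review comments.
DIR_NAMES: dict[str, list[str]] = {
    "knl_patos": ["tile_masks/artifacts_tiles", "tile_masks/clean_tiles"],
    "ikem": ["artifacts_tiles"],
    "ftn": ["clean_tiles", "artifacts_per_pixel"],
}


def inconsistent_dir_names(configs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Group dir_name values by their last path component and return only
    the groups where the datasets disagree on the full path."""
    by_leaf: dict[str, set[str]] = defaultdict(set)
    for dir_names in configs.values():
        for dir_name in dir_names:
            by_leaf[PurePosixPath(dir_name).name].add(dir_name)
    return {leaf: paths for leaf, paths in by_leaf.items() if len(paths) > 1}


if __name__ == "__main__":
    for leaf, paths in inconsistent_dir_names(DIR_NAMES).items():
        print(leaf, sorted(paths))
```

Here artifacts_per_pixel is not reported because it appears in only one excerpt, not because it has been verified against knl_patos.yaml.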

Copilot · 2026-02-12T20:53:54Z

reporting/reporter/preprocessing/ftn.yaml

+  - background: drive
+  - save: mlflow_standalone_run
+
+title: Ulcerative Colitis Preprocessing - ftn


Inconsistent capitalization in title. "ftn" should be capitalized (e.g., "FTN") to match the pattern used in other configuration files where dataset sources are uppercase (e.g., "IKEM", "KNL Patos").

Copilot · 2026-02-12T20:53:57Z

reporting/reporter/preprocessing/ftn.yaml

+
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO
+    dir_name: artifacts_per_pixel 


Trailing whitespace detected at the end of the line. This should be removed for code cleanliness.

Suggested change (removes the trailing space)

- dir_name: artifacts_per_pixel 
+ dir_name: artifacts_per_pixel

Copilot · 2026-02-12T20:53:57Z

reporting/reporter/preprocessing/ftn.yaml

+
+mask_retrievers:
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO


The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.

Copilot · 2026-02-12T20:53:57Z

reporting/reporter/preprocessing/ikem.yaml

+      visible: False
+
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO


The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.

Copilot · 2026-02-12T20:53:57Z

reporting/reporter/preprocessing/ikem.yaml

+      visible: False
+
+  - _target_: report.masks.RunIDMlflowMaskRetriever
+    run_id: TODO


The run_id is set to "TODO" which is a placeholder value. This configuration will fail at runtime if used without a valid MLflow run ID.
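The three run_id comments describe the same failure mode: a placeholder is only detected when MLflow rejects it at runtime. Below is a minimal sketch of a fail-fast check. It assumes the YAML has already been loaded into plain Python dicts and lists, for example with PyYAML or OmegaConf's `OmegaConf.to_container`. `find_placeholders` and `assert_no_placeholders` are hypothetical helpers. The excerpt reproduces only the retriever quoted in the ikem comment.

```python
from typing import Any, Iterator

PLACEHOLDER = "TODO"


def find_placeholders(node: Any, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every string value equal to PLACEHOLDER."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_placeholders(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_placeholders(item, f"{path}[{index}]")
    elif isinstance(node, str) and node == PLACEHOLDER:
        yield path


def assert_no_placeholders(config: dict[str, Any]) -> None:
    """Raise before any MLflow lookup is attempted if placeholders remain."""
    missing = list(find_placeholders(config))
    if missing:
        raise ValueError(f"placeholder values left in config: {', '.join(missing)}")


# Hypothetical excerpt mirroring the retriever quoted in the ikem.yaml comment.
ikem_excerpt = {
    "mask_retrievers": [
        {
            "_target_": "report.masks.RunIDMlflowMaskRetriever",
            "run_id": "TODO",
            "dir_name": "artifacts_tiles",
        },
    ]
}

if __name__ == "__main__":
    print(list(find_placeholders(ikem_excerpt)))
```

Calling `assert_no_placeholders(ikem_excerpt)` raises a ValueError that names `mask_retrievers[0].run_id`. A config with real run IDs passes without any output.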

reporting/reporter/preprocessing/ftn.yaml

Adames4 added 6 commits February 5, 2026 09:28

- chore: add reporting as dependency (9e8191c)
- feat: download reporting confs (d098d17)
- feat: create preprocessing confs (312ce62)
- feat: reporting conf (804c862)
- chore: Merge branch 'feature/tile-masks' into feature/preprocessing-report (c1ee243)
- chore: pyproject (24e6e7c)

Adames4 requested a review from Copilot February 12, 2026 20:47

Copilot started reviewing on behalf of Adames4 February 12, 2026 20:47

gemini-code-assist bot reviewed Feb 12, 2026

Copilot AI reviewed Feb 12, 2026

Adames4 requested a review from vejtek February 12, 2026 21:44

Adames4 removed the request for review from vejtek February 28, 2026 20:51


2 participants