
feat: tiling #5

Open
Adames4 wants to merge 47 commits into feature/tissue-masks from feature/tiling

Conversation

@Adames4 (Collaborator) commented Feb 12, 2026

Tiling script.

Closes IBD-19

Blocked by #3, #4

[IGNORE QC FILES]

Dependency graph:

                         +--------------+
                  -------| tissue-masks |<------+           +------------+      +----------------------+
                 /       +--------------+       |       +---| tile-masks |<-----| preprocessing-report |
                /                               |       |   +------------+      +----------------------+
+---------+    /                            +--------+  |
| dataset | <-+                             | tiling |<-+
+---------+    \                            +--------+  |
                \                               |       |   +------------+
                 \       +-----------------+    |       +---| embeddings |
                  -------| quality-control |<---+           +------------+
                         +-----------------+
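For illustration only (this code is not part of the PR), the dependency graph above can be encoded as a mapping from each stage to the stages it depends on, reading an arrow from A to B as "A needs B's outputs". Python's standard-library `graphlib.TopologicalSorter` then yields one valid order in which to run the stages. The `STAGE_DEPENDENCIES` name and the arrow interpretation are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the diagram: each stage maps to the set of
# stages whose outputs it consumes.
STAGE_DEPENDENCIES: dict[str, set[str]] = {
    "dataset": set(),
    "tissue-masks": {"dataset"},
    "quality-control": {"dataset"},
    "tiling": {"tissue-masks", "quality-control"},
    "tile-masks": {"tiling"},
    "embeddings": {"tiling"},
    "preprocessing-report": {"tile-masks"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the stages can be run.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for stage in run_order(STAGE_DEPENDENCIES):
        print(stage)
```

Stages with no ordering constraint between them (tissue-masks and quality-control, or tile-masks and embeddings) may be emitted in either order, since they only share an upstream stage.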

@Adames4 Adames4 self-assigned this Feb 12, 2026
@coderabbitai bot commented Feb 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


@gemini-code-assist

Summary of Changes

Hello @Adames4, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new capability for processing whole slide images by implementing a robust tiling pipeline. It includes new scripts for performing quality control checks on slides and subsequently extracting tiles based on specified parameters and masks. The changes are supported by new configuration files that define dataset-specific mask URIs and various tiling strategies, enabling flexible and reproducible tile generation for downstream analysis in computational pathology.

Highlights

  • New Tiling Feature: Implemented a comprehensive tiling pipeline for whole slide images, enabling extraction of smaller, manageable tiles for analysis.
  • Quality Control Integration: Introduced a dedicated quality control processing script to assess slide quality and generate masks, which are then utilized during the tiling process.
  • Extensive Configuration Support: Added numerous configuration files to define dataset-specific mask URIs and various tiling parameters (e.g., MPP levels, tile extents, strides, data splits) for different institutions.
  • Dependency Updates: Updated project dependencies to include new libraries such as rationai-sdk, rationai-tiling, and ratiopath, essential for the new tiling and quality control functionalities.
Changelog
  • configs/dataset/processed_w_masks/ftn.yaml
    • Added configuration for FTN dataset, specifying tissue and QC mask URIs.
  • configs/dataset/processed_w_masks/ikem.yaml
    • Added configuration for IKEM dataset, specifying tissue and QC mask URIs.
  • configs/dataset/processed_w_masks/knl_patos.yaml
    • Added configuration for KNL_PATOS dataset, specifying tissue and QC mask URIs.
  • configs/experiment/preprocessing/tiling/ftn_0_320px.yaml
    • Added tiling configuration for FTN dataset at level 0 with 320px tiles.
  • configs/experiment/preprocessing/tiling/ftn_0_75um.yaml
    • Added tiling configuration for FTN dataset at level 0 with 75 µm tiles.
  • configs/experiment/preprocessing/tiling/ftn_1_224px.yaml
    • Added tiling configuration for FTN dataset at level 1 with 224px tiles.
  • configs/experiment/preprocessing/tiling/ftn_2_224px.yaml
    • Added tiling configuration for FTN dataset at level 2 with 224px tiles.
  • configs/experiment/preprocessing/tiling/ikem_0_320px.yaml
    • Added tiling configuration for IKEM dataset at level 0 with 320px tiles.
  • configs/experiment/preprocessing/tiling/ikem_0_75um.yaml
    • Added tiling configuration for IKEM dataset at level 0 with 75 µm tiles.
  • configs/experiment/preprocessing/tiling/ikem_1_224px.yaml
    • Added tiling configuration for IKEM dataset at level 1 with 224px tiles.
  • configs/experiment/preprocessing/tiling/ikem_2_224px.yaml
    • Added tiling configuration for IKEM dataset at level 2 with 224px tiles.
  • configs/experiment/preprocessing/tiling/knl_patos_0_320px.yaml
    • Added tiling configuration for KNL_PATOS dataset at level 0 with 320px tiles.
  • configs/experiment/preprocessing/tiling/knl_patos_0_75um.yaml
    • Added tiling configuration for KNL_PATOS dataset at level 0 with 75 µm tiles.
  • configs/experiment/preprocessing/tiling/knl_patos_1_224px.yaml
    • Added tiling configuration for KNL_PATOS dataset at level 1 with 224px tiles.
  • configs/experiment/preprocessing/tiling/knl_patos_2_224px.yaml
    • Added tiling configuration for KNL_PATOS dataset at level 2 with 224px tiles.
  • configs/preprocessing/quality_control.yaml
    • Added a new base configuration file for quality control parameters.
  • configs/preprocessing/tiling.yaml
    • Added a new base configuration file for tiling parameters.
  • preprocessing/quality_control.py
    • Added a new Python script for performing quality control checks on whole slide images, organizing generated masks, and merging QC metrics.
  • preprocessing/tiling.py
    • Added a new Python script for whole slide image tiling, including dataset splitting, mask path integration, region of interest (ROI) creation, and coverage extraction using Ray for parallel processing.
  • preprocessing/tissue_masks.py
    • Updated the main function to correctly use the dataset variable for retrieving slide paths.
  • pyproject.toml
    • Added rationai-sdk, rationai-tiling, and ratiopath to project dependencies.
    • Updated uv.sources to include git repositories for rationai-tiling and rationai-sdk.
  • scripts/preprocessing/quality_control.py
    • Added a new job submission script for running quality control processing.
  • scripts/preprocessing/tiling.py
    • Added a new job submission script for running the tiling process.
  • uv.lock
    • Updated the lock file to reflect new and updated package dependencies, including affine, albucore, albumentations, cligj, donfig, geopandas, google-crc32c, httpcore, httpx, imagecodecs, imageio, lazy-loader, lz4, numcodecs, opencv-python-headless, pyogrio, pyproj, rasterio, scikit-image, shapely, simsimd, stringzilla, tenacity, tifffile, and zarr.
Activity
  • This pull request addresses and closes issue IBD-19.
  • The implementation of this feature was dependent on pull requests feat: tissue masks #3 and feat: quality control #4.
  • No human comments or reviews have been recorded on this pull request yet.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces new scripts and configurations for tiling and quality control. The overall structure is good, leveraging Ray for parallel processing and Hydra for configuration management. However, there are several areas for improvement. Multiple configuration and script files contain placeholders (TODOs and ...) that render them non-functional and need to be addressed. Additionally, the tiling script contains some maintainability issues, such as a non-descriptive function name and hardcoded values that should be refactored into constants for better clarity and robustness. I've provided specific comments and suggestions to address these points.

Comment on lines +5 to +6
tissue_mask_uri: mlflow-artifacts:/86/04778b10de254572b69ce0a101c1eee4/artifacts/tissue_masks # TODO update URI
qc_mask_uri: mlflow-artifacts:/86/c8edfb2541e84b44b1a28be3540c1a35/artifacts # TODO update URI
\ No newline at end of file


medium

The artifact URIs are hardcoded and marked with a TODO. This should be updated with the final URIs before merging. For better maintainability, consider if these could be passed in via a more dynamic configuration method rather than being hardcoded in multiple files.

Collaborator Author

Waiting for #3, #4
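One way to act on the suggestion to centralize these values, sketched under assumptions: keep the MLflow experiment and run IDs for each dataset in a single table and build the URIs from it, following the `mlflow-artifacts:/<experiment>/<run>/artifacts/<path>` shape of the configs above. `MaskRuns`, `MASK_RUNS`, `artifact_uri`, and `mask_uris` are hypothetical names, and the run IDs are placeholders rather than real runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskRuns:
    """MLflow IDs of the runs that produced one dataset's masks (hypothetical)."""

    experiment_id: str
    tissue_run_id: str
    qc_run_id: str


# Single source of truth for all datasets; the run IDs are placeholders.
MASK_RUNS: dict[str, MaskRuns] = {
    "ftn": MaskRuns("86", "<ftn-tissue-run-id>", "<ftn-qc-run-id>"),
    "ikem": MaskRuns("86", "<ikem-tissue-run-id>", "<ikem-qc-run-id>"),
    "knl_patos": MaskRuns("86", "<knl-patos-tissue-run-id>", "<knl-patos-qc-run-id>"),
}


def artifact_uri(experiment_id: str, run_id: str, path: str = "") -> str:
    """Build a URI shaped like mlflow-artifacts:/<experiment>/<run>/artifacts[/<path>]."""
    base = f"mlflow-artifacts:/{experiment_id}/{run_id}/artifacts"
    return f"{base}/{path}" if path else base


def mask_uris(dataset: str) -> dict[str, str]:
    """Return the tissue and QC mask URIs for `dataset` (KeyError if unknown)."""
    runs = MASK_RUNS[dataset]
    return {
        "tissue_mask_uri": artifact_uri(runs.experiment_id, runs.tissue_run_id, "tissue_masks"),
        "qc_mask_uri": artifact_uri(runs.experiment_id, runs.qc_run_id),
    }
```

Updating a run then means editing one entry instead of three YAML files; the per-dataset configs would only need to reference the result.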

Comment on lines +5 to +6
tissue_mask_uri: mlflow-artifacts:/86/13359cdd5d1a47ddabc352b9aa0d7635/artifacts/tissue_masks # TODO update URI
qc_mask_uri: mlflow-artifacts:/86/98443fe2b67445d5a56598bff15b7f27/artifacts # TODO update URI
\ No newline at end of file


medium

Similar to other dataset configurations, the artifact URIs here are hardcoded and marked with a TODO. Please update them with the final URIs. Centralizing this configuration could prevent having to update multiple files.

Collaborator Author

Waiting for #3, #4

Comment on lines +5 to +6
tissue_mask_uri: mlflow-artifacts:/86/8ef6d6f0c9af4f35a087596960f675aa/artifacts/tissue_masks # TODO update URI
qc_mask_uri: mlflow-artifacts:/86/75fc3e53112f4634ae5238777d87e88c/artifacts # TODO update URI
\ No newline at end of file


medium

The artifact URIs are hardcoded with a TODO comment. Please ensure these are updated to the correct, final URIs. Having these values hardcoded in multiple files can be error-prone; a centralized configuration would be more robust.

Collaborator Author

Waiting for #3, #4

Copilot AI left a comment

Pull request overview

Adds a preprocessing tiling stage (plus QC masks generation) to the ulcerative-colitis pipeline, wiring it into the existing Hydra/MLflow/Ray-based preprocessing structure and updating project dependencies accordingly.

Changes:

  • Introduces preprocessing/tiling.py to build slide/tile datasets using tissue + QC masks and save results to MLflow.
  • Adds preprocessing/quality_control.py to generate QC masks/metrics via rationai.AsyncClient, organize outputs, and log artifacts to MLflow.
  • Extends configs (preprocessing + experiments + datasets-with-masks) and updates dependencies (pyproject.toml, uv.lock) for tiling/QC tooling.

Reviewed changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 9 comments.

Summary per file:
File Description
uv.lock Locks new dependencies required by tiling/QC (incl. rationai-tiling, rationai-sdk, ratiopath, and geo/image stack).
pyproject.toml Adds tiling/QC-related runtime deps and Git sources for rationai-tiling + rationai-sdk.
preprocessing/tissue_masks.py Small variable rename (df → dataset) for clarity.
preprocessing/tiling.py New tiling pipeline: dataset split, slide reading via Ray, mask overlap computation, MLflow dataset export.
preprocessing/quality_control.py New QC pipeline: async slide checks, mask organization, metrics aggregation, MLflow artifact logging.
scripts/preprocessing/tiling.py New kube-jobs submission script for tiling.
scripts/preprocessing/quality_control.py New kube-jobs submission script for QC.
configs/preprocessing/tiling.yaml Adds global tiling config schema (mpp/tile_extent/stride/splits/metadata).
configs/preprocessing/quality_control.yaml Adds global QC config schema (output_dir, timeouts, qc_parameters, metadata).
configs/experiment/preprocessing/tiling/ftn_*.yaml Adds FTN tiling experiment presets for multiple mpp/tile sizes.
configs/experiment/preprocessing/tiling/ikem_*.yaml Adds IKEM tiling experiment presets for multiple mpp/tile sizes.
configs/experiment/preprocessing/tiling/knl_patos_*.yaml Adds KNL Patos tiling experiment presets for multiple mpp/tile sizes.
configs/dataset/processed_w_masks/ftn.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).
configs/dataset/processed_w_masks/ikem.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).
configs/dataset/processed_w_masks/knl_patos.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).
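The tiling presets vary resolution (mpp), tile extent, and stride. As a generic sketch of a regular tiling grid, and not the rationai-tiling API, a physical extent in µm converts to pixels as extent / mpp, and tile origins are laid out every `stride` pixels, keeping only tiles that fit inside the slide. `tile_extent_px` and `grid_coordinates` are hypothetical helpers:

```python
def tile_extent_px(extent_um: float, mpp: float) -> int:
    """Convert a physical tile extent in micrometres to pixels at `mpp` (µm per pixel)."""
    if mpp <= 0:
        raise ValueError(f"mpp must be positive, got {mpp!r}")
    return round(extent_um / mpp)


def grid_coordinates(
    width: int, height: int, extent: int, stride: int
) -> list[tuple[int, int]]:
    """Top-left (x, y) corners of all full tiles on a regular grid.

    Tiles that would extend past the right or bottom edge are dropped;
    a slide smaller than one tile yields an empty list.
    """
    if extent <= 0 or stride <= 0:
        raise ValueError("extent and stride must be positive")
    xs = range(0, width - extent + 1, stride)
    ys = range(0, height - extent + 1, stride)
    # Row-major order: all tiles of the first row, then the next row, ...
    return [(x, y) for y in ys for x in xs]
```

For example, a 75 µm tile at an assumed 0.25 µm/px is 300 px wide, and a stride smaller than the extent produces overlapping tiles. Filtering tiles by tissue and QC mask coverage would be a separate step on top of this grid.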


Comment on lines +93 to +97
def nancy(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]:
    row["nancy_index"] = df.loc[Path(row["path"]).stem, "nancy"]
    return row


Copilot AI Feb 12, 2026

Function name nancy() is not self-describing (it mutates the row to add nancy_index). Consider renaming to something like add_nancy_index/attach_nancy_label to make the Ray pipeline easier to follow and grep.

Suggested change
-def nancy(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]:
-    row["nancy_index"] = df.loc[Path(row["path"]).stem, "nancy"]
-    return row
+def add_nancy_index(row: dict[str, Any], nancy_index_df: pd.DataFrame) -> dict[str, Any]:
+    """
+    Attach the 'nancy_index' value from `nancy_index_df` to the given `row`.
+    The row is expected to contain a 'path' key whose stem matches an index
+    entry in `nancy_index_df`, which must have a 'nancy' column.
+    """
+    row["nancy_index"] = nancy_index_df.loc[Path(row["path"]).stem, "nancy"]
+    return row
+
+
+def nancy(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]:
+    """Backward-compatible wrapper; prefer :func:`add_nancy_index`."""
+    return add_nancy_index(row, df)

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +15
    "git clone https://gitlab.ics.muni.cz/rationai/digital-pathology/pathology/ulcerative-colitis.git workdir",
    "cd workdir",
    "uv sync --frozen",
    "uv run -m preprocessing.quality_control +dataset=processed/...",
],
Copilot AI Feb 12, 2026

This job script clones ulcerative-colitis from an internal GitLab URL, while the existing scripts in scripts/preprocessing/ clone from GitHub. If that’s unintentional, align the clone URL across scripts (or parameterize it) so job submission is reproducible for all users/environments.

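A minimal sketch of parameterizing the clone URL, as the comment suggests: the URL defaults to the GitHub repository used by the other preprocessing job scripts and can be overridden through a `REPO_URL` environment variable (a hypothetical name). `build_job_script` is also hypothetical; it only builds the list of shell commands, and the kube-jobs submission call is left out.

```python
import os
from typing import Optional

# Clone URL used by the other preprocessing job scripts in this repo.
DEFAULT_REPO_URL = "https://github.com/RationAI/ulcerative-colitis.git"


def build_job_script(
    module: str, overrides: list[str], repo_url: Optional[str] = None
) -> list[str]:
    """Shell commands for a preprocessing job with a configurable clone URL.

    Precedence: explicit `repo_url`, then the hypothetical REPO_URL
    environment variable, then DEFAULT_REPO_URL.
    """
    url = repo_url or os.environ.get("REPO_URL", DEFAULT_REPO_URL)
    return [
        f"git clone {url} workdir",
        "cd workdir",
        "uv sync --frozen",
        " ".join(["uv run -m", module, *overrides]),
    ]
```

Usage would look like `build_job_script("preprocessing.quality_control", ["+dataset=processed/..."])`, with the dataset override still to be filled in; users with GitLab access could set `REPO_URL` instead of editing each script.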
Comment on lines +5 to +6
tissue_mask_uri: mlflow-artifacts:/86/8ef6d6f0c9af4f35a087596960f675aa/artifacts/tissue_masks # TODO update URI
qc_mask_uri: mlflow-artifacts:/86/75fc3e53112f4634ae5238777d87e88c/artifacts # TODO update URI
\ No newline at end of file
Copilot AI Feb 12, 2026

This dataset config hard-codes MLflow artifact URIs and leaves a # TODO update URI note. If these URIs are not final/stable, the tiling pipeline will fail or use incorrect artifacts. Please either update them to the final run URIs, or switch to a placeholder/override-based approach so the committed default config is usable.

Suggested change
-tissue_mask_uri: mlflow-artifacts:/86/8ef6d6f0c9af4f35a087596960f675aa/artifacts/tissue_masks # TODO update URI
-qc_mask_uri: mlflow-artifacts:/86/75fc3e53112f4634ae5238777d87e88c/artifacts # TODO update URI
+tissue_mask_uri: ${oc.env:TISSUE_MASK_URI,}
+qc_mask_uri: ${oc.env:QC_MASK_URI,}

Comment on lines +5 to +6
tissue_mask_uri: mlflow-artifacts:/86/04778b10de254572b69ce0a101c1eee4/artifacts/tissue_masks # TODO update URI
qc_mask_uri: mlflow-artifacts:/86/c8edfb2541e84b44b1a28be3540c1a35/artifacts # TODO update URI
\ No newline at end of file
Copilot AI Feb 12, 2026

This dataset config hard-codes MLflow artifact URIs and leaves a # TODO update URI note. If these URIs are not final/stable, the tiling pipeline will fail or use incorrect artifacts. Please either update them to the final run URIs, or switch to a placeholder/override-based approach so the committed default config is usable.

Suggested change
-tissue_mask_uri: mlflow-artifacts:/86/04778b10de254572b69ce0a101c1eee4/artifacts/tissue_masks # TODO update URI
-qc_mask_uri: mlflow-artifacts:/86/c8edfb2541e84b44b1a28be3540c1a35/artifacts # TODO update URI
+tissue_mask_uri: OVERRIDE_ME_TISSUE_MASK_URI
+qc_mask_uri: OVERRIDE_ME_QC_MASK_URI

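Whichever placeholder style is adopted (an empty `${oc.env:...,}` default or an `OVERRIDE_ME_*` string), the tiling script can reject unresolved values before doing any work. A sketch, assuming the dataset config has already been resolved to a plain mapping, for example with `OmegaConf.to_container(cfg, resolve=True)`; `check_mask_uris` is a hypothetical helper:

```python
from collections.abc import Mapping

# Keys expected in each processed_w_masks dataset config.
MASK_URI_KEYS = ("tissue_mask_uri", "qc_mask_uri")
PLACEHOLDER_PREFIX = "OVERRIDE_ME"


def check_mask_uris(dataset_cfg: Mapping[str, object]) -> None:
    """Raise ValueError if a mask URI is missing, empty, or still a placeholder."""
    problems: list[str] = []
    for key in MASK_URI_KEYS:
        value = dataset_cfg.get(key)
        # A missing key, a non-string, or a blank string (e.g. an empty default) is rejected.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is missing or empty")
        elif value.startswith(PLACEHOLDER_PREFIX):
            problems.append(f"{key} still holds the placeholder {value!r}")
    if problems:
        raise ValueError("Unresolved mask URIs: " + "; ".join(problems))
```

Collecting every problem before raising reports both URIs at once instead of failing on the first one.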
import ray
from omegaconf import DictConfig
from rationai.mlkit import with_cli_args
from rationai.mlkit.autolog import autolog
Copilot AI Feb 12, 2026

Import style for autolog is inconsistent with the other preprocessing modules (e.g., preprocessing/tissue_masks.py and preprocessing/create_dataset.py import autolog from rationai.mlkit). To keep imports consistent and avoid depending on internal module paths, consider importing autolog from the same public entrypoint here as well.

Suggested change
-from rationai.mlkit.autolog import autolog
+from rationai.mlkit import autolog

Comment on lines +63 to +65
assert isclose(
    splits["train"] + splits["test_preliminary"] + splits["test_final"], 1.0
), "Splits must sum to 1.0"
Copilot AI Feb 12, 2026

split_dataset() uses an assert to validate that split fractions sum to 1.0. Assertions can be stripped with python -O, which would skip this validation and allow bad configs to proceed; prefer raising a ValueError (or hydra/omegaconf validation) with a clear message instead.

Suggested change
-assert isclose(
-    splits["train"] + splits["test_preliminary"] + splits["test_final"], 1.0
-), "Splits must sum to 1.0"
+total_split = (
+    splits["train"] + splits["test_preliminary"] + splits["test_final"]
+)
+if not isclose(total_split, 1.0):
+    raise ValueError(
+        f"Splits must sum to 1.0, but got {total_split!r} "
+        f"(train={splits['train']!r}, "
+        f"test_preliminary={splits['test_preliminary']!r}, "
+        f"test_final={splits['test_final']!r})"
+    )

Comment on lines +82 to +86
preliminary_size = splits["test_preliminary"] / (1.0 - splits["train"])
test_preliminary, test_final = train_test_split_groups(
    test,
    train_size=preliminary_size,
    groups=test["case_id"],
Copilot AI Feb 12, 2026

preliminary_size = splits["test_preliminary"] / (1.0 - splits["train"]) will raise ZeroDivisionError when train == 1.0 and test_preliminary > 0.0 (even if splits still sum to 1). Add explicit validation for split ranges/relationships (e.g., require train < 1 when preliminary/final splits are non-zero) or handle the train==1.0 case by returning empty test splits.

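A sketch of the extra validation the comment asks for, assuming the same `splits` keys as `split_dataset()`: each fraction must lie in [0, 1], the fractions must sum to 1 (checked with `math.isclose`), and when `train == 1.0` the helper returns `None` so the caller can skip the second split instead of dividing by zero. `preliminary_fraction` is a hypothetical helper, not code from the PR.

```python
from math import isclose
from typing import Optional

SPLIT_KEYS = ("train", "test_preliminary", "test_final")


def preliminary_fraction(splits: dict[str, float]) -> Optional[float]:
    """Validate split fractions and return test_preliminary's share of the test set.

    Returns None when train == 1.0, i.e. when there is no test set to split.
    """
    for key in SPLIT_KEYS:
        value = splits[key]
        # Negative fractions could otherwise let train == 1.0 pass the sum check.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"splits[{key!r}] must be in [0, 1], got {value!r}")
    total = sum(splits[key] for key in SPLIT_KEYS)
    if not isclose(total, 1.0):
        raise ValueError(f"Splits must sum to 1.0, but got {total!r}")
    remaining = 1.0 - splits["train"]
    # isclose against 0.0 needs an absolute tolerance; rel_tol alone never matches.
    if isclose(remaining, 0.0, abs_tol=1e-12):
        return None  # everything is training data; skip the second split
    return splits["test_preliminary"] / remaining
```

Returning `None` keeps a train-only configuration valid; raising a `ValueError` for that case would be an equally reasonable policy.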
memory="128Gi",
shm="48Gi",
script=[
    "git clone https://gitlab.ics.muni.cz/rationai/digital-pathology/pathology/ulcerative-colitis.git workdir",
Copilot AI Feb 12, 2026

This job script clones ulcerative-colitis from an internal GitLab URL, while the other preprocessing job scripts in this repo clone from https://github.com/RationAI/ulcerative-colitis.git. Unless the repo has moved, this inconsistency will make the tiling job fail for users without GitLab access; align the clone URL/host with the rest of the scripts (or parameterize it).

Suggested change
-    "git clone https://gitlab.ics.muni.cz/rationai/digital-pathology/pathology/ulcerative-colitis.git workdir",
+    "git clone https://github.com/RationAI/ulcerative-colitis.git workdir",

@Adames4 Adames4 requested a review from vejtek February 12, 2026 20:23
This was referenced Feb 12, 2026
@Adames4 Adames4 removed the request for review from vejtek February 28, 2026 20:51

Labels

None yet

Projects

None yet


2 participants