Conversation
📝 Walkthrough

Adds dataset-splitting functionality and related configs: new dataset split YAMLs, a Python module for stratified group splitting and fold assignment with MLflow artifact logging, a Kubernetes job submission script, a project PR review styleguide, and a Gemini tool config file.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Configuration (Hydra)
    participant Main as main()
    participant MLflow as MLflow Service
    participant Split as split_dataset()
    participant Fold as add_folds()
    participant Artifact as Artifact Storage
    Config->>Main: provide config.splits & config.n_folds
    Main->>MLflow: download dataset artifact
    MLflow-->>Main: dataset CSV
    Main->>Split: call split_dataset(dataset, splits)
    Split->>Split: validate ratios & perform stratified group split
    Split-->>Main: return train, test_preliminary, test_final
    Main->>Fold: call add_folds(train, n_folds)
    Fold->>Fold: assign StratifiedGroupKFold 'fold' column
    Fold-->>Main: return train with folds
    Main->>Artifact: log CSV artifacts (train, tests)
    Artifact-->>Main: ack
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Summary of Changes

This pull request introduces a robust and configurable system for splitting datasets. It provides a dedicated Python script for performing stratified group-based splits into training, preliminary testing, and final testing sets, along with K-fold cross-validation assignment. The changes also include specific configurations for various datasets and updates to project dependencies to support this new functionality.
Code Review
This pull request introduces a new preprocessing step for splitting datasets. It includes the Python script to perform the splitting, along with Hydra configuration files for different datasets and a script to run it as a Kubernetes job. The changes are well-structured. My feedback includes a few suggestions for improvement: fixing a typo in the configuration, using a logger instead of print statements for better logging, making the random seed configurable for reproducibility, and addressing a dependency on a feature branch.
Actionable comments posted: 3
🧹 Nitpick comments (4)
scripts/preprocessing/split_dataset.py (1)
**11-14:** Pin the cloned repository ref to keep job results reproducible.

Lines 11-14 execute against whatever HEAD is at run time. For preprocessing experiments, this makes runs non-deterministic and hard to audit.
Suggested change

```diff
- "git clone https://github.com/RationAI/ulcerative-colitis.git workdir",
+ "git clone https://github.com/RationAI/ulcerative-colitis.git workdir",
+ "cd workdir && git checkout ${GIT_REF}",
- "cd workdir",
+ "cd workdir",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/preprocessing/split_dataset.py` around lines 11 - 14, The git clone command in scripts/preprocessing/split_dataset.py ("git clone https://github.com/RationAI/ulcerative-colitis.git workdir") must pin the repository ref so runs are reproducible; change that command to clone a specific commit/tag/branch (e.g., use "git clone --branch <TAG_OR_BRANCH> --depth 1 ..." or clone then "git checkout <COMMIT_SHA>") or replace the URL with a ref-qualified URL, and ensure the subsequent command ("uv run -m preprocessing.split_dataset +experiment=preprocessing/split_dataset/...") runs against that pinned ref; update any CI/env variables if you need to pass the chosen tag/sha.

preprocessing/split_dataset.py (3)
**67:** Use logger instead of `print()` for consistency.

The function receives a `logger` parameter but uses `print()` for output. This is inconsistent, and the printed distribution bypasses the handlers that capture the rest of the run's output.

♻️ Proposed fix

```diff
- print("whole dataset", dataset["nancy"].value_counts() / len(dataset))
+ logger.log_text(
+     str(dataset["nancy"].value_counts() / len(dataset)),
+     "dataset_distribution.txt"
+ )
```

Alternatively, if you need console output, consider using Python's `logging` module configured with appropriate handlers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@preprocessing/split_dataset.py` at line 67, Replace the direct print call with the provided logger to ensure consistent logging: in preprocessing/split_dataset.py where the code prints "whole dataset" (the print at line showing dataset["nancy"].value_counts() / len(dataset)), change it to use the passed-in logger (e.g., logger.info or logger.debug) and include the same message and computed value so output is captured by the app's logging/MLflow handlers; keep the message content but route it through the logger used elsewhere in the function that accepts the logger parameter.
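A minimal sketch of this nitpick using Python's stdlib `logging` as a stand-in for the MLflow-backed logger in the real code; the toy `dataset` and its `nancy` column are illustrative, not the project's actual data:

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("preprocessing.split_dataset")

# Illustrative frame; in the real module this comes from the downloaded artifact.
dataset = pd.DataFrame({"nancy": [0, 0, 1, 2, 1, 0]})

# Same message content as the original print(), routed through a logger so
# it is captured by whatever handlers the application configures.
distribution = dataset["nancy"].value_counts() / len(dataset)
logger.info("whole dataset\n%s", distribution)
```

The key point is that the message survives in structured logs rather than only on stdout, which matters once the job runs inside Kubernetes.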
**20-30:** Edge case: `train=1.0` bypasses the zero-check and calls `train_test_split`.

When `splits["train"]` equals `1.0`, the `isclose(splits["train"], 0.0)` check on line 20 is `False`, so `train_test_split` is called with `train_size=1.0`. Depending on ratiopath's implementation, this may raise an error or produce unexpected results since no samples remain for the test set.

Consider adding an explicit check for `train=1.0`:

♻️ Proposed fix

```diff
- if isclose(splits["train"], 0.0):
+ if isclose(splits["train"], 0.0):
      train = pd.DataFrame(columns=dataset.columns)
      test = dataset
+ elif isclose(splits["train"], 1.0):
+     train = dataset
+     test = pd.DataFrame(columns=dataset.columns)
  else:
      train, test = train_test_split(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@preprocessing/split_dataset.py` around lines 20 - 30, The code currently only checks for train==0.0 but calls train_test_split when splits["train"]==1.0 which can error because no test samples remain; update the logic around splits["train"] in preprocessing/split_dataset.py to explicitly handle the edge case where splits["train"] is close to 1.0 (e.g., isclose(splits["train"], 1.0)) by setting train = dataset and test = empty DataFrame with the same columns (similarly to the 0.0 branch) instead of calling train_test_split; keep using the existing symbols splits["train"], train_test_split, dataset, and dataset.columns so the rest of the code (stratify/groups) is not invoked when train_size is 1.0.
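A self-contained sketch of the branch logic this comment asks for, assuming plain positional slicing in place of ratiopath's stratified group `train_test_split`; the function name and toy frame are illustrative only:

```python
from math import isclose

import pandas as pd


def split_train_test(dataset: pd.DataFrame, train_ratio: float):
    """Illustrates the 0.0 / 1.0 edge-case branches; not the real splitter."""
    if isclose(train_ratio, 0.0):
        return pd.DataFrame(columns=dataset.columns), dataset
    if isclose(train_ratio, 1.0):
        # New branch: keep everything in train and return an empty test
        # frame instead of calling the splitter with train_size=1.0.
        return dataset, pd.DataFrame(columns=dataset.columns)
    # Stand-in for the stratified group split in the real code.
    n_train = int(round(len(dataset) * train_ratio))
    return dataset.iloc[:n_train], dataset.iloc[n_train:]


df = pd.DataFrame({"case_id": [1, 1, 2, 2], "nancy": [0, 1, 0, 1]})
train, test = split_train_test(df, 1.0)
```

Returning an empty frame with the same columns keeps downstream CSV logging uniform across all three splits.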
**54-57:** Simplify fold assignment indexing.

The current indexing `train.loc[train.iloc[val_idx].index, "fold"]` is convoluted. Since `val_idx` contains positional indices, you can use the column position directly with `iloc`:

♻️ Proposed simplification

```diff
  for fold, (_, val_idx) in enumerate(
      splitter.split(train, y=train["nancy"], groups=train["case_id"])
  ):
-     train.loc[train.iloc[val_idx].index, "fold"] = fold
+     train.iloc[val_idx, train.columns.get_loc("fold")] = fold
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@preprocessing/split_dataset.py` around lines 54 - 57, Replace the convoluted indexing in the fold assignment inside the splitter loop: instead of using train.loc[train.iloc[val_idx].index, "fold"] = fold, assign directly by positional indices—use train.iloc with val_idx and the column position for "fold" (obtainable via train.columns.get_loc("fold")), or alternatively use train.loc with train.index[val_idx] to set the "fold" column; update the assignment in the loop that iterates over splitter.split(train, y=train["nancy"], groups=train["case_id"]) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configs/preprocessing/split_dataset.yaml`:
- Around line 11-12: Update the metadata strings for the dataset splitting job:
replace the misspelled "spliting" with "splitting" in both the run_name and
description entries (symbols: run_name, description) so they read "Dataset
splitting: ${dataset.institution}" and "Dataset splitting for
${dataset.institution} dataset."
In `@preprocessing/split_dataset.py`:
- Line 18: Replace the runtime assertion in split_dataset.py with explicit
validation: instead of using assert isclose(sum(splits.values()), 1.0), compute
total = sum(splits.values()) and if not math.isclose(total, 1.0): raise
ValueError(f"Splits must sum to 1.0; got {total}") (ensure math.isclose is
imported); update any surrounding code in the function that depends on the
assert to use this explicit check for production-safe validation.
In `@scripts/preprocessing/split_dataset.py`:
- Around line 5-14: The job definition contains unresolved placeholders—replace
the Ellipsis used for username and the "..." in job_name and the uv run
+experiment path with real values or make them parameters; specifically update
the job_name, username, and the script entry that runs "uv run -m
preprocessing.split_dataset +experiment=preprocessing/split_dataset/..." so they
accept variables or environment args (e.g., CLI flags or template variables) and
validate them before submission so the script and uv command are executable
(look for the job_name, username, and the script list entries to implement the
parameterization and validation).
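The assert-to-`ValueError` change described in the second inline comment can be sketched as follows; the `splits` keys mirror the config names quoted in the review, and the helper name is hypothetical:

```python
from math import isclose


def validate_splits(splits: dict[str, float]) -> None:
    """Reject split ratios that do not sum to 1.0."""
    total = sum(splits.values())
    if not isclose(total, 1.0):
        # Unlike assert, this check survives `python -O` and reports
        # the offending value in the error message.
        raise ValueError(f"Splits must sum to 1.0; got {total}")


validate_splits({"train": 0.7, "test_preliminary": 0.15, "test_final": 0.15})
```

`isclose` matters here because float ratios such as `0.7 + 0.15 + 0.15` may not sum to exactly `1.0`.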
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- .gemini/config.yaml
- .gemini/styleguide.md
- configs/experiment/.gitkeep
- configs/experiment/preprocessing/split_dataset/ftn.yaml
- configs/experiment/preprocessing/split_dataset/ikem.yaml
- configs/experiment/preprocessing/split_dataset/knl_patos.yaml
- configs/preprocessing/split_dataset.yaml
- preprocessing/split_dataset.py
- pyproject.toml
- scripts/preprocessing/split_dataset.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@preprocessing/split_dataset.py`:
- Line 36: The computation of preliminary_size uses preliminary_size =
splits["test_preliminary"] / (1.0 - splits["train"]) which can blow up when
splits["train"] is ≈1.0; add a small epsilon guard for the denominator (e.g.,
denom = max(1e-8, 1.0 - splits["train"])) and then clamp preliminary_size to the
valid range [0.0, 1.0] (e.g., preliminary_size = max(0.0, min(preliminary_size,
1.0))); apply these changes where preliminary_size is computed and before
calling train_test_split so train_test_split receives a valid fraction.
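The guard-and-clamp described above can be sketched as a small helper; the function name is hypothetical and the epsilon value follows the review's suggestion:

```python
def preliminary_fraction(splits: dict[str, float]) -> float:
    """Fraction of the non-train remainder that goes to test_preliminary."""
    # Guard against division by ~0 when splits["train"] is close to 1.0,
    # then clamp to the [0.0, 1.0] range that train_test_split accepts.
    denom = max(1e-8, 1.0 - splits["train"])
    return max(0.0, min(splits["test_preliminary"] / denom, 1.0))


frac = preliminary_fraction(
    {"train": 0.7, "test_preliminary": 0.15, "test_final": 0.15}
)
```

With the 1.0-train edge case handled explicitly earlier in the function, this path should rarely trigger, but the clamp keeps the fraction valid even under float drift.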
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- configs/preprocessing/split_dataset.yaml
- preprocessing/split_dataset.py
🚧 Files skipped from review as they are similar to previous changes (1)
- configs/preprocessing/split_dataset.yaml
🧹 Nitpick comments (1)
.gemini/styleguide.md (1)
28-34: Tighten path naming and script-review wording to avoid ambiguity.
`project_name/` looks placeholder-like in a repo-specific styleguide, and line 33 could be read as relaxing correctness/security checks for scripts. Consider clarifying both.

Proposed wording refinement

```diff
- - **Experiment Tracking (MLflow):** When PRs add new loss functions, evaluation metrics, or training loops in `project_name/` (or `ml/`), ensure that these new metrics are properly logged to MLflow.
+ - **Experiment Tracking (MLflow):** When PRs add new loss functions, evaluation metrics, or training loops in the repository’s model/training package (currently `ml/` when introduced), ensure these metrics are logged to MLflow.
- - `project_name/` (future `ml/`): Focus on training loops, PyTorch Lightning modules, and model definitions.
+ - `ml/` (when introduced): Focus on training loops, PyTorch Lightning modules, and model definitions.
  - `postprocessing/`: Focus on ensembling and final prediction logic.
- - `scripts/`: These are job submission templates. Do not review them as strictly as core Python code.
+ - `scripts/`: These are job submission templates. Prioritize correctness, reproducibility, and safety; be lighter only on style-level nits.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gemini/styleguide.md around lines 28 - 34, The wording is ambiguous: replace the vague placeholder `project_name/` and the phrase about `scripts/` being "not reviewed as strictly" with clear, repo-specific guidance—use the intended directory name (e.g., `ml/`) or state “(replace with repo ML package name)” where `project_name/` appears, and change the `scripts/` line to clarify they are job submission templates that may have relaxed style requirements but still must meet security and correctness checks; update the lines under "Experiment Tracking (MLflow):" and "Repository Structure:" to mention `ml/` (or an explicit repo ML package) and to state that `scripts/` are exempt from some style reviews but are still subject to security/correctness validation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82159faf-d14e-452b-a680-2bb3a986c0ef
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- .gemini/styleguide.md
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
Blocked by RationAI/ratiopath#34