Skip to content

chore: groupby warn and add tests for warnings#195

Open
selmanozleyen wants to merge 8 commits into
scverse:mainfrom
selmanozleyen:feat/issue-193-categorical-validation
Open

chore: groupby warn and add tests for warnings#195
selmanozleyen wants to merge 8 commits into
scverse:mainfrom
selmanozleyen:feat/issue-193-categorical-validation

Conversation

@selmanozleyen
Copy link
Copy Markdown
Member

fixes #193

@selmanozleyen selmanozleyen marked this pull request as draft April 17, 2026 15:45
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.69%. Comparing base (9dc1ba2) to head (223d095).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #195      +/-   ##
==========================================
- Coverage   93.48%   91.69%   -1.80%     
==========================================
  Files          14       14              
  Lines        1121     1132      +11     
==========================================
- Hits         1048     1038      -10     
- Misses         73       94      +21     
Files with missing lines Coverage Δ
src/annbatch/io.py 91.81% <100.00%> (+0.48%) ⬆️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@selmanozleyen selmanozleyen self-assigned this Apr 19, 2026
@selmanozleyen selmanozleyen added the skip-gpu-ci Whether gpu ci should be skipped label Apr 19, 2026
@selmanozleyen selmanozleyen marked this pull request as ready for review April 19, 2026 19:28
@selmanozleyen
Copy link
Copy Markdown
Member Author

@ilan-gold is there anything else you wanted? for example we still fail when outer introduces a column if it's specified in groupby. But warn when it's not

@selmanozleyen selmanozleyen requested a review from ilan-gold April 19, 2026 19:29
Copy link
Copy Markdown
Collaborator

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think these zarr warnings are silenced globally, no? tool.pytest.ini_options.filterwarnings?
  2. Could you write up what behavior your are encoding here? It seems like the point is to warn on mismatched obs columns.

More importantly, can we add a test for what happens to the actual on-disk data when you add either

a. A new categorical column to an existing store - do all the stores get properly updated to have the new column with the right category?
b. A categorical column with the same name but different categories to an existing stores - again, are all of the newly shuffled datasets' altered column of the same dtype i.e., do they have the same codes (which is really what matters)?

Comment thread src/annbatch/io.py
Comment on lines 286 to 288
for key in curr_keys:
if not (elem_name in {"var", "obs"} and key == "_index"):
key_count[key] += 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just get away with adding obs and var to this check. Why not? It would would go beyond categoricals, but I'm not sure that is so bad. I am not sure why obs and var are excluded. This check is relatively naive as well so we could also add some behavior around e.g., mismatched dtypes. But that appears out-of-scope for this PR, aside from maybe just adding the special warning about categoricals.

Comment thread tests/test_preshuffle.py
Comment on lines +53 to +54
with warnings.catch_warnings(record=True) as caught_warnings:
warnings.simplefilter("always")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just blanket ignoring warnings? Seems bad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-gpu-ci Whether gpu ci should be skipped

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate categorical consistency across all obs columns when building or extending DatasetCollection

2 participants