add sample_key kwarg, changed sample_key and batch_key logic#10
add sample_key kwarg, changed sample_key and batch_key logic#10AdnanAbouelela wants to merge 2 commits intomainfrom
Conversation
Marius1311
left a comment
There was a problem hiding this comment.
Thanks for splitting sample_key out from batch_key — that's a real correctness fix for spatial data where slice ≠ batch. A few things before this can land; left inline comments at the specific lines.
Top-level summary
Blocking:
- CI is red on
hatch-test.py3.11/py3.14: everytest_scviva_methodparametrization now fails withKeyError: None. Root cause: the existing test callsscVIVAMethod(...)withoutsample_key, so it defaults toNone, which then flows intoscvi.external.SCVIVA.setup_anndata(sample_key=None)and explodes. (tests/test_retrival.py::test_embedding_retrieval[method_config4]also fails for the same reason.) The Scanorama worker-crash failures look unrelated. - This is a silent behavior change vs. main: previously
sample_keydefaulted toself.batch_key; now it defaults toNone. Users who upgrade and don't passsample_keyget a hard break instead of the old behavior. Either keepNone-falls-back-to-batch_keysemantics, or make this explicit and update the docstring + tests.
Other items (inline): test coverage, docstring, PR description.
Documentation Impact
No AGENTS.md invariant changes — BaseIntegrationMethod contract is preserved (the kwarg lives on scVIVAMethod and is forwarded via **kwargs into self.params like the other scVIVA-specific kwargs). README.md and docs/ don't mention sample_key so nothing stale there.
A short PR body explaining why sample_key should be distinct from batch_key (spatial slices vs. experimental batches) would help future readers — the body is currently empty.
| batch_size: int | None = None, | ||
| gene_likelihood: str | None = None, | ||
| check_val_every_n_epoch: int | None = None, | ||
| sample_key: str | None = None, |
There was a problem hiding this comment.
Behavior break vs. main. Previously sample_key was implicitly self.batch_key; now it defaults to None, which propagates through to scvi.external.SCVIVA.setup_anndata(sample_key=None) and raises KeyError: None (this is what's failing in CI for all three test_scviva_method parametrizations and test_embedding_retrieval[method_config4]).
Two reasonable options:
-
Preserve old default — keep the kwarg, but fall back to
batch_keywhenNone:self.sample_key = sample_key if sample_key is not None else batch_key
This way existing users keep working; new users can override.
-
Make
sample_keyrequired-ish — keepNonedefault but raise a clear error infit()/setup()if it's stillNoneat use time, and update the existingtest_scviva_methodto passsample_key="batch"explicitly. Document the change as breaking in the PR body / CHANGELOG.
Either is fine, but the current state silently breaks every existing caller.
| check_val_every_n_epoch | ||
| Check validation loss every n epochs. | ||
| sample_key | ||
| Key in adata.obs indicating different slices/sections. |
There was a problem hiding this comment.
Docstring should also state what None means and how this differs from batch_key. Suggested:
sample_key
Key in ``adata.obs`` indicating different physical slices/sections (analogous to
``slice_key`` in ResolVI). Distinct from ``batch_key``, which can encode any
experimental batch dimension. If ``None``, ... [whatever the chosen semantic is].
| layer=self.counts_layer, | ||
| batch_key=self.batch_key, | ||
| sample_key=self.batch_key, | ||
| sample_key=self.sample_key, # like slice_key in ResolVI |
There was a problem hiding this comment.
Two small things on this line:
- The
# like slice_key in ResolVIcomment is more useful in the docstring than at the call site — that's where users will look. Consider promoting it. - With the current
Nonedefault, this call handsNonetosetup_anndata, which is exactly what blows up in CI. Whatever fix is chosen for the default (see comment on line 937), the same value should also flow intopreprocessing_anndataat line 1046 — they need to stay in sync.
|
|
||
| # Prepare preprocessing parameters, filtering out None values for k_nn only | ||
| # preprocessing_params are passed to preprocessing_anndata and setup_anndata | ||
| # as described in the tutorial |
There was a problem hiding this comment.
Per REVIEW_GUIDE.md (changed-path test lookup): methods/gpu_methods.py → tests/methods/test_gpu_methods.py. The existing test_scviva_method doesn't cover the new sample_key parameter at all (and currently fails because of the new default — see line 937).
Once the default behavior is settled, please add a parametrize row exercising sample_key distinct from batch_key so the new functionality has a regression test. The existing fixture spatial_data should already have a usable obs column; reuse it rather than building a parallel fixture.
No description provided.