add sample_key kwarg, changed sample_key and batch_key logic by AdnanAbouelela · Pull Request #10 · quadbio/scembed

AdnanAbouelela · 2025-10-22T12:13:55Z

No description provided.

for more information, see https://pre-commit.ci

Marius1311

Thanks for splitting sample_key out from batch_key — that's a real correctness fix for spatial data where slice ≠ batch. A few things before this can land; left inline comments at the specific lines.

Top-level summary

Blocking:

CI is red on hatch-test.py3.11 / py3.14: every test_scviva_method parametrization now fails with KeyError: None. Root cause: the existing test calls scVIVAMethod(...) without sample_key, so it defaults to None, which then flows into scvi.external.SCVIVA.setup_anndata(sample_key=None) and explodes. (tests/test_retrival.py::test_embedding_retrieval[method_config4] also fails for the same reason.) The Scanorama worker-crash failures look unrelated.
This is a silent behavior change vs. main: previously sample_key defaulted to self.batch_key; now it defaults to None. Users who upgrade and don't pass sample_key get a hard break instead of the old behavior. Either keep None-falls-back-to-batch_key semantics, or make this explicit and update the docstring + tests.

Other items (inline): test coverage, docstring, PR description.

Documentation Impact

No AGENTS.md invariant changes — BaseIntegrationMethod contract is preserved (the kwarg lives on scVIVAMethod and is forwarded via **kwargs into self.params like the other scVIVA-specific kwargs). README.md and docs/ don't mention sample_key so nothing stale there.

A short PR body explaining why sample_key should be distinct from batch_key (spatial slices vs. experimental batches) would help future readers — the body is currently empty.

Marius1311 · 2026-04-30T15:41:11Z

        batch_size: int | None = None,
        gene_likelihood: str | None = None,
        check_val_every_n_epoch: int | None = None,
+        sample_key: str | None = None,


Behavior break vs. main. Previously sample_key was implicitly self.batch_key; now it defaults to None, which propagates through to scvi.external.SCVIVA.setup_anndata(sample_key=None) and raises KeyError: None (this is what's failing in CI for all three test_scviva_method parametrizations and test_embedding_retrieval[method_config4]).

Two reasonable options:

Preserve old default — keep the kwarg, but fall back to batch_key when None:

self.sample_key = sample_key if sample_key is not None else batch_key

This way existing users keep working; new users can override.

Make sample_key required-ish — keep None default but raise a clear error in fit() / setup() if it's still None at use time, and update the existing test_scviva_method to pass sample_key="batch" explicitly. Document the change as breaking in the PR body / CHANGELOG.

Either is fine, but the current state silently breaks every existing caller.

Marius1311 · 2026-04-30T15:41:11Z

        check_val_every_n_epoch
            Check validation loss every n epochs.
+        sample_key
+            Key in adata.obs indicating different slices/sections.


Docstring should also state what None means and how this differs from batch_key. Suggested:

sample_key Key in ``adata.obs`` indicating different physical slices/sections (analogous to ``slice_key`` in ResolVI). Distinct from ``batch_key``, which can encode any experimental batch dimension. If ``None``, ... [whatever the chosen semantic is].

Marius1311 · 2026-04-30T15:41:11Z

            layer=self.counts_layer,
            batch_key=self.batch_key,
-            sample_key=self.batch_key,
+            sample_key=self.sample_key,  # like slice_key in ResolVI


Two small things on this line:

The # like slice_key in ResolVI comment is more useful in the docstring than at the call site — that's where users will look. Consider promoting it.

With the current None default, this call hands None to setup_anndata, which is exactly what blows up in CI. Whatever fix is chosen for the default (see comment on line 937), the same value should also flow into preprocessing_anndata at line 1046 — they need to stay in sync.

Marius1311 · 2026-04-30T15:41:11Z


        # Prepare preprocessing parameters, filtering out None values for k_nn only
+        # preprocessing_params are passed to preprocessing_anndata and setup_anndata
+        # as described in the tutorial


Per REVIEW_GUIDE.md (changed-path test lookup): methods/gpu_methods.py → tests/methods/test_gpu_methods.py. The existing test_scviva_method doesn't cover the new sample_key parameter at all (and currently fails because of the new default — see line 937).

Once the default behavior is settled, please add a parametrize row exercising sample_key distinct from batch_key so the new functionality has a regression test. The existing fixture spatial_data should already have a usable obs column; reuse it rather than building a parallel fixture.

AdnanAbouelela and others added 2 commits April 30, 2026 17:26

add sample_key kwarg, changed sample_key and batch_key logic

390e9e3

[pre-commit.ci] auto fixes from pre-commit.com hooks

f3ab57c

for more information, see https://pre-commit.ci

Marius1311 force-pushed the scviva branch from 0ebc0f9 to f3ab57c Compare April 30, 2026 15:26

Marius1311 reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add sample_key kwarg, changed sample_key and batch_key logic#10

add sample_key kwarg, changed sample_key and batch_key logic#10
AdnanAbouelela wants to merge 2 commits intomainfrom
scviva

AdnanAbouelela commented Oct 22, 2025

Uh oh!

Marius1311 left a comment

Uh oh!

Marius1311 Apr 30, 2026

Uh oh!

Marius1311 Apr 30, 2026

Uh oh!

Marius1311 Apr 30, 2026

Uh oh!

Marius1311 Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdnanAbouelela commented Oct 22, 2025

Uh oh!

Marius1311 left a comment

Choose a reason for hiding this comment

Top-level summary

Documentation Impact

Uh oh!

Marius1311 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Marius1311 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Marius1311 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Marius1311 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants