Skip to content

add sample_key kwarg, changed sample_key and batch_key logic#10

Open
AdnanAbouelela wants to merge 2 commits intomainfrom
scviva
Open

add sample_key kwarg, changed sample_key and batch_key logic#10
AdnanAbouelela wants to merge 2 commits intomainfrom
scviva

Conversation

@AdnanAbouelela
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Member

@Marius1311 Marius1311 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting sample_key out from batch_key — that's a real correctness fix for spatial data where slice ≠ batch. A few things before this can land; left inline comments at the specific lines.

Top-level summary

Blocking:

  • CI is red on hatch-test.py3.11 / py3.14: every test_scviva_method parametrization now fails with KeyError: None. Root cause: the existing test calls scVIVAMethod(...) without sample_key, so it defaults to None, which then flows into scvi.external.SCVIVA.setup_anndata(sample_key=None) and explodes. (tests/test_retrival.py::test_embedding_retrieval[method_config4] also fails for the same reason.) The Scanorama worker-crash failures look unrelated.
  • This is a silent behavior change vs. main: previously sample_key defaulted to self.batch_key; now it defaults to None. Users who upgrade and don't pass sample_key get a hard break instead of the old behavior. Either keep None-falls-back-to-batch_key semantics, or make this explicit and update the docstring + tests.

Other items (inline): test coverage, docstring, PR description.

Documentation Impact

No AGENTS.md invariant changes — BaseIntegrationMethod contract is preserved (the kwarg lives on scVIVAMethod and is forwarded via **kwargs into self.params like the other scVIVA-specific kwargs). README.md and docs/ don't mention sample_key so nothing stale there.

A short PR body explaining why sample_key should be distinct from batch_key (spatial slices vs. experimental batches) would help future readers — the body is currently empty.

batch_size: int | None = None,
gene_likelihood: str | None = None,
check_val_every_n_epoch: int | None = None,
sample_key: str | None = None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior break vs. main. Previously sample_key was implicitly self.batch_key; now it defaults to None, which propagates through to scvi.external.SCVIVA.setup_anndata(sample_key=None) and raises KeyError: None (this is what's failing in CI for all three test_scviva_method parametrizations and test_embedding_retrieval[method_config4]).

Two reasonable options:

  1. Preserve old default — keep the kwarg, but fall back to batch_key when None:

    self.sample_key = sample_key if sample_key is not None else batch_key

    This way existing users keep working; new users can override.

  2. Make sample_key required-ish — keep None default but raise a clear error in fit() / setup() if it's still None at use time, and update the existing test_scviva_method to pass sample_key="batch" explicitly. Document the change as breaking in the PR body / CHANGELOG.

Either is fine, but the current state silently breaks every existing caller.

check_val_every_n_epoch
Check validation loss every n epochs.
sample_key
Key in adata.obs indicating different slices/sections.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring should also state what None means and how this differs from batch_key. Suggested:

sample_key
    Key in ``adata.obs`` indicating different physical slices/sections (analogous to
    ``slice_key`` in ResolVI). Distinct from ``batch_key``, which can encode any
    experimental batch dimension. If ``None``, ... [whatever the chosen semantic is].

layer=self.counts_layer,
batch_key=self.batch_key,
sample_key=self.batch_key,
sample_key=self.sample_key, # like slice_key in ResolVI
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things on this line:

  1. The # like slice_key in ResolVI comment is more useful in the docstring than at the call site — that's where users will look. Consider promoting it.
  2. With the current None default, this call hands None to setup_anndata, which is exactly what blows up in CI. Whatever fix is chosen for the default (see comment on line 937), the same value should also flow into preprocessing_anndata at line 1046 — they need to stay in sync.


# Prepare preprocessing parameters, filtering out None values for k_nn only
# preprocessing_params are passed to preprocessing_anndata and setup_anndata
# as described in the tutorial
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per REVIEW_GUIDE.md (changed-path test lookup): methods/gpu_methods.pytests/methods/test_gpu_methods.py. The existing test_scviva_method doesn't cover the new sample_key parameter at all (and currently fails because of the new default — see line 937).

Once the default behavior is settled, please add a parametrize row exercising sample_key distinct from batch_key so the new functionality has a regression test. The existing fixture spatial_data should already have a usable obs column; reuse it rather than building a parallel fixture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants