add option to filter runs #17
Marius1311 left a comment:
Summary
Useful feature — being able to slice the WandB sweep down to runs matching specific config values is a real gap in the aggregator API. The implementation is straightforward and lands in the right place (fetch_runs → _filter_params → _process_runs).
A few items below before this can merge. CI red is the blocker; the rest are review-level questions.
Blocking
1. CI is red — 3 ruff errors introduced by this PR
pre-commit.ci fails on ruff check. Local reproduction:
```
UP007 src/scembed/aggregation.py:21 — use `X | Y` for type annotations
UP007 src/scembed/aggregation.py:22 — use `X | Y` for type annotations
C419 src/scembed/aggregation.py:195 — unnecessary list comprehension inside any()
```
Suggested patch:
```diff
-from typing import Union
-...
-# Allowed scalar types for filtering
-AllowedScalar = Union[int, float, str, bool]
-AllowedFilterValue = Union[AllowedScalar, list[AllowedScalar]]
+# Allowed scalar types for filtering
+AllowedScalar = int | float | str | bool
+AllowedFilterValue = AllowedScalar | list[AllowedScalar]
```

and for the C419 hit:

```diff
-    if any([param_value == x for x in target_value]):
+    if any(param_value == x for x in target_value):
```

(Or just `if param_value in target_value:` — semantically identical and simpler.)
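As a quick sanity check (a standalone sketch, not the PR's code), the generator expression and the `in` form agree for these scalar filters, since both compare element-wise with `==`:

```python
# Standalone demonstration: any() over a generator expression and the
# `in` operator both use == element-wise, so they give the same answer.
param_value = "scvi"
target_value = ["scvi", "harmony"]

via_any = any(param_value == x for x in target_value)
via_in = param_value in target_value
assert via_any == via_in  # both True here

# The equivalence also holds when there is no match.
assert any("scanvi" == x for x in target_value) is False
assert ("scanvi" in target_value) is False
```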
2. No tests for the new filter behavior
Per REVIEW_GUIDE.md (Testing → New code): new behavior must be covered. tests/test_aggregation.py has no test that exercises filter_params / filter_allow_none.
The existing tests hit a live W&B project, which makes adding a hermetic test for _filter_params a bit awkward — but the method itself is pure: it operates on self.raw_df. A direct unit test is easy:
```python
def test_filter_params_scalar_and_list(self):
    self.agg.raw_df = pd.DataFrame({
        "config": [{"method": "scvi"}, {"method": "harmony"}, {"method": None}, None],
    })
    self.agg.filter_allow_none = True
    self.agg._filter_params("method", ["scvi", "harmony"])
    assert len(self.agg.raw_df) == 3  # two matches + one None kept
    # ... and a filter_allow_none=False variant, plus a scalar target_value case
```

A parametrize with `(target_value, filter_allow_none, expected_n)` rows would cover the four meaningful branches in one test.
Review-level questions
3. _filter_params lacks type hints
Public fetch_runs is fully typed but _filter_params(self, target_key, target_value) is bare. It is private, but for consistency:
```python
def _filter_params(self, target_key: str, target_value: AllowedFilterValue) -> None:
```

4. Implicit "rows with no config" behavior
```python
config = row.get("config")
if config is None or not isinstance(config, dict):
    continue
```

Rows with missing/malformed config are silently dropped — they end up in neither `keep_runs` nor `discard_runs`. The comment says "Missing config runs will be filtered out later in `_process_runs()`", which is correct for the `n_runs_filtered` counter, but the in-loop `continue` here means those rows are not in `self.raw_df` after the `loc[keep_runs]` reassignment. So `_process_runs` will not see them and will not bump `missing_config_runs`.
Two options:
- Append to `keep_runs` (let `_process_runs` handle it as before, preserving the existing accounting).
- Append to `discard_runs` and add them to `self.missing_config_runs` here.

Right now it's a third hidden path. Worth choosing one explicitly.
5. State leak via self.filter_allow_none
fetch_runs writes self.filter_params / self.filter_allow_none and _filter_params reads self.filter_allow_none. Passing filter_allow_none as an argument to _filter_params would keep _filter_params self-contained and avoid leaving filter state on the instance after fetch (relevant if fetch_runs is called twice with different filters).
```python
def _filter_params(self, target_key, target_value, *, filter_allow_none: bool) -> None:
```

6. Public API surface
AllowedScalar / AllowedFilterValue end up on the public fetch_runs signature, which makes them effectively part of the public API even though they are not re-exported from __init__.py. Per AGENTS.md (public-API guidance) this is fine — just flagging it so the names are intentional. AllowedFilterValue is reasonable; consider whether AllowedScalar needs to exist at all (it's only used inside AllowedFilterValue).
7. Docstring
fetch_runs docstring is a single line and does not document the two new parameters. A short numpydoc block describing filter_params (mapping of config key → allowed value or list of values) and filter_allow_none (whether runs with None/missing values for the filtered key are kept) would help users discover the feature.
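A sketch of what such a numpydoc block could look like. The signature and default values here are assumptions for illustration, not copied from the diff:

```python
def fetch_runs(self, filter_params=None, filter_allow_none=True):
    """Fetch sweep runs from W&B, optionally filtering by config values.

    Parameters
    ----------
    filter_params
        Mapping of config key to an allowed value or list of values;
        only runs whose config matches are kept.
    filter_allow_none
        Whether runs with ``None``/missing values for a filtered key
        are kept rather than discarded.
    """
```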
Non-blocking nits
- `discard_runs` is collected only to log `len(discard_runs)`. A counter is enough; the list is unused.
- `if not isinstance(target_value, list): ... else: ...` could be unified by normalizing `target_value` to a list at entry: `targets = target_value if isinstance(target_value, list) else [target_value]`, then a single `in` check.
- The type alias allows `bool`, but `bool` is a subclass of `int` — passing `True` against an int-typed config field will match `1`. Probably fine in practice; just noting.
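The `bool`/`int` overlap is easy to verify in a REPL; this standalone snippet shows why a boolean filter value would match an int-typed config field:

```python
# bool is a subclass of int, so True compares equal to 1 and hashes
# the same; membership tests and dict lookups cannot tell them apart.
assert isinstance(True, int)
assert True == 1
assert True in [1, 2]          # membership uses ==, so this matches
assert hash(True) == hash(1)   # they even collide as dict/set keys
```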
Documentation Impact
No README.md, docs/api.md, or notebook updates needed — fetch_runs is already in the autosummary via scIBAggregator, and the new kwargs surface there once the docstring (point 7) is filled in.
No AGENTS.md invariant is touched (this respects the "W&B SDK drift isolated in aggregation.py helpers" rule — filtering is layered on top, not inside the SDK helpers). Good.
Codecov Report
❌ Patch coverage is low on this PR (7 of the 35 new lines are hit; see the diff below).
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #17      +/-  ##
==========================================
- Coverage   75.74%   74.28%    -1.46%
==========================================
  Files          11       11
  Lines        1303     1338      +35
==========================================
+ Hits          987      994       +7
- Misses        316      344      +28
==========================================
```
This PR introduces an option to filter runs so that only those meeting certain criteria are kept. This way one can restrict the benchmarking to, e.g., only tools that are label-aware vs. label-unaware.