Skip to content

add option to filter runs#17

Open
AdnanAbouelela wants to merge 2 commits intomainfrom
feat/filter_runs
Open

add option to filter runs#17
AdnanAbouelela wants to merge 2 commits intomainfrom
feat/filter_runs

Conversation

@AdnanAbouelela
Copy link
Copy Markdown
Collaborator

This PR is introducing an option to filter runs to meet certain criteria. This way one can restrict the benchmarking to only tools that are label aware vs. unaware etc.

Copy link
Copy Markdown
Member

@Marius1311 Marius1311 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Useful feature — being able to slice the WandB sweep down to runs matching specific config values is a real gap in the aggregator API. The implementation is straightforward and lands in the right place (fetch_runs_filter_params_process_runs).

A few items below before this can merge. CI red is the blocker; the rest are review-level questions.

Blocking

1. CI is red — 3 ruff errors introduced by this PR

pre-commit.ci fails on ruff check. Local reproduction:

UP007 src/scembed/aggregation.py:21  — use `X | Y` for type annotations
UP007 src/scembed/aggregation.py:22  — use `X | Y` for type annotations
C419  src/scembed/aggregation.py:195 — unnecessary list comprehension inside any()

Suggested patch:

-from typing import Union
-...
-# Allowed scalar types for filtering
-AllowedScalar = Union[int, float, str, bool]
-AllowedFilterValue = Union[AllowedScalar, list[AllowedScalar]]
+# Allowed scalar types for filtering
+AllowedScalar = int | float | str | bool
+AllowedFilterValue = AllowedScalar | list[AllowedScalar]
-                if any([param_value == x for x in target_value]):
+                if any(param_value == x for x in target_value):

(Or just if param_value in target_value: — semantically identical and simpler.)

2. No tests for the new filter behavior

Per REVIEW_GUIDE.md (Testing → New code): new behavior must be covered. tests/test_aggregation.py has no test that exercises filter_params / filter_allow_none.

The existing tests hit a live W&B project, which makes adding a hermetic test for _filter_params a bit awkward — but the method itself is pure: it operates on self.raw_df. A direct unit test is easy:

def test_filter_params_scalar_and_list(self):
    self.agg.raw_df = pd.DataFrame({
        "config": [{"method": "scvi"}, {"method": "harmony"}, {"method": None}, None],
    })
    self.agg.filter_allow_none = True
    self.agg._filter_params("method", ["scvi", "harmony"])
    assert len(self.agg.raw_df) == 3  # two matches + one None kept

    # ... and a filter_allow_none=False variant, plus a scalar target_value case

A parametrize with (target_value, filter_allow_none, expected_n) rows would cover the four meaningful branches in one test.

Review-level questions

3. _filter_params lacks type hints

Public fetch_runs is fully typed but _filter_params(self, target_key, target_value) is bare. It is private, but for consistency:

def _filter_params(self, target_key: str, target_value: AllowedFilterValue) -> None:

4. Implicit "rows with no config" behavior

config = row.get("config")
if config is None or not isinstance(config, dict):
    continue

Rows with missing/malformed config are silently dropped — they end up in neither keep_runs nor discard_runs. The comment says "Missing config runs will be filtered out later in _process_runs()", which is correct for the n_runs_filtered counter, but the in-loop continue here means those rows are not in self.raw_df after the loc[keep_runs] reassignment. So _process_runs will not see them and will not bump missing_config_runs.

Two options:

  • Append to keep_runs (let _process_runs handle it as before — preserves the existing accounting).
  • Append to discard_runs and add them to self.missing_config_runs here.

Right now it's a third hidden path. Worth choosing one explicitly.

5. State leak via self.filter_allow_none

fetch_runs writes self.filter_params / self.filter_allow_none and _filter_params reads self.filter_allow_none. Passing filter_allow_none as an argument to _filter_params would keep _filter_params self-contained and avoid leaving filter state on the instance after fetch (relevant if fetch_runs is called twice with different filters).

def _filter_params(self, target_key, target_value, *, filter_allow_none: bool) -> None:

6. Public API surface

AllowedScalar / AllowedFilterValue end up on the public fetch_runs signature, which makes them effectively part of the public API even though they are not re-exported from __init__.py. Per AGENTS.md (public-API guidance) this is fine — just flagging it so the names are intentional. AllowedFilterValue is reasonable; consider whether AllowedScalar needs to exist at all (it's only used inside AllowedFilterValue).

7. Docstring

fetch_runs docstring is a single line and does not document the two new parameters. A short numpydoc block describing filter_params (mapping of config key → allowed value or list of values) and filter_allow_none (whether runs with None/missing values for the filtered key are kept) would help users discover the feature.

Non-blocking nits

  • discard_runs is collected only to log len(discard_runs). A counter is enough; the list is unused.
  • if not isinstance(target_value, list): ... else: ... could be unified by normalizing target_value to a list at entry: targets = target_value if isinstance(target_value, list) else [target_value], then a single in check.
  • The type alias allows bool, but bool is a subclass of int — passing True against an int-typed config field will match 1. Probably fine in practice; just noting.

Documentation Impact

No README.md, docs/api.md, or notebook updates needed — fetch_runs is already in the autosummary via scIBAggregator, and the new kwargs surface there once the docstring (point 7) is filled in.

No AGENTS.md invariant is touched (this respects the "W&B SDK drift isolated in aggregation.py helpers" rule — filtering is layered on top, not inside the SDK helpers). Good.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 22.22222% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.28%. Comparing base (5e3d37d) to head (9d7ea37).

Files with missing lines Patch % Lines
src/scembed/aggregation.py 22.22% 28 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
- Coverage   75.74%   74.28%   -1.46%     
==========================================
  Files          11       11              
  Lines        1303     1338      +35     
==========================================
+ Hits          987      994       +7     
- Misses        316      344      +28     
Files with missing lines Coverage Δ
src/scembed/aggregation.py 73.86% <22.22%> (-6.42%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants