Skip to content

Centralise result-variable missing-value representation#953

Merged
blooop merged 8 commits into
mainfrom
generic-result-missing-value
Jun 13, 2026
Merged

Centralise result-variable missing-value representation#953
blooop merged 8 commits into
mainfrom
generic-result-missing-value

Conversation

@blooop

@blooop blooop commented Jun 3, 2026

Copy link
Copy Markdown
Owner

Stacked on #952#952 has merged and main has been merged back into this branch; the PR now targets main directly with only the centralisation refactor.

What

Introduce a single source of truth for how a missing/unrecorded result entry is represented, and route the storage + consumer code through it.

Why

"Missing" was represented by three dtype-specific sentinels:

Result type dtype sentinel
ResultFloat/Bool/Vec (+ numeric) float NaN
ResultReference/DataSet int -1
ResultPath/Video/Image/String/Container/Rerun object "NAN"

…but that mapping was duplicated across setup_dataset (initial fill) and _sentinel_for_result_var (over_time aging), and consumers hardcoded the check per call site (== "NAN" for file panes, np.isnan/skipna for numerics). Adding a new result type meant editing several isinstance ladders.

Change

New helpers in bencher/variables/results.py:

  • result_missing_fill(rv) -> (fill_value, numpy_dtype)

  • result_is_missing(rv, value) -> bool

  • ResultCollector.setup_dataset and _sentinel_for_result_var now build from result_missing_fill — the two isinstance ladders collapse into one polymorphic call.

  • The hardcoded == "NAN" file checks in bench_result_base.py become result_is_missing(result_var, filepath).

Not a behaviour change

The stored fill values and dtypes are identical to before — so the on-disk cache format, the golden BenchCfg hash, and every reduction are unchanged. pixi run ci green (1445 passed, 5 skipped — same as base), format/lint/ty/pylint clean.

Follow-up enabled (not in this PR)

Because the object-type sentinel is now defined and checked in one place, switching it from the "NAN" string to a literal None becomes a one-line, consumer-invisible change (would need its own CACHE_VERSION bump since it changes the stored value).

🤖 Generated with Claude Code

Summary by Sourcery

Centralise the representation of missing result-variable entries and route dataset setup and consumers through shared helpers.

Enhancements:

  • Introduce shared helpers to define and detect missing values for result variables across all result types.
  • Refactor dataset initialisation and over-time aging to derive fill values and dtypes from the central missing-value helpers instead of per-call-site type ladders.
  • Update file-based result rendering to use the shared missing-value predicate rather than hardcoded string sentinels.

blooop added 2 commits June 3, 2026 22:42
A 0 default made an unrecorded sample (a run that aborts before
measuring, or a result var the worker never sets) indistinguishable
from a real 0 measurement, dragging nan-aware regression/aggregation
means toward zero. The storage layer already initialises result arrays
with NaN, so 0 was the value that destroyed that "unwritten" sentinel.

Flip ResultFloat/ResultVec default to NaN so unrecorded samples are
treated as missing and dropped by the existing nan-aware reductions.
ResultBool stays 0 (=False): False is a meaningful default for a binary
outcome and the binomial-std calc treats bool means as proportions over
a fixed repeat count. Callers wanting zero-fill opt out with default=0.

Bump CACHE_VERSION 3->4 to flush stale 0-filled benchmark/over_time
caches, update the golden BenchCfg hash fixtures accordingly, and bump
the package version to 1.103.0.
"Missing"/unrecorded entries were represented by three dtype-specific
sentinels (NaN for numeric, -1 for index-backed reference types, the
string "NAN" for object/file types), but that type->sentinel mapping was
duplicated across setup_dataset (initial array fill) and
_sentinel_for_result_var (over_time aging), and consumers hardcoded the
check per call site (== "NAN" for file panes, np.isnan/skipna for
numerics).

Add a single source of truth in bencher.variables.results:
- result_missing_fill(rv) -> (fill_value, numpy_dtype)
- result_is_missing(rv, value) -> bool

Route setup_dataset and _sentinel_for_result_var through
result_missing_fill (the two isinstance ladders collapse into one
polymorphic call), and replace the hardcoded == "NAN" file checks in
bench_result_base with result_is_missing.

Pure refactor: the stored fill values and dtypes are unchanged, so the
on-disk format, golden BenchCfg hash, and all reductions are identical.
New result types now get a NaN default automatically and can declare a
different missing representation in one place. This also makes a future
switch of the object sentinel from "NAN" to a literal None a one-line,
consumer-invisible change.
@sourcery-ai

sourcery-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Reviewer's Guide

Centralizes the representation of missing/unrecorded result values via shared helpers, and refactors dataset setup, over_time aging, and file-based consumers to rely on this single source of truth without changing persisted behavior.

Sequence diagram for file-based pane missing-check using centralized helper

sequenceDiagram
    actor User
    participant BenchResultBase as BenchResultBase
    participant Dataset as xarray_Dataset
    participant VariablesResults as variables_results

    User ->> BenchResultBase: _pane_over_time_slider(result_var, dataset)
    loop over_time indices
        BenchResultBase ->> Dataset: isel(over_time=idx)
        Dataset -->> BenchResultBase: ds_t
        BenchResultBase ->> BenchResultBase: zero_dim_da_to_val(ds_t[result_var.name])
        BenchResultBase -->> BenchResultBase: filepath
        BenchResultBase ->> VariablesResults: result_is_missing(result_var, filepath)
        VariablesResults -->> BenchResultBase: bool
        alt [result_is_missing]
            BenchResultBase ->> BenchResultBase: append _NO_DATA_HTML
        else [not missing]
            BenchResultBase ->> BenchResultBase: os.path.isfile(filepath)
        end
    end
Loading

File-Level Changes

Change Details Files
Introduce centralized helpers that define and query the missing-value sentinel for result variables.
  • Add _REFERENCE_MISSING_TYPES, _OBJECT_MISSING_TYPES, and DATA_VAR_RESULT_TYPES groupings for result-variable categories
  • Implement result_missing_fill(rv) to return the fill value and numpy dtype for missing entries based on result type
  • Implement result_is_missing(rv, value) to check for missing/unrecorded entries using the centralized sentinel logic, including NaN-aware comparisons
bencher/variables/results.py
Refactor result dataset initialization and aging to use the centralized missing-value helpers instead of duplicated isinstance ladders.
  • Update _sentinel_for_result_var to delegate to result_missing_fill(rv)[0] instead of maintaining its own type-based sentinel mapping
  • Change ResultCollector.setup_dataset to use DATA_VAR_RESULT_TYPES and result_missing_fill when creating data arrays, while preserving the special expansion behavior for ResultVec
  • Ensure that stored fill values and dtypes (NaN, -1, "NAN") remain unchanged so cache format and behavior are preserved
bencher/result_collector.py
bencher/variables/results.py
Update file-backed result consumers to use the centralized missing-value check instead of hardcoded string comparisons.
  • Replace filepath == "NAN" checks in over-time slider and grid pane rendering with result_is_missing(result_var, filepath)
  • Keep existing file-existence guard (os.path.isfile) so visual behavior is unchanged for missing or invalid paths
bencher/results/bench_result_base.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In result_is_missing, treating None as missing for all NaN-backed result types may be a subtle behavior change compared to the previous sentinel-only checks; consider either constraining this to known call sites or documenting that None is now intentionally treated as missing for numerics.
  • The new DATA_VAR_RESULT_TYPES aggregate is a useful centralization; it may be worth adding a brief comment or assertion tying it to SCALAR_RESULT_TYPES to catch future result types that should get data variables but are accidentally omitted from this tuple.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `result_is_missing`, treating `None` as missing for all NaN-backed result types may be a subtle behavior change compared to the previous sentinel-only checks; consider either constraining this to known call sites or documenting that `None` is now intentionally treated as missing for numerics.
- The new `DATA_VAR_RESULT_TYPES` aggregate is a useful centralization; it may be worth adding a brief comment or assertion tying it to `SCALAR_RESULT_TYPES` to catch future result types that should get data variables but are accidentally omitted from this tuple.

## Individual Comments

### Comment 1
<location path="bencher/result_collector.py" line_range="220-228" />
<code_context>
-                data_vars[rv.name] = (dims_cfg.dims_name, result_data)
-
-            elif type(rv) is ResultVec:
+            if type(rv) is ResultVec:
+                # ResultVec expands to one column per vector element.
                 for i in range(rv.size):
                     result_data = np.full(dims_cfg.dims_size, np.nan)
                     data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
+            elif isinstance(rv, DATA_VAR_RESULT_TYPES):
+                fill, dtype = result_missing_fill(rv)
</code_context>
<issue_to_address>
**suggestion:** Explicitly pass dtype into `np.full` for `ResultVec` for consistency with other result types.

For the `ResultVec` case, `np.full(dims_cfg.dims_size, np.nan)` relies on NumPy inferring `float` and is inconsistent with the new `(fill, dtype)` pattern elsewhere in `setup_dataset`. To keep behavior uniform and robust, use `fill, dtype = result_missing_fill(rv)` and then `np.full(..., fill, dtype=dtype)` for the vector elements as well.

```suggestion
            if type(rv) is ResultVec:
                # ResultVec expands to one column per vector element.
                fill, dtype = result_missing_fill(rv)
                for i in range(rv.size):
                    result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
                    data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
            elif isinstance(rv, DATA_VAR_RESULT_TYPES):
                fill, dtype = result_missing_fill(rv)
                result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
                data_vars[rv.name] = (dims_cfg.dims_name, result_data)
```
</issue_to_address>

### Comment 2
<location path="bencher/results/bench_result_base.py" line_range="906-907" />
<code_context>
         for idx, _t in enumerate(time_vals):
             ds_t = dataset.isel(over_time=idx)
             filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name]))
-            if filepath == "NAN" or not os.path.isfile(filepath):
+            if result_is_missing(result_var, filepath) or not os.path.isfile(filepath):
                 html_list.append(_NO_DATA_HTML)
                 continue
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid coercing the dataset value to `str` before checking `result_is_missing`.

Coercing `filepath` to `str` before `result_is_missing` ties the missingness check to a specific string representation (currently "NAN"). If `zero_dim_da_to_val` later returns richer path-like objects or the sentinel changes, this could incorrectly classify values. Instead, pass the raw value into `result_is_missing(result_var, value_raw)` and only convert to `str` after confirming it’s not missing.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread bencher/result_collector.py
Comment thread bencher/results/bench_result_base.py Outdated
blooop added 3 commits June 4, 2026 08:18
Lock in ResultBool's intentionally-different 0/False default and guard
against an accidental regression to NaN, per Sourcery review feedback.
- setup_dataset: pass (fill, dtype) from result_missing_fill for the
  ResultVec branch too, so all result types build their arrays uniformly.
- bench_result_base: test result_is_missing on the raw dataset value
  before coercing to str, so missingness is not tied to the str() form of
  the sentinel (robust if the object sentinel later becomes None).
- result_is_missing: document that None counts as missing for numeric
  types (intentional).
@blooop

blooop commented Jun 4, 2026

Copy link
Copy Markdown
Owner Author

Addressed Sourcery review (commit 10e2dc8):

  • ResultVec dtype consistency ✅ — the ResultVec branch in setup_dataset now uses fill, dtype = result_missing_fill(rv) and np.full(..., fill, dtype=dtype), matching the single-column path.
  • Avoid str() before the missing check ✅ — both file-pane consumers now call result_is_missing(result_var, value) on the raw dataset value and only str() it after confirming it's not missing, so missingness isn't tied to the string form of the sentinel.
  • Document None-as-missing for numerics ✅ — expanded the result_is_missing docstring to state that NaN-backed types treat both NaN and None as missing (intentional), while -1/"NAN" types use exact equality.
  • Tie DATA_VAR_RESULT_TYPES to SCALAR_RESULT_TYPES — already built as SCALAR_RESULT_TYPES + _REFERENCE_MISSING_TYPES + _OBJECT_MISSING_TYPES with an explanatory comment, so a new scalar type added to SCALAR_RESULT_TYPES is picked up automatically.

CI green on the stacked branch: 1446 passed, 5 skipped.

@blooop blooop force-pushed the flip-result-default-to-nan branch 2 times, most recently from 85af908 to 992f181 Compare June 11, 2026 11:11
Base automatically changed from flip-result-default-to-nan to main June 11, 2026 15:06
blooop added 2 commits June 12, 2026 16:04
Resolves the stale-stack conflicts from the earlier #952 iteration this
branch was stacked on, in favour of what actually merged to main:

- ResultBool default stays NaN (main's 1.105.0), including the
  _validate_bounds NaN carve-out — the branch-side default=0 comment and
  test_bool_default_stays_false_not_nan are dropped.
- CACHE_VERSION stays at "4" (already bumped on main by #961); this
  refactor stores identical fill values so it needs no bump of its own.
- test/test_result_nan_default.py and pyproject/CHANGELOG taken from
  main; the stale 1.103.0 changelog entry is dropped.
- The golden BenchCfg hashes already matched main.
- result_is_missing no longer float-coerces non-numeric values for
  NaN-backed types: the *string* "nan" is real data, not the missing
  sentinel. Numeric detection now uses numbers.Real (covers python
  ints/floats/bools and numpy scalars).
- New test/test_result_missing.py pins the (fill, dtype) mapping per
  result type, DATA_VAR_RESULT_TYPES membership, result_is_missing
  semantics per sentinel family, the result_collector sentinel wrapper,
  and a fill->typed-array->detect round trip, so a future result type
  that silently falls through to the NaN default fails loudly.
- Version 1.107.0 + changelog entry for the refactor.
@blooop

blooop commented Jun 12, 2026

Copy link
Copy Markdown
Owner Author

Applied review fixes and merged main (10e2dc8..1039917):

  • Merge commit 27bb101 resolves the stale-stack conflicts in favour of what actually merged to main: ResultBool default stays NaN (1.105.0 behaviour, including the _validate_bounds NaN carve-out), CACHE_VERSION stays at "4" (no bump needed — this refactor stores identical fill values), and the stale 1.103.0 changelog entry / test_bool_default_stays_false_not_nan are dropped.
  • 1039917 tightens result_is_missing: non-numeric values are no longer float-coerced for NaN-backed types (the string "nan" is real data, not the missing sentinel); detection uses numbers.Real.
  • Adds test/test_result_missing.py pinning the (fill, dtype) mapping per result type, DATA_VAR_RESULT_TYPES membership, result_is_missing semantics per sentinel family, the result_collector sentinel wrapper, and a fill → typed-array → detect round trip.
  • Version 1.107.0 + changelog entry for the refactor.

Local pixi run ci: 1490 passed, 5 skipped, format/lint/pylint clean. One unrelated failure (test_cartesian_pil_renderer.py::test_timeline golden image hash) — the files it depends on are byte-identical to main, so it's a devcontainer font-rendering difference, not a regression.

@github-actions

Copy link
Copy Markdown

Performance Report for 1039917

Metric Value
Total tests 1671
Total time 126.33s
Mean 0.0756s
Median 0.0020s
Top 10 slowest tests
Test Time (s)
test.test_bench_examples.TestBenchExamples::test_example_meta 17.228
test.test_over_time_save_perf::test_save_faster_without_aggregated_tab 5.215
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] 4.673
test.test_split_render_examples::test_split_render_subprocess_media 3.172
test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] 2.971
test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] 2.891
test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] 2.890
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] 2.835
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] 2.581
test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max 2.445

Full report

Updated by Performance Tracking workflow

Comment thread bencher/results/bench_result_base.py Outdated
ds_t = dataset.isel(over_time=idx)
filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name]))
if filepath == "NAN" or not os.path.isfile(filepath):
value = self.zero_dim_da_to_val(ds_t[result_var.name])

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block seems duplicated above

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — extracted the duplicated filepath-resolution block (zero_dim_da_to_val → result_is_missing → os.path.isfile) into a single _over_time_filepath(dataset, result_var, idx) helper, now used by both _pane_over_time_slider and _pane_over_time_grid. Pushed in 2ea75d7.

…logic

Both _pane_over_time_slider and _pane_over_time_grid resolved the
per-time-point filepath with the same missing-check + os.path.isfile
guard. Extract _over_time_filepath() so the logic lives in one place,
addressing the duplicated-block review comment.
@github-actions

Copy link
Copy Markdown

Performance Report for 2ea75d7

Metric Value
Total tests 1680
Total time 127.29s
Mean 0.0758s
Median 0.0020s
Top 10 slowest tests
Test Time (s)
test.test_bench_examples.TestBenchExamples::test_example_meta 18.201
test.test_over_time_save_perf::test_save_faster_without_aggregated_tab 5.122
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] 4.613
test.test_split_render_examples::test_split_render_subprocess_media 3.033
test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] 2.992
test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] 2.862
test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] 2.850
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] 2.792
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] 2.536
test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max 2.424

Full report

Updated by Performance Tracking workflow

@blooop blooop merged commit 29e3cae into main Jun 13, 2026
8 checks passed
@blooop blooop deleted the generic-result-missing-value branch June 13, 2026 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant