Centralise result-variable missing-value representation by blooop · Pull Request #953 · blooop/bencher

blooop · 2026-06-03T22:00:05Z

~~Stacked on #952~~ — #952 has merged and main has been merged back into this branch; the PR now targets main directly with only the centralisation refactor.

What

Introduce a single source of truth for how a missing/unrecorded result entry is represented, and route the storage + consumer code through it.

Why

"Missing" was represented by three dtype-specific sentinels:

| Result type | dtype | sentinel |
| --- | --- | --- |
| `ResultFloat`/`Bool`/`Vec` (+ numeric) | `float` | `NaN` |
| `ResultReference`/`DataSet` | `int` | `-1` |
| `ResultPath`/`Video`/`Image`/`String`/`Container`/`Rerun` | `object` | `"NAN"` |

…but that mapping was duplicated across setup_dataset (initial fill) and _sentinel_for_result_var (over_time aging), and consumers hardcoded the check per call site (== "NAN" for file panes, np.isnan/skipna for numerics). Adding a new result type meant editing several isinstance ladders.

Change

New helpers in `bencher/variables/results.py`:

- `result_missing_fill(rv) -> (fill_value, numpy_dtype)`
- `result_is_missing(rv, value) -> bool`

`ResultCollector.setup_dataset` and `_sentinel_for_result_var` now build from `result_missing_fill`, so the two isinstance ladders collapse into one polymorphic call.
The hardcoded `== "NAN"` file checks in `bench_result_base.py` become `result_is_missing(result_var, filepath)`.
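
As a rough illustration of the shape of these two helpers, here is a minimal sketch. It is not the code from this PR: the three classes are empty stand-ins, the groupings contain one type each, and the dtypes are the Python builtins `float`/`int`/`object` (which NumPy accepts as dtype specifiers) rather than whatever the real helper returns.

```python
import math
import numbers


# Hypothetical stand-ins for bencher's result classes (illustration only).
class ResultFloat: ...
class ResultReference: ...
class ResultPath: ...


# Assumed groupings, one representative type per sentinel family.
_REFERENCE_MISSING_TYPES = (ResultReference,)
_OBJECT_MISSING_TYPES = (ResultPath,)


def result_missing_fill(rv):
    """Return (fill_value, dtype) used for a missing entry of this result type."""
    if isinstance(rv, _REFERENCE_MISSING_TYPES):
        return -1, int
    if isinstance(rv, _OBJECT_MISSING_TYPES):
        return "NAN", object
    return float("nan"), float  # everything else falls back to the NaN default


def result_is_missing(rv, value):
    """True if value is the missing sentinel for this result type."""
    fill, _ = result_missing_fill(rv)
    if isinstance(fill, float) and math.isnan(fill):
        # NaN never compares equal to itself, so equality cannot be used here.
        return value is None or (
            isinstance(value, numbers.Real) and math.isnan(value)
        )
    return value == fill
```

With this shape, a consumer only ever asks `result_is_missing(rv, value)` and never needs to know which sentinel family `rv` belongs to.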

Not a behaviour change

The stored fill values and dtypes are identical to before — so the on-disk cache format, the golden BenchCfg hash, and every reduction are unchanged. pixi run ci green (1445 passed, 5 skipped — same as base), format/lint/ty/pylint clean.

Follow-up enabled (not in this PR)

Because the object-type sentinel is now defined and checked in one place, switching it from the "NAN" string to a literal None becomes a one-line, consumer-invisible change (would need its own CACHE_VERSION bump since it changes the stored value).

🤖 Generated with Claude Code

Summary by Sourcery

Centralise the representation of missing result-variable entries and route dataset setup and consumers through shared helpers.

Enhancements:

Introduce shared helpers to define and detect missing values for result variables across all result types.
Refactor dataset initialisation and over-time aging to derive fill values and dtypes from the central missing-value helpers instead of per-call-site type ladders.
Update file-based result rendering to use the shared missing-value predicate rather than hardcoded string sentinels.

A 0 default made an unrecorded sample (a run that aborts before measuring, or a result var the worker never sets) indistinguishable from a real 0 measurement, dragging nan-aware regression/aggregation means toward zero. The storage layer already initialises result arrays with NaN, so 0 was the value that destroyed that "unwritten" sentinel. Flip ResultFloat/ResultVec default to NaN so unrecorded samples are treated as missing and dropped by the existing nan-aware reductions. ResultBool stays 0 (=False): False is a meaningful default for a binary outcome and the binomial-std calc treats bool means as proportions over a fixed repeat count. Callers wanting zero-fill opt out with default=0. Bump CACHE_VERSION 3->4 to flush stale 0-filled benchmark/over_time caches, update the golden BenchCfg hash fixtures accordingly, and bump the package version to 1.103.0.
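
The effect described in this commit message can be reproduced with a few numbers. The sketch below uses a small pure-Python `nan_aware_mean` as a stand-in for NaN-aware reductions such as `numpy.nanmean`; the function name and the sample values are made up for illustration.

```python
import math
from statistics import fmean


def nan_aware_mean(samples):
    """Mean over the samples that are not NaN (NaN if none remain)."""
    kept = [s for s in samples if not math.isnan(s)]
    return fmean(kept) if kept else math.nan


recorded = [2.0, 4.0]  # two repeats that actually measured something

# Third repeat aborted before measuring and kept its default value.
with_zero_default = recorded + [0.0]
with_nan_default = recorded + [math.nan]

mean_zero = nan_aware_mean(with_zero_default)  # 2.0: the fake 0 drags the mean down
mean_nan = nan_aware_mean(with_nan_default)    # 3.0: the unrecorded sample is dropped
```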

"Missing"/unrecorded entries were represented by three dtype-specific sentinels (NaN for numeric, -1 for index-backed reference types, the string "NAN" for object/file types), but that type->sentinel mapping was duplicated across setup_dataset (initial array fill) and _sentinel_for_result_var (over_time aging), and consumers hardcoded the check per call site (== "NAN" for file panes, np.isnan/skipna for numerics). Add a single source of truth in bencher.variables.results: - result_missing_fill(rv) -> (fill_value, numpy_dtype) - result_is_missing(rv, value) -> bool Route setup_dataset and _sentinel_for_result_var through result_missing_fill (the two isinstance ladders collapse into one polymorphic call), and replace the hardcoded == "NAN" file checks in bench_result_base with result_is_missing. Pure refactor: the stored fill values and dtypes are unchanged, so the on-disk format, golden BenchCfg hash, and all reductions are identical. New result types now get a NaN default automatically and can declare a different missing representation in one place. This also makes a future switch of the object sentinel from "NAN" to a literal None a one-line, consumer-invisible change.

sourcery-ai · 2026-06-03T22:00:12Z

Reviewer's Guide

Centralizes the representation of missing/unrecorded result values via shared helpers, and refactors dataset setup, over_time aging, and file-based consumers to rely on this single source of truth without changing persisted behavior.

Sequence diagram for file-based pane missing-check using centralized helper

```mermaid
sequenceDiagram
    actor User
    participant BenchResultBase as BenchResultBase
    participant Dataset as xarray_Dataset
    participant VariablesResults as variables_results

    User ->> BenchResultBase: _pane_over_time_slider(result_var, dataset)
    loop over_time indices
        BenchResultBase ->> Dataset: isel(over_time=idx)
        Dataset -->> BenchResultBase: ds_t
        BenchResultBase ->> BenchResultBase: zero_dim_da_to_val(ds_t[result_var.name])
        BenchResultBase -->> BenchResultBase: filepath
        BenchResultBase ->> VariablesResults: result_is_missing(result_var, filepath)
        VariablesResults -->> BenchResultBase: bool
        alt [result_is_missing]
            BenchResultBase ->> BenchResultBase: append _NO_DATA_HTML
        else [not missing]
            BenchResultBase ->> BenchResultBase: os.path.isfile(filepath)
        end
    end
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce centralized helpers that define and query the missing-value sentinel for result variables. | Add `_REFERENCE_MISSING_TYPES`, `_OBJECT_MISSING_TYPES`, and `DATA_VAR_RESULT_TYPES` groupings for result-variable categories.<br>Implement `result_missing_fill(rv)` to return the fill value and numpy dtype for missing entries based on result type.<br>Implement `result_is_missing(rv, value)` to check for missing/unrecorded entries using the centralized sentinel logic, including NaN-aware comparisons. | `bencher/variables/results.py` |
| Refactor result dataset initialization and aging to use the centralized missing-value helpers instead of duplicated isinstance ladders. | Update `_sentinel_for_result_var` to delegate to `result_missing_fill(rv)[0]` instead of maintaining its own type-based sentinel mapping.<br>Change `ResultCollector.setup_dataset` to use `DATA_VAR_RESULT_TYPES` and `result_missing_fill` when creating data arrays, while preserving the special expansion behavior for `ResultVec`.<br>Ensure that stored fill values and dtypes (NaN, -1, "NAN") remain unchanged so cache format and behavior are preserved. | `bencher/result_collector.py`<br>`bencher/variables/results.py` |
| Update file-backed result consumers to use the centralized missing-value check instead of hardcoded string comparisons. | Replace `filepath == "NAN"` checks in over-time slider and grid pane rendering with `result_is_missing(result_var, filepath)`.<br>Keep existing file-existence guard (`os.path.isfile`) so visual behavior is unchanged for missing or invalid paths. | `bencher/results/bench_result_base.py` |

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

In result_is_missing, treating None as missing for all NaN-backed result types may be a subtle behavior change compared to the previous sentinel-only checks; consider either constraining this to known call sites or documenting that None is now intentionally treated as missing for numerics.
The new DATA_VAR_RESULT_TYPES aggregate is a useful centralization; it may be worth adding a brief comment or assertion tying it to SCALAR_RESULT_TYPES to catch future result types that should get data variables but are accidentally omitted from this tuple.

## Individual Comments

### Comment 1
<location path="bencher/result_collector.py" line_range="220-228" />
<code_context>
-                data_vars[rv.name] = (dims_cfg.dims_name, result_data)
-
-            elif type(rv) is ResultVec:
+            if type(rv) is ResultVec:
+                # ResultVec expands to one column per vector element.
                 for i in range(rv.size):
                     result_data = np.full(dims_cfg.dims_size, np.nan)
                     data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
+            elif isinstance(rv, DATA_VAR_RESULT_TYPES):
+                fill, dtype = result_missing_fill(rv)
</code_context>
<issue_to_address>
**suggestion:** Explicitly pass dtype into `np.full` for `ResultVec` for consistency with other result types.

For the `ResultVec` case, `np.full(dims_cfg.dims_size, np.nan)` relies on NumPy inferring `float` and is inconsistent with the new `(fill, dtype)` pattern elsewhere in `setup_dataset`. To keep behavior uniform and robust, use `fill, dtype = result_missing_fill(rv)` and then `np.full(..., fill, dtype=dtype)` for the vector elements as well.

```suggestion
            if type(rv) is ResultVec:
                # ResultVec expands to one column per vector element.
                fill, dtype = result_missing_fill(rv)
                for i in range(rv.size):
                    result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
                    data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
            elif isinstance(rv, DATA_VAR_RESULT_TYPES):
                fill, dtype = result_missing_fill(rv)
                result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
                data_vars[rv.name] = (dims_cfg.dims_name, result_data)
```
</issue_to_address>

### Comment 2
<location path="bencher/results/bench_result_base.py" line_range="906-907" />
<code_context>
         for idx, _t in enumerate(time_vals):
             ds_t = dataset.isel(over_time=idx)
             filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name]))
-            if filepath == "NAN" or not os.path.isfile(filepath):
+            if result_is_missing(result_var, filepath) or not os.path.isfile(filepath):
                 html_list.append(_NO_DATA_HTML)
                 continue
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid coercing the dataset value to `str` before checking `result_is_missing`.

Coercing `filepath` to `str` before `result_is_missing` ties the missingness check to a specific string representation (currently "NAN"). If `zero_dim_da_to_val` later returns richer path-like objects or the sentinel changes, this could incorrectly classify values. Instead, pass the raw value into `result_is_missing(result_var, value_raw)` and only convert to `str` after confirming it’s not missing.
</issue_to_address>

Lock in ResultBool's intentionally-different 0/False default and guard against an accidental regression to NaN, per Sourcery review feedback.

- setup_dataset: pass (fill, dtype) from result_missing_fill for the ResultVec branch too, so all result types build their arrays uniformly.
- bench_result_base: test result_is_missing on the raw dataset value before coercing to str, so missingness is not tied to the str() form of the sentinel (robust if the object sentinel later becomes None).
- result_is_missing: document that None counts as missing for numeric types (intentional).
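
The ordering point in the bench_result_base item can be shown in isolation. This is a minimal sketch with hypothetical helper names (`resolve_coerce_first`, `resolve_check_first`) and a plain equality check standing in for `result_is_missing`; it only illustrates why coercing to `str` before the missing check breaks once the sentinel is no longer a string.

```python
def resolve_coerce_first(raw, sentinel):
    """Old order: stringify first, then compare against the sentinel."""
    filepath = str(raw)
    return None if filepath == sentinel else filepath


def resolve_check_first(raw, sentinel):
    """New order: compare the raw value, stringify only real data."""
    if raw is sentinel or raw == sentinel:
        return None
    return str(raw)


# With today's string sentinel both orders agree.
old_today = resolve_coerce_first("NAN", "NAN")  # None (treated as missing)
new_today = resolve_check_first("NAN", "NAN")   # None (treated as missing)

# With a hypothetical None sentinel, coercing first turns it into the
# string "None", which no longer matches and is passed on as a "path".
old_future = resolve_coerce_first(None, None)   # "None"
new_future = resolve_check_first(None, None)    # None (treated as missing)
```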

blooop · 2026-06-04T07:26:54Z

Addressed Sourcery review (commit 10e2dc8):

ResultVec dtype consistency ✅ — the ResultVec branch in setup_dataset now uses fill, dtype = result_missing_fill(rv) and np.full(..., fill, dtype=dtype), matching the single-column path.
Avoid str() before the missing check ✅ — both file-pane consumers now call result_is_missing(result_var, value) on the raw dataset value and only str() it after confirming it's not missing, so missingness isn't tied to the string form of the sentinel.
Document None-as-missing for numerics ✅ — expanded the result_is_missing docstring to state that NaN-backed types treat both NaN and None as missing (intentional), while -1/"NAN" types use exact equality.
Tie DATA_VAR_RESULT_TYPES to SCALAR_RESULT_TYPES — already built as SCALAR_RESULT_TYPES + _REFERENCE_MISSING_TYPES + _OBJECT_MISSING_TYPES with an explanatory comment, so a new scalar type added to SCALAR_RESULT_TYPES is picked up automatically.
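
The composition described in the last item can be sketched as plain tuple concatenation. The classes below are empty stand-ins and each group is cut down to one or two members, so this shows only the mechanism, not the real membership of any of the tuples.

```python
# Hypothetical stand-ins; the real groups contain more result types.
class ResultFloat: ...
class ResultBool: ...
class ResultReference: ...
class ResultPath: ...


SCALAR_RESULT_TYPES = (ResultFloat, ResultBool)
_REFERENCE_MISSING_TYPES = (ResultReference,)
_OBJECT_MISSING_TYPES = (ResultPath,)

# Built by concatenation, so a type added to any of the three source
# tuples appears here without a second edit.
DATA_VAR_RESULT_TYPES = (
    SCALAR_RESULT_TYPES + _REFERENCE_MISSING_TYPES + _OBJECT_MISSING_TYPES
)

# A cheap guard against listing the same type in two sentinel families.
has_duplicates = len(set(DATA_VAR_RESULT_TYPES)) != len(DATA_VAR_RESULT_TYPES)
```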

CI green on the stacked branch: 1446 passed, 5 skipped.

Resolves the stale-stack conflicts from the earlier #952 iteration this branch was stacked on, in favour of what actually merged to main:

- ResultBool default stays NaN (main's 1.105.0), including the _validate_bounds NaN carve-out; the branch-side default=0 comment and test_bool_default_stays_false_not_nan are dropped.
- CACHE_VERSION stays at "4" (already bumped on main by #961); this refactor stores identical fill values so it needs no bump of its own.
- test/test_result_nan_default.py and pyproject/CHANGELOG taken from main; the stale 1.103.0 changelog entry is dropped.
- The golden BenchCfg hashes already matched main.

- result_is_missing no longer float-coerces non-numeric values for NaN-backed types: the *string* "nan" is real data, not the missing sentinel. Numeric detection now uses numbers.Real (covers python ints/floats/bools and numpy scalars).
- New test/test_result_missing.py pins the (fill, dtype) mapping per result type, DATA_VAR_RESULT_TYPES membership, result_is_missing semantics per sentinel family, the result_collector sentinel wrapper, and a fill->typed-array->detect round trip, so a future result type that silently falls through to the NaN default fails loudly.
- Version 1.107.0 + changelog entry for the refactor.
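
The difference between float-coercing and checking `numbers.Real` is easy to see side by side. The two function names below are hypothetical and exist only for this comparison; neither is claimed to be the body of `result_is_missing`.

```python
import math
import numbers


def is_nan_by_coercion(value):
    """Coerce anything float() accepts, then test for NaN."""
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        return False


def is_nan_numeric_only(value):
    """Test for NaN only when the value is already a real number."""
    return isinstance(value, numbers.Real) and math.isnan(value)


# float("nan") parses the string, so coercion misreads real string data.
coerced_string = is_nan_by_coercion("nan")   # True  (false positive)
numeric_string = is_nan_numeric_only("nan")  # False (string is kept as data)

# Genuine NaN is still detected, and bools/ints are never NaN.
numeric_nan = is_nan_numeric_only(float("nan"))  # True
numeric_bool = is_nan_numeric_only(True)         # False
```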

blooop · 2026-06-12T16:05:47Z

Applied review fixes and merged main (10e2dc8..1039917):

Merge commit 27bb101 resolves the stale-stack conflicts in favour of what actually merged to main: ResultBool default stays NaN (1.105.0 behaviour, including the _validate_bounds NaN carve-out), CACHE_VERSION stays at "4" (no bump needed — this refactor stores identical fill values), and the stale 1.103.0 changelog entry / test_bool_default_stays_false_not_nan are dropped.
1039917 tightens result_is_missing: non-numeric values are no longer float-coerced for NaN-backed types (the string "nan" is real data, not the missing sentinel); detection uses numbers.Real.
Adds test/test_result_missing.py pinning the (fill, dtype) mapping per result type, DATA_VAR_RESULT_TYPES membership, result_is_missing semantics per sentinel family, the result_collector sentinel wrapper, and a fill → typed-array → detect round trip.
Version 1.107.0 + changelog entry for the refactor.

Local pixi run ci: 1490 passed, 5 skipped, format/lint/pylint clean. One unrelated failure (test_cartesian_pil_renderer.py::test_timeline golden image hash) — the files it depends on are byte-identical to main, so it's a devcontainer font-rendering difference, not a regression.

github-actions · 2026-06-12T16:08:02Z

Performance Report for `1039917`

Metric	Value
Total tests	1671
Total time	126.33s
Mean	0.0756s
Median	0.0020s

Top 10 slowest tests

Test	Time (s)
`test.test_bench_examples.TestBenchExamples::test_example_meta`	17.228
`test.test_over_time_save_perf::test_save_faster_without_aggregated_tab`	5.215
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py]`	4.673
`test.test_split_render_examples::test_split_render_subprocess_media`	3.172
`test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py]`	2.971
`test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py]`	2.891
`test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool]`	2.890
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py]`	2.835
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py]`	2.581
`test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max`	2.445

Full report

Updated by Performance Tracking workflow

blooop · 2026-06-13T08:06:14Z

```diff
             ds_t = dataset.isel(over_time=idx)
-            filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name]))
-            if filepath == "NAN" or not os.path.isfile(filepath):
+            value = self.zero_dim_da_to_val(ds_t[result_var.name])
```

this block seems duplicated above

Good catch — extracted the duplicated filepath-resolution block (zero_dim_da_to_val → result_is_missing → os.path.isfile) into a single _over_time_filepath(dataset, result_var, idx) helper, now used by both _pane_over_time_slider and _pane_over_time_grid. Pushed in 2ea75d7.

…logic Both _pane_over_time_slider and _pane_over_time_grid resolved the per-time-point filepath with the same missing-check + os.path.isfile guard. Extract _over_time_filepath() so the logic lives in one place, addressing the duplicated-block review comment.
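
A minimal sketch of what such a shared helper might look like follows. The real `_over_time_filepath(dataset, result_var, idx)` is a method that reads from the dataset; here the dataset lookup is left out, the function takes the already-extracted raw value plus a missing-check callable, and returning `None` for "no usable file" is an assumption made for this example.

```python
import os
import tempfile


def over_time_filepath(raw_value, is_missing):
    """Return a usable file path for one time point, or None if there is none."""
    if is_missing(raw_value):
        return None  # sentinel: nothing was recorded for this time point
    filepath = str(raw_value)  # coerce only after the missing check
    return filepath if os.path.isfile(filepath) else None


def is_nan_string(value):
    # Stand-in for result_is_missing(result_var, value) on an object-backed type.
    return value == "NAN"


# A path that exists while the temp file is open.
with tempfile.NamedTemporaryFile(suffix=".png") as tmp:
    present = over_time_filepath(tmp.name, is_nan_string)

missing = over_time_filepath("NAN", is_nan_string)
absent = over_time_filepath(
    os.path.join(tempfile.gettempdir(), "bencher_sketch_no_such_file.png"),
    is_nan_string,
)
```

Both callers can then branch on a single result (`None` means "show the no-data placeholder") instead of repeating the two-part guard.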

github-actions · 2026-06-13T08:15:19Z

Performance Report for `2ea75d7`

Metric	Value
Total tests	1680
Total time	127.29s
Mean	0.0758s
Median	0.0020s

Top 10 slowest tests

Test	Time (s)
`test.test_bench_examples.TestBenchExamples::test_example_meta`	18.201
`test.test_over_time_save_perf::test_save_faster_without_aggregated_tab`	5.122
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py]`	4.613
`test.test_split_render_examples::test_split_render_subprocess_media`	3.033
`test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py]`	2.992
`test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool]`	2.862
`test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py]`	2.850
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py]`	2.792
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py]`	2.536
`test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max`	2.424

Full report

Updated by Performance Tracking workflow

blooop added 2 commits June 3, 2026 22:42

sourcery-ai Bot reviewed Jun 3, 2026

blooop added 3 commits June 4, 2026 08:18

Add ResultBool default-stays-false test (review)

91eae7d

Merge branch 'flip-result-default-to-nan' into generic-result-missing…

d9c0386

…-value

blooop force-pushed the flip-result-default-to-nan branch 2 times, most recently from 85af908 to 992f181 Compare June 11, 2026 11:11

Base automatically changed from flip-result-default-to-nan to main June 11, 2026 15:06

blooop added 2 commits June 12, 2026 16:04

blooop commented Jun 13, 2026

blooop merged commit 29e3cae into main Jun 13, 2026
8 checks passed

blooop deleted the generic-result-missing-value branch June 13, 2026 12:07

blooop merged 8 commits into main from generic-result-missing-value
