Centralise result-variable missing-value representation#953
Conversation
A 0 default made an unrecorded sample (a run that aborts before measuring, or a result var the worker never sets) indistinguishable from a real 0 measurement, dragging nan-aware regression/aggregation means toward zero. The storage layer already initialises result arrays with NaN, so 0 was the value that destroyed that "unwritten" sentinel. Flip ResultFloat/ResultVec default to NaN so unrecorded samples are treated as missing and dropped by the existing nan-aware reductions. ResultBool stays 0 (=False): False is a meaningful default for a binary outcome and the binomial-std calc treats bool means as proportions over a fixed repeat count. Callers wanting zero-fill opt out with default=0. Bump CACHE_VERSION 3->4 to flush stale 0-filled benchmark/over_time caches, update the golden BenchCfg hash fixtures accordingly, and bump the package version to 1.103.0.
"Missing"/unrecorded entries were represented by three dtype-specific sentinels (NaN for numeric, -1 for index-backed reference types, the string "NAN" for object/file types), but that type->sentinel mapping was duplicated across setup_dataset (initial array fill) and _sentinel_for_result_var (over_time aging), and consumers hardcoded the check per call site (== "NAN" for file panes, np.isnan/skipna for numerics). Add a single source of truth in bencher.variables.results: - result_missing_fill(rv) -> (fill_value, numpy_dtype) - result_is_missing(rv, value) -> bool Route setup_dataset and _sentinel_for_result_var through result_missing_fill (the two isinstance ladders collapse into one polymorphic call), and replace the hardcoded == "NAN" file checks in bench_result_base with result_is_missing. Pure refactor: the stored fill values and dtypes are unchanged, so the on-disk format, golden BenchCfg hash, and all reductions are identical. New result types now get a NaN default automatically and can declare a different missing representation in one place. This also makes a future switch of the object sentinel from "NAN" to a literal None a one-line, consumer-invisible change.
Reviewer's GuideCentralizes the representation of missing/unrecorded result values via shared helpers, and refactors dataset setup, over_time aging, and file-based consumers to rely on this single source of truth without changing persisted behavior. Sequence diagram for file-based pane missing-check using centralized helpersequenceDiagram
actor User
participant BenchResultBase as BenchResultBase
participant Dataset as xarray_Dataset
participant VariablesResults as variables_results
User ->> BenchResultBase: _pane_over_time_slider(result_var, dataset)
loop over_time indices
BenchResultBase ->> Dataset: isel(over_time=idx)
Dataset -->> BenchResultBase: ds_t
BenchResultBase ->> BenchResultBase: zero_dim_da_to_val(ds_t[result_var.name])
BenchResultBase -->> BenchResultBase: filepath
BenchResultBase ->> VariablesResults: result_is_missing(result_var, filepath)
VariablesResults -->> BenchResultBase: bool
alt [result_is_missing]
BenchResultBase ->> BenchResultBase: append _NO_DATA_HTML
else [not missing]
BenchResultBase ->> BenchResultBase: os.path.isfile(filepath)
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
result_is_missing, treatingNoneas missing for all NaN-backed result types may be a subtle behavior change compared to the previous sentinel-only checks; consider either constraining this to known call sites or documenting thatNoneis now intentionally treated as missing for numerics. - The new
DATA_VAR_RESULT_TYPESaggregate is a useful centralization; it may be worth adding a brief comment or assertion tying it toSCALAR_RESULT_TYPESto catch future result types that should get data variables but are accidentally omitted from this tuple.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `result_is_missing`, treating `None` as missing for all NaN-backed result types may be a subtle behavior change compared to the previous sentinel-only checks; consider either constraining this to known call sites or documenting that `None` is now intentionally treated as missing for numerics.
- The new `DATA_VAR_RESULT_TYPES` aggregate is a useful centralization; it may be worth adding a brief comment or assertion tying it to `SCALAR_RESULT_TYPES` to catch future result types that should get data variables but are accidentally omitted from this tuple.
## Individual Comments
### Comment 1
<location path="bencher/result_collector.py" line_range="220-228" />
<code_context>
- data_vars[rv.name] = (dims_cfg.dims_name, result_data)
-
- elif type(rv) is ResultVec:
+ if type(rv) is ResultVec:
+ # ResultVec expands to one column per vector element.
for i in range(rv.size):
result_data = np.full(dims_cfg.dims_size, np.nan)
data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
+ elif isinstance(rv, DATA_VAR_RESULT_TYPES):
+ fill, dtype = result_missing_fill(rv)
</code_context>
<issue_to_address>
**suggestion:** Explicitly pass dtype into `np.full` for `ResultVec` for consistency with other result types.
For the `ResultVec` case, `np.full(dims_cfg.dims_size, np.nan)` relies on NumPy inferring `float` and is inconsistent with the new `(fill, dtype)` pattern elsewhere in `setup_dataset`. To keep behavior uniform and robust, use `fill, dtype = result_missing_fill(rv)` and then `np.full(..., fill, dtype=dtype)` for the vector elements as well.
```suggestion
if type(rv) is ResultVec:
# ResultVec expands to one column per vector element.
fill, dtype = result_missing_fill(rv)
for i in range(rv.size):
result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
data_vars[rv.index_name(i)] = (dims_cfg.dims_name, result_data)
elif isinstance(rv, DATA_VAR_RESULT_TYPES):
fill, dtype = result_missing_fill(rv)
result_data = np.full(dims_cfg.dims_size, fill, dtype=dtype)
data_vars[rv.name] = (dims_cfg.dims_name, result_data)
```
</issue_to_address>
### Comment 2
<location path="bencher/results/bench_result_base.py" line_range="906-907" />
<code_context>
for idx, _t in enumerate(time_vals):
ds_t = dataset.isel(over_time=idx)
filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name]))
- if filepath == "NAN" or not os.path.isfile(filepath):
+ if result_is_missing(result_var, filepath) or not os.path.isfile(filepath):
html_list.append(_NO_DATA_HTML)
continue
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid coercing the dataset value to `str` before checking `result_is_missing`.
Coercing `filepath` to `str` before `result_is_missing` ties the missingness check to a specific string representation (currently "NAN"). If `zero_dim_da_to_val` later returns richer path-like objects or the sentinel changes, this could incorrectly classify values. Instead, pass the raw value into `result_is_missing(result_var, value_raw)` and only convert to `str` after confirming it’s not missing.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Lock in ResultBool's intentionally-different 0/False default and guard against an accidental regression to NaN, per Sourcery review feedback.
- setup_dataset: pass (fill, dtype) from result_missing_fill for the ResultVec branch too, so all result types build their arrays uniformly. - bench_result_base: test result_is_missing on the raw dataset value before coercing to str, so missingness is not tied to the str() form of the sentinel (robust if the object sentinel later becomes None). - result_is_missing: document that None counts as missing for numeric types (intentional).
|
Addressed Sourcery review (commit 10e2dc8):
CI green on the stacked branch: 1446 passed, 5 skipped. |
85af908 to
992f181
Compare
Resolves the stale-stack conflicts from the earlier #952 iteration this branch was stacked on, in favour of what actually merged to main: - ResultBool default stays NaN (main's 1.105.0), including the _validate_bounds NaN carve-out — the branch-side default=0 comment and test_bool_default_stays_false_not_nan are dropped. - CACHE_VERSION stays at "4" (already bumped on main by #961); this refactor stores identical fill values so it needs no bump of its own. - test/test_result_nan_default.py and pyproject/CHANGELOG taken from main; the stale 1.103.0 changelog entry is dropped. - The golden BenchCfg hashes already matched main.
- result_is_missing no longer float-coerces non-numeric values for NaN-backed types: the *string* "nan" is real data, not the missing sentinel. Numeric detection now uses numbers.Real (covers python ints/floats/bools and numpy scalars). - New test/test_result_missing.py pins the (fill, dtype) mapping per result type, DATA_VAR_RESULT_TYPES membership, result_is_missing semantics per sentinel family, the result_collector sentinel wrapper, and a fill->typed-array->detect round trip, so a future result type that silently falls through to the NaN default fails loudly. - Version 1.107.0 + changelog entry for the refactor.
|
Applied review fixes and merged
Local |
Performance Report for
|
| Metric | Value |
|---|---|
| Total tests | 1671 |
| Total time | 126.33s |
| Mean | 0.0756s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
test.test_bench_examples.TestBenchExamples::test_example_meta |
17.228 |
test.test_over_time_save_perf::test_save_faster_without_aggregated_tab |
5.215 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] |
4.673 |
test.test_split_render_examples::test_split_render_subprocess_media |
3.172 |
test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] |
2.971 |
test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] |
2.891 |
test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] |
2.890 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] |
2.835 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] |
2.581 |
test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max |
2.445 |
Updated by Performance Tracking workflow
| ds_t = dataset.isel(over_time=idx) | ||
| filepath = str(self.zero_dim_da_to_val(ds_t[result_var.name])) | ||
| if filepath == "NAN" or not os.path.isfile(filepath): | ||
| value = self.zero_dim_da_to_val(ds_t[result_var.name]) |
There was a problem hiding this comment.
this block seems duplicated above
There was a problem hiding this comment.
Good catch — extracted the duplicated filepath-resolution block (zero_dim_da_to_val → result_is_missing → os.path.isfile) into a single _over_time_filepath(dataset, result_var, idx) helper, now used by both _pane_over_time_slider and _pane_over_time_grid. Pushed in 2ea75d7.
…logic Both _pane_over_time_slider and _pane_over_time_grid resolved the per-time-point filepath with the same missing-check + os.path.isfile guard. Extract _over_time_filepath() so the logic lives in one place, addressing the duplicated-block review comment.
Performance Report for
|
| Metric | Value |
|---|---|
| Total tests | 1680 |
| Total time | 127.29s |
| Mean | 0.0758s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
test.test_bench_examples.TestBenchExamples::test_example_meta |
18.201 |
test.test_over_time_save_perf::test_save_faster_without_aggregated_tab |
5.122 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] |
4.613 |
test.test_split_render_examples::test_split_render_subprocess_media |
3.033 |
test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] |
2.992 |
test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] |
2.862 |
test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] |
2.850 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] |
2.792 |
test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] |
2.536 |
test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max |
2.424 |
Updated by Performance Tracking workflow
What
Introduce a single source of truth for how a missing/unrecorded result entry is represented, and route the storage + consumer code through it.
Why
"Missing" was represented by three dtype-specific sentinels:
ResultFloat/Bool/Vec(+ numeric)floatNaNResultReference/DataSetint-1ResultPath/Video/Image/String/Container/Rerunobject"NAN"…but that mapping was duplicated across
setup_dataset(initial fill) and_sentinel_for_result_var(over_time aging), and consumers hardcoded the check per call site (== "NAN"for file panes,np.isnan/skipnafor numerics). Adding a new result type meant editing severalisinstanceladders.Change
New helpers in
bencher/variables/results.py:result_missing_fill(rv) -> (fill_value, numpy_dtype)result_is_missing(rv, value) -> boolResultCollector.setup_datasetand_sentinel_for_result_varnow build fromresult_missing_fill— the twoisinstanceladders collapse into one polymorphic call.The hardcoded
== "NAN"file checks inbench_result_base.pybecomeresult_is_missing(result_var, filepath).Not a behaviour change
The stored fill values and dtypes are identical to before — so the on-disk cache format, the golden
BenchCfghash, and every reduction are unchanged.pixi run cigreen (1445 passed, 5 skipped — same as base), format/lint/ty/pylint clean.Follow-up enabled (not in this PR)
Because the object-type sentinel is now defined and checked in one place, switching it from the
"NAN"string to a literalNonebecomes a one-line, consumer-invisible change (would need its ownCACHE_VERSIONbump since it changes the stored value).🤖 Generated with Claude Code
Summary by Sourcery
Centralise the representation of missing result-variable entries and route dataset setup and consumers through shared helpers.
Enhancements: