Implement plan 05: test coverage gaps by blooop · Pull Request #960 · blooop/bencher

blooop · 2026-06-12T06:32:29Z

Summary

Implements plans/05-test-coverage.md (from #959): direct unit tests for the previously indirectly-tested results/ and core modules, de-flaked file server tests, and resolved long-skipped tests. One module per commit, as the plan prescribes.

Coverage: 90% → 91% (missed lines 1050 → 1001 of 10776). Tests: 1474 passed / 5 skipped → 1651 passed / 2 skipped (+177 tests across 16 new test files).

Task 1 — unit tests for untested result modules

Module	Coverage before → after
`results/bench_result.py`	75% → 83%
`results/histogram_result.py`	83% → 83% (behavioral assertions added)
`results/optimize_result.py`	97% → 100%
`results/dataset_result.py`	64% → 86%
`holoview_results/scatter_result.py`	76% → 100%
`holoview_results/bar_result.py`	96% → 96%
`holoview_results/curve_result.py`	100% (direct tests added)
`holoview_results/band_result.py`	71% → 95%
`holoview_results/table_result.py`	100% (direct tests added)
`distribution_result/` box/violin/jitter	100% (direct tests added)
`results/volume_result.py`	100% (direct tests added)
`composable_container_dataframe.py`	100% (data-intactness tests added)

Several of these were already at high line coverage purely through generated-example integration tests; the new tests assert behavior directly — element types, kdims/vdims, title/ylabel/units propagation, values matching the worker function, filter rejection paths, and NaN robustness (NaN is the missing-value default since v1.105).

Task 2 — core modules

test_bench_cfg.py (42 tests): defaults, flag round-trips, deprecated level= mapping, persistent hash behavior, describe/sentence helpers, optuna target partition. bench_cfg.py 97% → 99%.
test_worker_job.py (16 tests): input assembly, canonical form, hash stability/sensitivity.
video_writer.py: no new file — the plan's target (write path with synthetic frames, file exists and non-empty) is already covered by test_video_writer_extended.py.

Task 3 — de-flake `test_file_server.py`

Replaced each time.sleep(0.3) with a wait_for_port() TCP poll (5s timeout, 0.1s step, TimeoutError on failure). Verified stable across 10 consecutive runs (all passed).

Task 4 — long-skipped tests (with deviations; the plan's named tests don't exist)

test_combinations.py: not an empty placeholder — a 184-line hypothesis class fully @pytest.mark.skip since 2023. Deleted per the plan's intent.
The skipped test in test_sweep_vars.py is test_int_sweep_samples_all (not test_missing_default_value): an exact line-for-line duplicate of the passing test above it. Deleted.
test_bencher.py::test_unique_file_names (not test_plot_all_permutations): bare skip replaced with a self-describing reason pointing at the plan.
test_usability.py: deprecated bn.ResultVar → bn.ResultFloat (verified identical behavior).

Bugs found while writing tests (not fixed here — test-only PR)

ScatterResult.to_scatter filters result_types=(ResultVar,), but ResultVar is the deprecated shim — modern ResultFloat sweeps silently return None (the shipped scatter example appends None to its report). Pinned by test_to_scatter_result_float_returns_none; the filter likely wants ResultFloat.
DistributionResult._plot_distribution forwards title inside **kwargs and as an explicit kwarg into .opts() — any caller passing title= would hit "multiple values for keyword argument".
Histogram output never shows result-var units (only the var name); tests assert the implemented behavior.

Notes

fail_under (plan 01) is not in place yet, so there is nothing to raise; when plan 01 lands it can start at 91%.
Full pixi run ci passes locally.

🤖 Generated with Claude Code

Summary by Sourcery

Increase unit test coverage for core result and configuration modules and stabilize flaky file server tests.

Bug Fixes:

Deflake file server integration tests by replacing fixed sleeps with port polling and clarify a previously bare skip reason.
Remove redundant and long-skipped sweep/combinations tests and update usability tests to use the non-deprecated ResultFloat API.

Enhancements:

Add focused unit tests for BenchRunCfg/BenchCfg behavior including defaults, hashing, and descriptive helpers.
Introduce direct behavioral tests for BenchResult and various Holoviews-based result types (curve, bar, band, histogram, scatter, box/violin/jitter, volume, table, dataset).
Add tests for WorkerJob input assembly and hashing and for DataFrame-based composable container datasets, including NaN robustness and data integrity.

…leeps Replace each time.sleep(0.3) with wait_for_port(), which polls a TCP connect to the server port (5s timeout, 0.1s step) and raises TimeoutError on failure. Verified stable across 10 consecutive runs. Plan 05, task 3.

- Delete test/test_combinations.py: the whole class has been @pytest.mark.skip since 2023; its hypothesis sweep duplicates what test_bencher.py and the generated-example suite now cover. - Delete test_sweep_vars.py::test_int_sweep_samples_all: it was an exact line-for-line duplicate of the passing test_int_sweep_samples directly above it, skipped with no reason. - test_bencher.py::test_unique_file_names: replace the bare skip with a self-describing reason pointing at plans/05-test-coverage.md task 4. - test_usability.py: use bn.ResultFloat instead of the deprecated bn.ResultVar shim (identical behavior, verified before/after). Plan 05, task 4.

Cover to(result_type) conversion (values plotted match the worker), to_auto plot_list/remove_plots/failing-callback handling, to_auto_plots summary placement, plot() callback dispatch, default_plot_callbacks, from_existing state copies, and NaN-point robustness. Plan 05, task 1.

Cover viewer construction from a small ResultDataSet sweep, dataset_list round-tripping worker DataFrames unchanged, xarray index-to-frame mapping, and ds_to_container unwrapping the stored frame. Plan 05, task 1.

Cover 3-float volume construction (Plotly pane, single Volume trace, axis titles with units, trace values verified against the worker formula, isomin/isomax), the unsupported-shape rejection path, the over_time short-circuit, and NaN robustness (finite-only isomin/isomax). Plan 05, task 1.

…_dataframe.py) Assert data intactness through composition: append order/identity, single-DataFrame passthrough, right/down/sequence concat values and coords, overlay elementwise mean with skipna semantics, and label_formatter output. Complements test_composable_container_dataset.py which covers dims/sizes only. Plan 05, task 1.

Cover defaults, flag round-trips (repeats/over_time/cache flags), deprecated level= mapping, with_defaults semantics, persistent hash behavior (tag/bench_name/const-vars/include_repeats), describe_benchmark and sweep_sentence content, panel helpers, optimized/unoptimized input partition, optuna targets, and DimsCfg construction. Plan 05, task 2.

Cover construction defaults, function_input/canonical_input assembly (constants merged but excluded from canonical form), fn_inputs_sorted, and hash behavior: stable across dim ordering, sensitive to values/dim names/tag/constants, independent of bench_cfg_sample_hash. video_writer.py needs no new file: the core write path is already covered by test_video_writer_extended.py. Plan 05, task 2.

Cover the repeats>1 Curve+Spread overlay (labels, kdims/vdims, std band), to_plot delegation, categorical groupby via to_curve_ds, the repeats=1 filter rejection, and NaN robustness. Plan 05, task 1.

Cover static percentile-band composition (two Areas, median Curve, samples Scatter, percentile vdim names), default/explicit titles and unit-bearing ylabel, enable_scatter=False, categorical flattening into the sample pool, the over_time band path, regression-report suppression of to_band_ds, and NaN handling (nanpercentile + scatter mask drops NaN points). Plan 05, task 1.

Cover hv.Table construction (kdims/vdims), row count = samples x repeats, values matching the worker, repeat-dim squeeze at repeats=1, and NaN row preservation. Plan 05, task 1.

Cover _to_scatter_ds element type and title, the public to_scatter path, 2-cat NdOverlay grouping, float-input rejection, and NaN robustness. Also pins a discovered quirk: to_scatter filters result_types on the deprecated ResultVar, so modern ResultFloat sweeps silently return None (test_to_scatter_result_float_returns_none documents current behavior; the filter likely wants ResultFloat). Plan 05, task 1.

Cover to_bar Row/HoloViews/Bars structure, to_plot delegation, dims and unit-bearing ylabel via to_bar_ds, the ResultBool repeats=2 REDUCE scenario, 2-cat grouped kdims, float-input rejection, and NaN robustness. Plan 05, task 1.

…r_result.py) Cover overlay/element types, kdims/vdims and unit-bearing ylabel/title, raw repeats present per category (no aggregation), 2-cat kdims, repeats=1 filter rejection, and NaN robustness. Plan 05, task 1.

Cover overlay/element types, label/units propagation, raw repeats per category, filter rejection, and NaN robustness through the shared DistributionResult base. Plan 05, task 1.

…_jitter_result.py) Cover jitter scatter element type, label propagation, per-category raw sample integrity with repeats=3, its stricter cat_range filter (2-cat rejection), and NaN robustness. Plan 05, task 1.

Cover element structure (kdim/vdim names), bin frequencies summing to the sample count, bins= kwarg forwarding, title/ylabel/xrotation opts, the native 0-input repeats filter and float-input rejection, and NaN samples being dropped from bin counts without crashing. Note: result-var units never appear in histogram output (only the var name); tests assert the implemented behavior. Plan 05, task 1.

Cover exact best_value/best_params and summary() lines for single-objective studies, Pareto-front membership and the single-objective-only RuntimeError for multi-objective, sweep-driven structure (trial counts, target_names, directions), and NaN worker output marking the trial FAIL while the study continues. Plan 05, task 1.

sourcery-ai · 2026-06-12T06:32:38Z

Reviewer's Guide

Adds extensive new unit tests for core result and configuration modules, replaces flaky file server sleeps with a port polling helper, removes or clarifies long-skipped tests, and tightens behavioral guarantees (including NaN handling and plotting/filtering behavior) without changing production code.

File-Level Changes

Change	Details	Files
Stabilized file server tests by replacing fixed sleeps with a TCP port wait helper.	Added a wait_for_port helper that polls a TCP connection until the server accepts connections or times out. Updated file server tests to use wait_for_port instead of time.sleep, improving reliability without extending total runtime.	`test/test_file_server.py`
Removed or clarified long-skipped tests to align with the coverage plan and reduce dead code.	Deleted the fully skipped, long-unused hypothesis test module for combinations. Removed a duplicated skipped hypothesis test in sweep var tests that duplicated behavior of an existing passing test. Replaced a bare skip in bencher tests with a descriptive reason referencing the coverage plan. Updated usability tests to use ResultFloat instead of deprecated ResultVar for output, preserving behavior while modernizing APIs.	`test/test_combinations.py` `test/test_sweep_vars.py` `test/test_bencher.py` `test/test_usability.py`
Introduced comprehensive unit tests for bench configuration objects and worker job hashing to pin defaults and hash semantics.	Added tests for BenchPlotSrvCfg and BenchRunCfg defaults, deprecated level mapping, with_defaults behavior, and persistent hash behavior in BenchCfg, including optuna target partitioning and DimsCfg extraction. Added tests for WorkerJob construction, canonical input/constant merging, and hash stability/sensitivity to inputs, dim names, constants, tags, and ordering.	`test/test_bench_cfg.py` `test/test_worker_job.py`
Added behavioral tests for BenchResult conversion paths and holoviews-based result types, focusing on kdims/vdims, labels, title/ylabel, and NaN robustness.	Added tests for BenchResult.to, to_auto, plotting callbacks, default plot callbacks, clone semantics, and NaN handling across plotting paths. Introduced direct tests for BandResult, CurveResult, BarResult, ScatterResult, ScatterJitterResult, BoxWhiskerResult, ViolinResult, HistogramResult, TableResult, VolumeResult, and DistributionResult-derived classes, checking filter behavior, element composition, grouping, and handling of NaNs. Documented and pinned the current ScatterResult filter bug where ResultFloat sweeps return None due to result_types=(ResultVar,) and ensured tests reflect existing behavior without changing production code.	`test/test_bench_result.py` `test/test_band_result.py` `test/test_curve_result.py` `test/test_bar_result.py` `test/test_scatter_result.py` `test/test_scatter_jitter_result.py` `test/test_box_whisker_result.py` `test/test_violin_result.py` `test/test_histogram_result.py` `test/test_volume_result.py` `test/test_table_result.py`
Strengthened dataset and composable container behavior tests to ensure data integrity and dataset result handling.	Added tests for DataSetResult to validate that worker-produced pandas DataFrames are stored, indexed, and recovered intact from xarray datasets and viewer paths. Added tests for ComposableContainerDataset created from pandas DataFrames to ensure composition methods (right, down, sequence, overlay) preserve values, coordinates, NaN semantics, and var name/value labelling.	`test/test_dataset_result.py` `test/test_composable_container_dataframe.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Several test helpers (e.g. unwrap_hv, _inner_element, _run_sweep, run_cfg_with) are reimplemented with very similar logic across multiple new test modules; consider moving these into a shared test utility module to avoid duplication and keep behavior consistent.
The new wait_for_port helper in test_file_server.py is specific but generic enough to be reused; if other tests also spin up HTTP/TCP servers it may be worth centralizing this function in a common test helper to avoid diverging polling logic and timeouts.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Several test helpers (e.g. `unwrap_hv`, `_inner_element`, `_run_sweep`, `run_cfg_with`) are reimplemented with very similar logic across multiple new test modules; consider moving these into a shared test utility module to avoid duplication and keep behavior consistent.
- The new `wait_for_port` helper in `test_file_server.py` is specific but generic enough to be reused; if other tests also spin up HTTP/TCP servers it may be worth centralizing this function in a common test helper to avoid diverging polling logic and timeouts.

## Individual Comments

### Comment 1
<location path="test/test_band_result.py" line_range="123-132" />
<code_context>
+class TestBandResult:
</code_context>
<issue_to_address>
**suggestion (testing):** Add a negative test to pin the BandResult filter behavior when repeats=1 or otherwise not matching the filter criteria.

Current BandResult tests only cover valid shapes and special cases; they don't verify what happens when the filter rejects input (e.g., sweeps that don't meet `float_range`/`repeats_range` like `repeats>=2`). For other result types, we assert that `to_*` with `override=False` returns a diagnostic/None instead of a plot when the filter fails. Please add a similar negative test here that:

- builds a Band-style sweep that should be rejected (e.g., `repeats=1`),
- calls `to_band(override=False)`, and
- asserts that the result is not an `hv.Overlay`/HoloViews pane (and optionally checks the diagnostic text when `print_debug` is enabled).

This will lock in the rejection behavior for unsupported shapes and prevent regressions where BandResult might silently emit misleading plots.

Suggested implementation:

```python
class TestBandResult:
    def test_to_band_overlay_composition(self, res_1d):
        """to_band yields two percentile Areas, a median Curve and a samples Scatter."""
        plot = res_1d.to_band()
        assert plot is not None
        overlay = unwrap_hv(plot)
        assert isinstance(overlay, hv.Overlay)
        # exact types: hv.Area is a subclass of hv.Curve, so isinstance would double count
        assert len([el for el in overlay if type(el) is hv.Area]) == 2
        assert len([el for el in overlay if type(el) is hv.Curve]) == 1
        assert len([el for el in overlay if type(el) is hv.Scatter]) == 1

    def test_to_band_rejected_when_filter_fails(self, res_1d_repeats_1):
        """to_band(override=False) should not emit a Band plot if the filter rejects the sweep.

        In particular, a Band-style sweep with repeats=1 (or otherwise outside the
        allowed repeats/float ranges) should be rejected and yield a diagnostic
        instead of an hv.Overlay/HoloViews pane.
        """
        # When the filter fails, to_band(override=False) should *not* return a plot.
        # With print_debug enabled, we also expect a human-readable diagnostic.
        result = res_1d_repeats_1.to_band(override=False, print_debug=True)

        # Unwrap any Panel/HoloViews wrappers, then assert that we did *not* get an Overlay.
        unwrapped = unwrap_hv(result)
        assert not isinstance(unwrapped, hv.Overlay)

        # Optional: pin a bit of the diagnostic so regressions in the filter behavior are caught.
        # Adjust the substring below to match the actual diagnostic message emitted by BandResult.
        assert "repeats" in str(result) or "filter" in str(result)

```

To make this test pass, you also need to:

1. Add a `res_1d_repeats_1` fixture in this file (or a shared conftest) that:
   - Builds a Band-style sweep identical (or very similar) to `res_1d`, but with `repeats=1` or otherwise configured so that the BandResult filter rejects it (e.g., by violating `float_range`/`repeats_range` such as `repeats>=2`).
   - Returns the corresponding `BandResult` instance.

   For example (pseudocode, adapt to your actual API):

   ```python
   @pytest.fixture
   def res_1d_repeats_1(bench, run_cfg):
       run_cfg_single = dataclasses.replace(run_cfg, repeats=1)  # or whatever your API is
       res = bench.plot_sweep(
           "band_time",
           input_vars=["size"],
           result_vars=["throughput"],
           run_cfg=run_cfg_single,
           time_src="2026-06-10 snap0000",
       )
       return res
   ```

2. Adjust the diagnostic assertion in the test if the actual `BandResult.to_band` rejection message differs:
   - Change the `"repeats"` / `"filter"` substring checks to match the concrete text your implementation emits when the filter fails.
3. If `to_band(override=False, print_debug=True)` returns a structured object (e.g. `(diagnostic, plot)` or a Panel pane) instead of a single value, update the test accordingly:
   - Unpack the return value, and apply the `unwrap_hv` / `isinstance(..., hv.Overlay)` checks to the plot component.
   - Apply the substring assertion to the diagnostic component.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

github-actions · 2026-06-12T06:35:15Z

Performance Report for `f8eee40`

Metric	Value
Total tests	1653
Total time	123.24s
Mean	0.0746s
Median	0.0020s

Top 10 slowest tests

Test	Time (s)
`test.test_bench_examples.TestBenchExamples::test_example_meta`	16.957
`test.test_over_time_save_perf::test_save_faster_without_aggregated_tab`	4.982
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py]`	4.544
`test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py]`	2.949
`test.test_split_render_examples::test_split_render_subprocess_media`	2.940
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py]`	2.830
`test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py]`	2.807
`test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool]`	2.795
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py]`	2.777
`test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max`	2.398

Full report

Updated by Performance Tracking workflow

blooop · 2026-06-12T14:51:48Z

"Histogram output never shows result-var units (only the var name); tests assert the implemented behaviour" this is not desired. All graphs should show units. check this across the codebase (make a follow up pr if appropriate)

Addresses review feedback that all graphs should display units. The histogram x-axis carries the result variable but only showed its name; now uses the '{name} [{units}]' convention shared with band/bar/heatmap.

Move the copy-pasted unwrap_hv, inner_element and run_cfg_with helpers into a shared module, and add a BandResult negative test pinning that a non-scalar (vector) result is rejected rather than silently plotted.

blooop · 2026-06-12T15:01:42Z

Good catch — fixed the histogram in this PR (e9a91fc): the x-axis now shows {name} [{units}] like band/bar/heatmap, and the test asserts value [m] instead of pinning the old behavior.

Audited the rest of the codebase per your request. Units are already shown by band, bar, heatmap, surface, and the distribution plots (box/violin/jitter inherit from DistributionResult). The remaining gaps are curve_result.py, line_result.py, and scatter_result.py, which don't set an axis label with units at all. I'll address those in a follow-up PR since they're production-only changes outside this test-coverage PR's scope.

Also addressed the Sourcery feedback: consolidated the duplicated unwrap_hv / _inner_element / run_cfg_with helpers into test/helpers.py, and added a BandResult negative test (the suggested repeats=1 case isn't rejected by band's filter, so the test pins the real rejection path — a non-scalar vector result).

github-actions · 2026-06-12T15:04:13Z

Performance Report for `e9a91fc`

Metric	Value
Total tests	1656
Total time	120.29s
Mean	0.0726s
Median	0.0020s

Top 10 slowest tests

Test	Time (s)
`test.test_bench_examples.TestBenchExamples::test_example_meta`	16.002
`test.test_over_time_save_perf::test_save_faster_without_aggregated_tab`	4.634
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py]`	4.377
`test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py]`	3.036
`test.test_split_render_examples::test_split_render_subprocess_media`	3.023
`test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool]`	2.869
`test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py]`	2.706
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py]`	2.584
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py]`	2.253
`test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max`	2.238

Full report

Updated by Performance Tracking workflow

Addresses the Sourcery review on #960: the per-file _run_sweep bodies in the bar/scatter/box-whisker/violin/scatter-jitter result tests were near-identical copies. Add run_named_sweep and run_dist_sweep to test/helpers.py and delegate to them, so the run-config and plot-callback setup lives in one place. Also bump version to 1.106.0.

github-actions · 2026-06-12T15:55:17Z

Performance Report for `8c8b3a6`

Metric	Value
Total tests	1659
Total time	124.12s
Mean	0.0748s
Median	0.0020s

Top 10 slowest tests

Test	Time (s)
`test.test_bench_examples.TestBenchExamples::test_example_meta`	17.088
`test.test_over_time_save_perf::test_save_faster_without_aggregated_tab`	5.142
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py]`	4.564
`test.test_split_render_examples::test_split_render_subprocess_media`	2.986
`test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py]`	2.938
`test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool]`	2.855
`test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py]`	2.826
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py]`	2.782
`test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py]`	2.492
`test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max`	2.414

Full report

Updated by Performance Tracking workflow

blooop · 2026-06-12T16:02:04Z

Follow-up PR for the remaining axis-unit gaps (curve/line/scatter) is up: #963

blooop added 19 commits June 12, 2026 07:18

test: unit tests for DataSetResult (results/dataset_result.py)

f02775e

Cover viewer construction from a small ResultDataSet sweep, dataset_list round-tripping worker DataFrames unchanged, xarray index-to-frame mapping, and ds_to_container unwrapping the stored frame. Plan 05, task 1.

test: unit tests for CurveResult (holoview_results/curve_result.py)

6533329

Cover the repeats>1 Curve+Spread overlay (labels, kdims/vdims, std band), to_plot delegation, categorical groupby via to_curve_ds, the repeats=1 filter rejection, and NaN robustness. Plan 05, task 1.

test: unit tests for TableResult (holoview_results/table_result.py)

5823184

Cover hv.Table construction (kdims/vdims), row count = samples x repeats, values matching the worker, repeat-dim squeeze at repeats=1, and NaN row preservation. Plan 05, task 1.

test: unit tests for ViolinResult (distribution_result/violin_result.py)

c54c3e6

Cover overlay/element types, label/units propagation, raw repeats per category, filter rejection, and NaN robustness through the shared DistributionResult base. Plan 05, task 1.

test: unit tests for ScatterJitterResult (distribution_result/scatter…

24818fa

…_jitter_result.py) Cover jitter scatter element type, label propagation, per-category raw sample integrity with repeats=3, its stricter cat_range filter (2-cat rejection), and NaN robustness. Plan 05, task 1.

fix lint: pylint disables for fixtures and test-class counters

f8eee40

sourcery-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread test/test_band_result.py

blooop added 2 commits June 12, 2026 16:01

fix: show result-var units on histogram x-axis

50ca54a

Addresses review feedback that all graphs should display units. The histogram x-axis carries the result variable but only showed its name; now uses the '{name} [{units}]' convention shared with band/bar/heatmap.

test: consolidate duplicated result-test helpers into test/helpers.py

e9a91fc

Move the copy-pasted unwrap_hv, inner_element and run_cfg_with helpers into a shared module, and add a BandResult negative test pinning that a non-scalar (vector) result is rejected rather than silently plotted.

blooop added 3 commits June 12, 2026 16:38

fix: bump version to 1.107.0 (1.106.x already released on main)

86f5353

Merge origin/main (resolve version to 1.107.0)

8c8b3a6

blooop enabled auto-merge June 12, 2026 15:52

blooop merged commit 73fca69 into main Jun 12, 2026
7 of 8 checks passed

blooop deleted the test/coverage-plan-05 branch June 12, 2026 15:58

blooop mentioned this pull request Jun 12, 2026

fix: show units on curve, line and scatter axis labels #963

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement plan 05: test coverage gaps#960

Implement plan 05: test coverage gaps#960
blooop merged 24 commits into
mainfrom
test/coverage-plan-05

blooop commented Jun 12, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented Jun 12, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

blooop commented Jun 12, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Task 1 — unit tests for untested result modules

Task 2 — core modules

Task 3 — de-flake test_file_server.py

Task 4 — long-skipped tests (with deviations; the plan's named tests don't exist)

Bugs found while writing tests (not fixed here — test-only PR)

Notes

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026

Performance Report for f8eee40

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026

Performance Report for e9a91fc

Uh oh!

github-actions Bot commented Jun 12, 2026

Performance Report for 8c8b3a6

Uh oh!

Uh oh!

blooop commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

blooop commented Jun 12, 2026 •

edited by sourcery-ai Bot

Loading

Task 3 — de-flake `test_file_server.py`

sourcery-ai Bot commented Jun 12, 2026 •

edited

Loading

Performance Report for `f8eee40`

Performance Report for `e9a91fc`

Performance Report for `8c8b3a6`