fix(stats,frequency): gate Apache DataSketches behind little-endian targets#3847
Conversation
…argets
Apache DataSketches (the `datasketches` crate's upstream C++) does not
support big-endian platforms. Make qsv build cleanly on s390x by moving
the dependency under
`[target.'cfg(not(target_endian = "big"))'.dependencies]` and gating
the t-digest / HyperLogLog / Frequent Items call sites in
`src/cmd/stats.rs` and `src/cmd/frequency.rs` behind
`#[cfg(not(target_endian = "big"))]` (with `cfg_select!` at allocation
sites to keep call shapes intact). `TDigestSlot` and `HllSlot` get
zero-sized unit-like fallbacks on big-endian so the surrounding `Stats`
derives still apply.
On big-endian targets the CLI now rejects `--quantile-method approx`,
`--cardinality-method approx`, and `--sketch-method frequent_items`
with a "requires a little-endian target" error that names s390x. The
exact-method defaults are unaffected, and the OOM auto-enable in
`stats` / `frequency` no-ops on big-endian instead of offering an
unavailable fallback. Help text for each flag documents the
limitation.
Also updates `.github/workflows/rust-s390x.yml` to skip the
sketch-exercising integration tests via `--skip` filters (the exact
and invalid-value tests still run), and annotates
`.github/workflows/publish-s390x-unknown-linux-gnu.yml` with a header
noting which CLI options are absent from the s390x binaries.
Verification:
* `cargo tree --target s390x-unknown-linux-gnu -F all_features`
reports 0 references to `datasketches`; host tree still has 1.
* Host build + clippy clean with `-F all_features`.
* All 10 stats and 8 frequency sketch tests pass on host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
Four Low-severity findings from the review on commit 27ac19b: 1. `.github/workflows/rust-s390x.yml`: extend the comment so future readers know why `frequency_sketch_method_invalid_map_size_is_rejected` is also skipped — the platform-rejection error now fires before --sketch-map-size validation, so the test's assertion no longer matches on s390x. 2. `src/cmd/stats.rs` (Stats::new): replace the `cfg_select! { _ => unreachable!() }` allocation pattern for t-digest and HLL with plain `#[cfg(not(target_endian = "big"))]` on the surrounding `if which.approx_quantiles` / `if which.cardinality && which.approx_cardinality` branches. The dead arms are now compiled out entirely — an internal caller that bypasses `run()`'s CLI validation on big-endian can no longer trigger a runtime panic. 3. `src/cmd/frequency.rs` (run, OOM block): add an inline comment on the big-endian `if !index_succeeded` branch noting that the silent fall-through on `index_succeeded == true` is intentional and matches the little-endian behavior — there is no DataSketches auto-enable to skip. 4. `tests/test_stats.rs`, `tests/test_frequency.rs`: add `#[cfg(target_endian = "big")]`-gated tests that invoke each of the three rejected flags and assert the platform error message contains the flag name, "requires a little-endian target", and "s390x". The names (`*_big_endian_*_rejected`) are deliberately chosen to avoid the s390x CI `--skip` filters so the new coverage actually runs on s390x. Verification: host build / clippy clean with `-F all_features`; all 18 existing sketch tests still pass; `cargo tree --target s390x-unknown-linux-gnu -F all_features` still reports 0 references to `datasketches`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR makes qsv compile and run on big-endian targets (notably s390x), where the upstream datasketches crate cannot build. The datasketches dependency is moved under a cfg(not(target_endian = "big")) target-conditional table, and every consumer in stats / frequency is cfg-gated. On big-endian, approximate sketch options (--quantile-method approx, --cardinality-method approx, --sketch-method frequent_items) are rejected at CLI parse time with a clear platform-specific error; help text documents the limitation; OOM auto-enable code paths no-op; and the s390x CI workflow --skips the sketch-exercising tests while compiling new big-endian-only rejection tests.
Changes:
- Move
datasketchesto a target-conditional dependency and add#[cfg(not(target_endian = "big"))]gates around all sketch types, allocations, merges, and helpers (TDigestSlot,HllSlot,try_enable_approx_sketches,can_enable_frequent_items,run_frequent_items, theHLL_LG_Kconstant), with ZST fallbacks forTDigestSlot/HllSlotsoStatsderives keep working. - Up-front CLI rejection for
--quantile-method approx,--cardinality-method approx, and--sketch-method frequent_itemson big-endian, with help-text notes; OOM auto-enable instats/frequencyfalls through to the original error on big-endian. - Update s390x CI workflow to skip the sketch-only integration tests (and the now-unreachable
_invalid_map_size_is_rejectedtest); add big-endian-only rejection tests; document binary-feature differences in the publish workflow header.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| Cargo.toml | Moves datasketches = "0.2" to a target.'cfg(not(target_endian = "big"))'.dependencies block. |
| src/cmd/stats.rs | Cfg-gates HLL/t-digest types, allocations, merges, helper, and HLL_LG_K; rejects approx flags upfront on big-endian; adds ZST TDigestSlot/HllSlot fallbacks; updates help text. |
| src/cmd/frequency.rs | Cfg-gates run_frequent_items and can_enable_frequent_items; rejects frequent_items on big-endian; mirrors OOM fallback control flow on big-endian; updates help text. |
| tests/test_stats.rs | Adds big-endian-only tests asserting rejection error messages for the two approx flags. |
| tests/test_frequency.rs | Adds a big-endian-only test asserting rejection error message for --sketch-method frequent_items. |
| .github/workflows/rust-s390x.yml | Adds explanatory comments and --skip filters so sketch-exercising tests are not run on s390x. |
| .github/workflows/publish-s390x-unknown-linux-gnu.yml | Adds a header documenting which sketch-backed CLI options are absent from s390x binaries. |
Summary
datasketches = \"0.2\"under[target.'cfg(not(target_endian = \"big\"))'.dependencies]and gating every call site insrc/cmd/stats.rsandsrc/cmd/frequency.rsbehind#[cfg(not(target_endian = \"big\"))](withcfg_select!at allocation sites to keep call shapes intact).TDigestSlot/HllSlotget zero-sized unit-like fallbacks on big-endian soStats's derives still apply.--quantile-method approx,--cardinality-method approx, and--sketch-method frequent_itemswith a clear "requires a little-endian target" error that names s390x. The exact defaults are unaffected, and the OOM auto-enable instats/frequencyno-ops on big-endian instead of offering an unavailable fallback. Help text for each flag documents the limitation..github/workflows/rust-s390x.ymlto--skipthe sketch-exercising integration tests (the exact-path and invalid-value tests still run), and annotates.github/workflows/publish-s390x-unknown-linux-gnu.ymlwith a header explaining which CLI options are absent from the s390x binaries. The disabledpush: tagstrigger on the publish workflow is left as-is — a separate operational decision.Test plan
cargo tree --target s390x-unknown-linux-gnu -F all_featuresreports 0 references to `datasketches`; host dep tree still has 1.🤖 Generated with Claude Code