Skip to content

fix(stats,frequency): gate Apache DataSketches behind little-endian targets#3847

Merged
jqnatividad merged 2 commits into
masterfrom
datasketches-require-little-endian
May 13, 2026
Merged

fix(stats,frequency): gate Apache DataSketches behind little-endian targets#3847
jqnatividad merged 2 commits into
masterfrom
datasketches-require-little-endian

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

  • Apache DataSketches' upstream C++ does not compile on big-endian targets. This makes qsv build cleanly on s390x (and other big-endian platforms) by moving datasketches = \"0.2\" under [target.'cfg(not(target_endian = \"big\"))'.dependencies] and gating every call site in src/cmd/stats.rs and src/cmd/frequency.rs behind #[cfg(not(target_endian = \"big\"))] (with cfg_select! at allocation sites to keep call shapes intact). TDigestSlot / HllSlot get zero-sized unit-like fallbacks on big-endian so Stats's derives still apply.
  • On big-endian targets the CLI rejects --quantile-method approx, --cardinality-method approx, and --sketch-method frequent_items with a clear "requires a little-endian target" error that names s390x. The exact defaults are unaffected, and the OOM auto-enable in stats / frequency no-ops on big-endian instead of offering an unavailable fallback. Help text for each flag documents the limitation.
  • Updates .github/workflows/rust-s390x.yml to --skip the sketch-exercising integration tests (the exact-path and invalid-value tests still run), and annotates .github/workflows/publish-s390x-unknown-linux-gnu.yml with a header explaining which CLI options are absent from the s390x binaries. The disabled push: tags trigger on the publish workflow is left as-is — a separate operational decision.

Test plan

  • cargo tree --target s390x-unknown-linux-gnu -F all_features reports 0 references to `datasketches`; host dep tree still has 1.
  • Host `cargo build --locked -F all_features` succeeds.
  • Host `cargo clippy --locked -F all_features` clean.
  • All 10 sketch-related stats tests pass on host.
  • All 8 frequent_items frequency tests pass on host.
  • Dry-run of the s390x test command's `--skip` filters: the 4 non-sketch tests (`_invalid_value_is_rejected`, `_exact_is_default_and_byte_identical`) still pass; the 14 sketch tests are skipped as intended.
  • s390x CI workflow run (`rust-s390x.yml`) succeeds with the new `--skip` filters.
  • s390x release build (`publish-s390x-unknown-linux-gnu.yml`) produces working binaries; spot-check that approx/frequent_items flags emit the expected error.

🤖 Generated with Claude Code

…argets

Apache DataSketches (the `datasketches` crate's upstream C++) does not
support big-endian platforms. Make qsv build cleanly on s390x by moving
the dependency under
`[target.'cfg(not(target_endian = "big"))'.dependencies]` and gating
the t-digest / HyperLogLog / Frequent Items call sites in
`src/cmd/stats.rs` and `src/cmd/frequency.rs` behind
`#[cfg(not(target_endian = "big"))]` (with `cfg_select!` at allocation
sites to keep call shapes intact). `TDigestSlot` and `HllSlot` get
zero-sized unit-like fallbacks on big-endian so the surrounding `Stats`
derives still apply.

On big-endian targets the CLI now rejects `--quantile-method approx`,
`--cardinality-method approx`, and `--sketch-method frequent_items`
with a "requires a little-endian target" error that names s390x. The
exact-method defaults are unaffected, and the OOM auto-enable in
`stats` / `frequency` no-ops on big-endian instead of offering an
unavailable fallback. Help text for each flag documents the
limitation.

Also updates `.github/workflows/rust-s390x.yml` to skip the
sketch-exercising integration tests via `--skip` filters (the exact
and invalid-value tests still run), and annotates
`.github/workflows/publish-s390x-unknown-linux-gnu.yml` with a header
noting which CLI options are absent from the s390x binaries.

Verification:
  * `cargo tree --target s390x-unknown-linux-gnu -F all_features`
    reports 0 references to `datasketches`; host tree still has 1.
  * Host build + clippy clean with `-F all_features`.
  * All 10 stats and 8 frequency sketch tests pass on host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Four Low-severity findings from the review on commit 27ac19b:

1. `.github/workflows/rust-s390x.yml`: extend the comment so future
   readers know why `frequency_sketch_method_invalid_map_size_is_rejected`
   is also skipped — the platform-rejection error now fires before
   --sketch-map-size validation, so the test's assertion no longer
   matches on s390x.

2. `src/cmd/stats.rs` (Stats::new): replace the
   `cfg_select! { _ => unreachable!() }` allocation pattern for t-digest
   and HLL with plain `#[cfg(not(target_endian = "big"))]` on the
   surrounding `if which.approx_quantiles` / `if which.cardinality &&
   which.approx_cardinality` branches. The dead arms are now compiled
   out entirely — an internal caller that bypasses `run()`'s CLI
   validation on big-endian can no longer trigger a runtime panic.

3. `src/cmd/frequency.rs` (run, OOM block): add an inline comment on
   the big-endian `if !index_succeeded` branch noting that the silent
   fall-through on `index_succeeded == true` is intentional and matches
   the little-endian behavior — there is no DataSketches auto-enable
   to skip.

4. `tests/test_stats.rs`, `tests/test_frequency.rs`: add
   `#[cfg(target_endian = "big")]`-gated tests that invoke each of the
   three rejected flags and assert the platform error message contains
   the flag name, "requires a little-endian target", and "s390x". The
   names (`*_big_endian_*_rejected`) are deliberately chosen to avoid
   the s390x CI `--skip` filters so the new coverage actually runs on
   s390x.

Verification: host build / clippy clean with `-F all_features`; all 18
existing sketch tests still pass; `cargo tree --target
s390x-unknown-linux-gnu -F all_features` still reports 0 references to
`datasketches`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes qsv compile and run on big-endian targets (notably s390x), where the upstream datasketches crate cannot build. The datasketches dependency is moved under a cfg(not(target_endian = "big")) target-conditional table, and every consumer in stats / frequency is cfg-gated. On big-endian, approximate sketch options (--quantile-method approx, --cardinality-method approx, --sketch-method frequent_items) are rejected at CLI parse time with a clear platform-specific error; help text documents the limitation; OOM auto-enable code paths no-op; and the s390x CI workflow --skips the sketch-exercising tests while compiling new big-endian-only rejection tests.

Changes:

  • Move datasketches to a target-conditional dependency and add #[cfg(not(target_endian = "big"))] gates around all sketch types, allocations, merges, and helpers (TDigestSlot, HllSlot, try_enable_approx_sketches, can_enable_frequent_items, run_frequent_items, the HLL_LG_K constant), with ZST fallbacks for TDigestSlot/HllSlot so Stats derives keep working.
  • Up-front CLI rejection for --quantile-method approx, --cardinality-method approx, and --sketch-method frequent_items on big-endian, with help-text notes; OOM auto-enable in stats/frequency falls through to the original error on big-endian.
  • Update s390x CI workflow to skip the sketch-only integration tests (and the now-unreachable _invalid_map_size_is_rejected test); add big-endian-only rejection tests; document binary-feature differences in the publish workflow header.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
Cargo.toml Moves datasketches = "0.2" to a target.'cfg(not(target_endian = "big"))'.dependencies block.
src/cmd/stats.rs Cfg-gates HLL/t-digest types, allocations, merges, helper, and HLL_LG_K; rejects approx flags upfront on big-endian; adds ZST TDigestSlot/HllSlot fallbacks; updates help text.
src/cmd/frequency.rs Cfg-gates run_frequent_items and can_enable_frequent_items; rejects frequent_items on big-endian; mirrors OOM fallback control flow on big-endian; updates help text.
tests/test_stats.rs Adds big-endian-only tests asserting rejection error messages for the two approx flags.
tests/test_frequency.rs Adds a big-endian-only test asserting rejection error message for --sketch-method frequent_items.
.github/workflows/rust-s390x.yml Adds explanatory comments and --skip filters so sketch-exercising tests are not run on s390x.
.github/workflows/publish-s390x-unknown-linux-gnu.yml Adds a header documenting which sketch-backed CLI options are absent from s390x binaries.

@jqnatividad jqnatividad merged commit f27ded2 into master May 13, 2026
21 checks passed
@jqnatividad jqnatividad deleted the datasketches-require-little-endian branch May 13, 2026 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants