
Claude/consolidate cms frequency pr sb5 kk #104

Open
c-dickens wants to merge 15 commits into apache:master from
c-dickens:claude/consolidate-cms-frequency-pr-Sb5Kk


c-dickens (Contributor) commented Feb 24, 2026

Initial point query profile and plots

Attached plots: cms_point_query_error, cms_quantile_slices

c-dickens marked this pull request as draft February 24, 2026 23:02
claude and others added 10 commits February 25, 2026 19:44
Consolidates the Count-Min Sketch frequency estimation characterization
from PRs #1, #2, and #3 into a single clean profile. The profile sweeps
across sketch widths (256-4096) with constant load factor (distinct/width
≈ 4), runs adaptive trials per width, and uses KLL sketches to track
error distribution quantiles (median, p75, p90, p95, max) for both
absolute and relative error metrics against theoretical bounds.

https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
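
For readers unfamiliar with the measurement, here is a minimal pure-Python sketch of the kind of experiment this commit describes. It is not the C++ profile from the PR. Everything here is an assumption made for illustration: the `CountMinSketch` class, the hash family, the synthetic stream, and the helper names. Exact sorted errors stand in for the KLL sketch, and the bound used is the textbook Count-Min guarantee (estimate <= true count + (e / width) * N, with N the total stream weight).

```python
import math
import random


class CountMinSketch:
    """Toy Count-Min Sketch over integer items (illustrative, not the library class)."""

    _PRIME = (1 << 61) - 1  # Mersenne prime used by the toy hash family

    def __init__(self, width, depth, seed):
        self.width = width
        self.depth = depth
        self.total = 0
        rng = random.Random(seed)
        # One (a, b) pair per row; the seed is the only source of randomness.
        self._coeffs = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        a, b = self._coeffs[row]
        return ((a * item + b) % self._PRIME) % self.width

    def update(self, item, weight=1):
        self.total += weight
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += weight

    def estimate(self, item):
        # Point query: minimum counter across rows (never below the true count
        # when all weights are non-negative).
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


def absolute_errors(width, depth, seed, load_factor=4):
    """Return sorted absolute point-query errors and the textbook bound e/width * N."""
    distinct = load_factor * width  # constant load factor, as in the description
    stream_rng = random.Random(12345)  # hypothetical fixed stream
    truth = {item: stream_rng.randint(1, 20) for item in range(distinct)}

    cms = CountMinSketch(width, depth, seed)
    for item, count in truth.items():
        cms.update(item, count)

    errors = sorted(cms.estimate(item) - count for item, count in truth.items())
    bound = math.e / width * cms.total
    return errors, bound


def quantile(sorted_values, level):
    """Exact quantile of pre-sorted data; stands in for a KLL query."""
    idx = min(len(sorted_values) - 1, int(level * len(sorted_values)))
    return sorted_values[idx]


if __name__ == "__main__":
    for width in (256, 512, 1024, 2048, 4096):
        errors, bound = absolute_errors(width, depth=3, seed=1)
        summary = [quantile(errors, q) for q in (0.5, 0.75, 0.9, 0.95, 1.0)]
        print(width, summary, round(bound, 1))
```

Run as a script, this prints the median, p75, p90, p95 and max absolute error for each width next to the bound, which is the comparison the profile is meant to report.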
Remove all relative error tracking. Single KLL sketch tracks
absolute error distribution only. 170 lines → 100 lines.

https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
Both width and stream length are now fixed parameters rather
than sweep axes. Runs N trials and reports one row of results
with KLL quantiles on absolute point query error.

https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
Bar chart of absolute error quantiles from KLL sketch
with theoretical bound overlay. Reads single-row TSV.

https://claude.ai/code/session_01RmEdWmm6vYXY3XevAAsWVe
This line referenced a CMake target 'common' that doesn't exist in
the project, causing cmake configuration to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements per-item error characterisation using KLL quantile sketches
across trials with a fixed cached stream. Includes C++ profile, Python
plotting script (log-log SVG with sigma bands and theoretical bounds),
and bound violation tracking. Also comments out tdigest accuracy profiles
that depend on unavailable ddsketch.hpp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- C++ profile now outputs 27 quantile levels from 0.0 to 1.0, including
  the original 7 sigma levels plus dense intermediate coverage for richer
  slice analysis
- Main plot updated to show min/max bands alongside sigma-level bands,
  using percentile labels instead of sigma notation (not normally distributed)
- Add frequency slice plot script that shows the full error quantile
  function at selected true frequencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
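
As a point of reference for the "7 sigma levels" mentioned above, the sketch below assumes they are the standard normal CDF evaluated at -3 to +3 sigma, and shows how such levels can be relabelled as percentiles. This is an assumption about what the levels mean, not code taken from the PR; `normal_cdf`, `SIGMA_LEVELS` and `percentile_label` are hypothetical names.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed meaning of the seven sigma levels: ranks at -3, -2, -1, 0, +1, +2, +3 sigma.
SIGMA_LEVELS = [normal_cdf(k) for k in range(-3, 4)]


def percentile_label(level):
    """Label a quantile level as a percentile rather than in sigma notation."""
    return f"p{100.0 * level:.2f}"


if __name__ == "__main__":
    for level in SIGMA_LEVELS:
        print(percentile_label(level), level)
```

Percentile labels make no claim about the shape of the error distribution, which matches the commit's note that the errors are not normally distributed.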
- Reshape C++ quantile output to focus on the upper tail where CMS
  error behavior is interesting: sparse below median, dense 90-99%
  with per-mille resolution in the 99-100% range (30 levels total)
- Rewrite slice plot as empirical CDF: log(absolute error) on x-axis,
  quantile level on y-axis, with theoretical bound as vertical line
- Main band plot unchanged (selects 9 levels from wider set)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
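
The upper-tail grid and the empirical-CDF layout described above can be sketched roughly as follows. The level grid here is hypothetical: it follows the stated shape (sparse below the median, percent steps from 90% to 99%, per-mille steps from 99% to 100%) but has 24 levels and does not reproduce the PR's 30. `upper_tail_levels` and `cdf_points` are illustrative names, and exact quantiles of a sorted list stand in for KLL queries.

```python
import math


def upper_tail_levels():
    """Hypothetical quantile grid: sparse low levels, dense upper tail."""
    sparse = [0.0, 0.25, 0.5, 0.75]
    percent = [k / 100 for k in range(90, 100)]       # 0.90 .. 0.99
    per_mille = [k / 1000 for k in range(991, 1001)]  # 0.991 .. 1.000
    return sparse + percent + per_mille


def cdf_points(sorted_errors, levels):
    """Return (log10(error), level) pairs for an empirical-CDF style plot.

    Zero errors are skipped because they have no finite logarithm.
    """
    n = len(sorted_errors)
    points = []
    for level in levels:
        idx = min(n - 1, int(level * n))
        error = sorted_errors[idx]
        if error > 0:
            points.append((math.log10(error), level))
    return points


if __name__ == "__main__":
    demo_errors = sorted(range(1000))  # placeholder data, not measured errors
    for x, y in cdf_points(demo_errors, upper_tail_levels()):
        print(f"{x:.3f}\t{y:.3f}")
```

Plotting the first column on the x-axis and the second on the y-axis gives the CDF orientation described in the commit; the theoretical bound would then be drawn as a vertical line at log10(bound).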
c-dickens force-pushed the claude/consolidate-cms-frequency-pr-Sb5Kk branch from aab73aa to 9c3d597 February 25, 2026 19:45
c-dickens and others added 5 commits February 25, 2026 19:45
Rewrite Experiment 1 section to be precise about:
- Fixed stream across trials (only CMS seed varies)
- stdout for TSV data, stderr for diagnostics
- Dense upper-tail quantile levels (not symmetric sigma)
- Correct uv invocation (uv run script.py, not uv run python)
- Both plot specs (band plot + error CDF slice plot)
- Bound violation tracking
- Makefile targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
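
The stdout/stderr split and bound violation tracking listed above could look roughly like this in Python. The PR's profile is C++, so this is only a sketch of the convention: the `report` function, the column names, and the pooling of errors across trials are all hypothetical and not taken from the PR.

```python
import sys


def report(trial_errors, bound, levels, out=sys.stdout, err=sys.stderr):
    """Write one TSV header and row to `out`, diagnostics to `err`.

    trial_errors: one list of absolute errors per trial (same stream, different seed).
    Returns the number of point queries whose error exceeded `bound`.
    """
    pooled = sorted(e for errors in trial_errors for e in errors)
    n = len(pooled)
    violations = sum(1 for e in pooled if e > bound)

    # Exact quantiles of the pooled errors; a KLL sketch would replace this step.
    quantiles = [pooled[min(n - 1, int(q * n))] for q in levels]

    # Hypothetical column names, chosen for this example only.
    header = ["n_queries", "bound", "violations"] + [f"q{q:g}" for q in levels]
    row = [n, f"{bound:.3f}", violations] + quantiles

    # Data goes to stdout so it can be redirected straight into a .tsv file.
    print("\t".join(header), file=out)
    print("\t".join(str(v) for v in row), file=out)

    # Diagnostics go to stderr so they never contaminate the TSV.
    print(
        f"trials={len(trial_errors)} violation_rate={violations / n:.4f}",
        file=err,
    )
    return violations


if __name__ == "__main__":
    report([[0, 1, 2], [3, 10, 0]], bound=2.5, levels=[0.5, 0.9, 1.0])
```

Keeping diagnostics on stderr means a shell redirect of stdout captures only the TSV, which a plotting script can then read without filtering.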
- Restore tdigest accuracy profile includes and registrations to match
  upstream master (our PR should not modify unrelated code)
- Remove plot_cms_frequency_slice.py (out of scope for this PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was an earlier experiment superseded by cms_point_query_profile.
Only the point query profile is needed for this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
plot_cms_frequency.py reads a different TSV format that the current
profile doesn't produce. The active plotting script is
cpp/scripts/plot_cms_point_query.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c-dickens marked this pull request as ready for review February 25, 2026 20:21

3 participants