feat(telemetry): reconcile dashboards with real instruments + top-5 tuning signals (#209 phase 1) #211
Merged
…uning signals (#209 phase 1)

Phantom metrics: helix-pipeline-observatory.json charted 8 metric names that exist nowhere in helix_context/telemetry. Every panel is repointed at its nearest real instrument (and the job="helix" matchers are dropped - the stack's scrape jobs are otel-collector/prometheus, so they matched nothing):

  helix_tier_estimation_percent      -> helix_tier_fired_total (share %)
  helix_tier_readable_time_bucket    -> helix_genome_signal_seconds_bucket
  helix_crdt_bucket_accumulation     -> helix_cwola_bucket_total
  helix_rq_duration_seconds_bucket   -> helix_context_latency_seconds_bucket
  helix_ring_edges_by_provenance     -> helix_harmonic_edges_total
  helix_chroni_join_state            -> helix_chromatin_state_total
  helix_cost_concentration_ratio     -> helix_hub_concentration_ratio
  helix_resolve_degree_distribution  -> helix_hub_inbound_degree
  (also: process_resident_memory_bytes{job=helix} -> helix_genome_size_bytes)

New instruments (audit doc section 3c), all lazy no-op-when-disabled getters in telemetry/otel.py following the existing pattern:

  helix_dense_cosine                 hot dense-recall merge + cold-tier scan
  helix_shard_fanout                 ShardRouter.query_genes routed-shard count
  helix_shard_discrimination         routed / known healthy shards (0..1)
  helix_know_decision_total          decide_know_or_miss {outcome, reason}
  helix_session_tokens_saved_total   session working-set elision savings
  helix_splice_ratio                 assembled-window compression ratio

New deploy/otel/grafana/dashboards/helix-internals.json (uid helix-internals - the launcher and setup scripts already linked to it) with one panel per new instrument.

OBSERVABILITY.md: genai_telemetry.py sections (module absent from master) replaced with planned-(#209 phase 2) notes; metric table now matches code.

tests/test_telemetry_phase1.py: no-op safety with OTel disabled, call-site label checks, and a dashboard-vs-registry cross-reference that fails on any future phantom metric.
This was referenced Jun 11, 2026
Phase 1 of #209. All 8 phantom metrics in helix-pipeline-observatory.json repointed to real instruments (every intended signal already existed - mapping table in commit); dead job='helix' matchers dropped; process-memory panel fixed. Six new instruments wired at computation sites, covering the top-5 tuning signals: helix_dense_cosine (hot/cold arms), helix_shard_fanout + helix_shard_discrimination (the #159 metric), helix_know_decision_total, helix_session_tokens_saved_total, helix_splice_ratio. New helix-internals dashboard (uid already linked by launcher). OBSERVABILITY.md genai_telemetry sections marked planned-phase-2 so docs match code. tests/test_telemetry_phase1.py includes the dashboard-vs-registry phantom-killer regression test. Test results: 11/11 new tests and 388 adjacent tests passed.
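The dashboard-vs-registry check described above could be sketched roughly as follows. This is an illustration under assumptions, not the contents of tests/test_telemetry_phase1.py: the helper names, the `helix_` regex, and the rule of stripping Prometheus histogram suffixes (`_bucket`, `_sum`, `_count`) before lookup are all hypothetical choices, and the registry is passed in as a plain set of names.

```python
import re

# Any token that looks like a helix metric inside a PromQL expression.
METRIC_RE = re.compile(r"\bhelix_[a-z0-9_]+")
# Series suffixes Prometheus adds to histograms; not part of the instrument name.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def iter_exprs(node):
    """Yield every string stored under an 'expr' key anywhere in the JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def base_name(metric):
    """Drop one histogram series suffix, if present."""
    for suffix in HISTOGRAM_SUFFIXES:
        if metric.endswith(suffix):
            return metric[: -len(suffix)]
    return metric


def phantom_metrics(dashboard, registry):
    """Return charted helix metric names that match no registered instrument."""
    charted = {m for expr in iter_exprs(dashboard) for m in METRIC_RE.findall(expr)}
    return sorted(
        m for m in charted if m not in registry and base_name(m) not in registry
    )


# Tiny in-memory dashboard: one real histogram, one phantom from the old JSON.
dashboard = {
    "panels": [
        {"targets": [{"expr": "histogram_quantile(0.95, sum(rate(helix_context_latency_seconds_bucket[5m])) by (le))"}]},
        {"targets": [{"expr": "helix_tier_estimation_percent"}]},
    ]
}
registry = {"helix_context_latency_seconds", "helix_tier_fired_total"}
print(phantom_metrics(dashboard, registry))  # ['helix_tier_estimation_percent']
```

In a real test the dashboard would be loaded with `json.load` from each file under the dashboards directory and the assertion would be that `phantom_metrics(...)` is empty, so any newly charted name without an instrument fails the suite.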