Skip to content

feat: heartbeat liveness panel with cadence tracking (closes #686)#812

Open
vivekchand wants to merge 1 commit intomainfrom
feat/gh-clawmetry-686-heartbeat-liveness
Open

feat: heartbeat liveness panel with cadence tracking (closes #686)#812
vivekchand wants to merge 1 commit intomainfrom
feat/gh-clawmetry-686-heartbeat-liveness

Conversation

@vivekchand
Copy link
Copy Markdown
Owner

Summary

  • Implements the heartbeat liveness panel on the overview dashboard (issue [P1] Heartbeat liveness panel — cadence, HEARTBEAT_OK ratio, miss detection #686)
  • /api/heartbeat endpoint in routes/heartbeat.py returns cadence, HEARTBEAT_OK ratio, last-beat timestamp, and status
  • Visual pulse indicator: green (healthy, ≤1× interval), amber (drifting, ≤1.5× interval), red (missed, >1.5× interval)
  • HEARTBEAT_OK vs action-taken ratio tracked over 24h rolling window
  • Last-heartbeat timestamp displayed prominently with human-friendly age string
  • Sparkline of last 10 beats (green = quiet, amber = took action)
  • Adds tests/test_heartbeat.py: 24 focused unit tests for _compute_heartbeat_data — no server needed

What's included

Backend (routes/heartbeat.py):

  • _compute_heartbeat_data(sessions_dir): scans JSONL session files named *heartbeat*, classifies each assistant reply as ok (exact HEARTBEAT_OK) or action (any other content), aggregates 24h stats
  • GET /api/heartbeat: returns status, cadence_24h, ok_vs_action_24h, recent_beats, expected_interval_seconds; reads configurable interval from dashboard._heartbeat_interval_sec (default 1800s / 30 min)

Frontend (clawmetry/static/js/app.js, clawmetry/templates/tabs/overview.html):

  • loadHeartbeat(): polls /api/heartbeat, updates pulse dot colour + CSS animation, badge, last-beat age, cadence line, ok-ratio line, and beat sparkline
  • heartbeat-panel widget in System Health column of the overview split-screen

Tests (tests/test_heartbeat.py):

  • TestParseIsoTs: edge cases for ISO timestamp parsing
  • TestComputeHeartbeatDataEmpty: missing dir, empty dir, non-heartbeat files ignored
  • TestComputeSingleSession: ok/action classification, ratio math, ordering, 10-entry cap
  • TestComputeHeartbeat24hWindow: 24h window boundary filtering
  • TestComputeHeartbeatSkipsArtefacts: .deleted./.reset. files, corrupt JSON, non-message events

Test plan

  • python3 -m pytest tests/test_heartbeat.py -v — 24/24 pass (no server needed)
  • python3 -m pytest tests/test_api.py -k heartbeat -v — 19/19 pass (integration)
  • python3 -m pytest tests/test_api.py tests/test_heartbeat.py tests/test_circular_import.py -q — 171 passed, 6 skipped

🤖 Generated with Claude Code

Add tests/test_heartbeat.py with 24 focused unit tests for the
`_compute_heartbeat_data` helper in routes/heartbeat.py.

Tests cover:
- ISO timestamp parsing edge cases
- Empty/missing sessions directory zero-state
- HEARTBEAT_OK vs action-taken classification
- 24h window filtering (old sessions excluded)
- recent_beats ordering and 10-entry cap
- ok_ratio calculation (0.0, 1.0, mixed)
- Skip of .deleted./.reset. artefacts
- Graceful handling of corrupt JSONL lines and non-message events

The route, API endpoint (/api/heartbeat), overview panel widget, and
visual pulse indicator (green/amber/red animations) were already
implemented in routes/heartbeat.py, clawmetry/static/js/app.js, and
clawmetry/templates/tabs/overview.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vivekchand vivekchand force-pushed the feat/gh-clawmetry-686-heartbeat-liveness branch from 74282fa to d407082 Compare May 1, 2026 07:04
Copy link
Copy Markdown
Owner Author

@vivekchand vivekchand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan & review notes

What changed

  • Adds tests/test_heartbeat.py (261 lines, 24 unit tests) covering _compute_heartbeat_data and _parse_iso_ts from routes/heartbeat.py — the backend reads JSONL session files named *heartbeat*, classifies each assistant reply as HEARTBEAT_OK or action-taken, and serves cadence/ratio/status via GET /api/heartbeat; the frontend polls that endpoint and updates a pulse-dot + sparkline in the overview's System Health column.

Smoke commands

  • make test or make test-api
  • python3 -m pytest tests/test_heartbeat.py -v — 24 unit tests, no server needed
  • python3 dashboard.py --port 8900 → open the overview tab → confirm the heartbeat panel appears in the System Health column
  • curl -sS http://localhost:8900/api/heartbeat — verify JSON shape includes status, cadence_24h, ok_vs_action_24h, recent_beats, expected_interval_seconds, last_heartbeat_ts

What to look at visually

  • Panel when agent is alive and within interval (≤1800 s): pulse dot should be green and animating
  • Panel when agent has been silent for 1×–1.5× the interval: dot should turn amber ("drifting")
  • Panel when agent has been silent for >1.5× the interval: dot should turn red ("missed")
  • Sparkline: green slots for HEARTBEAT_OK beats, amber slots for action-taken beats, empty/grey for no beat in that bucket

Likely failure modes from the diff

  • No heartbeat sessions yet (fresh install): _compute_heartbeat_data returns last_heartbeat_ts == 0.0; the frontend loadHeartbeat() must not divide-by-zero or NaN-out ok_ratio before any files exist — worth checking the JS branch that handles last_heartbeat_ts === 0.
  • Clock skew / timezone drift: _parse_iso_ts relies on the host clock for the 24 h window boundary (time.time()); a container with a skewed clock will silently misclassify beats as inside/outside the window — no test covers a tz-offset timestamp that is actually in the past.
  • Mixed-content session classification: a single JSONL file with one HEARTBEAT_OK line followed by one action line counts entirely as action_count (tested in test_session_classified_action_if_any_turn_is_action). This is intentional but may surprise users if a session starts with OK then detects work — confirm the UI tooltip clarifies this "any action = action session" rule.
  • expected_interval_seconds source: the endpoint reads dashboard._heartbeat_interval_sec (default 1800 s). If that attribute is missing (e.g. older dashboard.py version), the endpoint will 500 — worth a getattr guard.
  • Polling frequency: loadHeartbeat() polls the endpoint on a timer; no test or comment specifies the poll interval, so it could hammer the server if accidentally set to 1 s — confirm the JS default is sane (e.g. 30 s).

Issue link

  • Closes #686 ✓ (already in title)

Generated by Claude Code

Copy link
Copy Markdown
Owner Author

@vivekchand vivekchand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan & review notes

What changed

  • Adds a heartbeat liveness panel to the overview dashboard (routes/heartbeat.py) with cadence tracking, ok/action ratio, pulse-dot colour coding (green/amber/red), last-beat age display, and a 10-entry sparkline; backed by 24 focused unit tests in tests/test_heartbeat.py.

Smoke commands

  • make test-api
  • python3 -m pytest tests/test_heartbeat.py -v
  • python3 dashboard.py --port 8900

What to look at visually

  • http://localhost:8900 → Overview tab → System Health columnHeartbeat panel — confirm pulse dot animates and is the correct colour (green if last beat ≤1× interval, amber ≤1.5×, red if missed); verify last-beat age string updates on page load; verify the sparkline renders with 10 or fewer bars colour-coded green/amber; verify cadence line and ok-ratio line show non-zero values when heartbeat sessions exist under ~/.openclaw/agents/main/sessions/
  • http://localhost:8900/api/heartbeat (raw JSON) — confirm status, cadence_24h, ok_vs_action_24h, recent_beats, and expected_interval_seconds keys are all present

Likely failure modes

  • routes/heartbeat.py is a new file not shown in the diff — if it is missing from the branch or not registered as a Blueprint in dashboard.py, every request to /api/heartbeat will 404 and the panel will be blank with no JS error (the loadHeartbeat call is already wrapped in a .catch).
  • dashboard._heartbeat_interval_sec attribute may not exist on older deployments that haven't restarted; getattr fallback to 1800 should protect this but worth confirming.
  • The test imports from routes.heartbeat import _compute_heartbeat_data, _parse_iso_ts — if routes/heartbeat.py is absent, the entire test file will fail at collection time with ModuleNotFoundError, not a graceful skip.
  • test_recent_beats_ordered_oldest_first builds the beats_written list but never asserts on it (dead variable) — not a functional failure, just a minor test smell.

Issue link

  • Closes #686 (confirmed in PR body and branch name feat/gh-clawmetry-686-heartbeat-liveness)

Generated by Claude Code

Copy link
Copy Markdown
Owner Author

@vivekchand vivekchand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • New routes/heartbeat.py blueprint: GET /api/heartbeat returns cadence, HEARTBEAT_OK ratio (24h window), last-beat timestamp, and green/amber/red status
  • Visual pulse indicator + last-10-beat sparkline on the Overview dashboard
  • tests/test_heartbeat.py with 24 unit tests

Smoke commands

  • python3 -c 'import ast; ast.parse(open("routes/heartbeat.py").read())' — syntax clean
  • python3 -m pytest tests/test_heartbeat.py -v — all 24 tests should pass
  • curl -sS http://localhost:8900/api/heartbeat — expect {"status": "green"|"amber"|"red", "cadence_s": N, "last_beat_age_s": N, "ok_ratio_24h": 0..1}

What to look at visually

  • http://localhost:8900 → Overview tab → heartbeat panel should show a pulse dot (green/amber/red) and a sparkline of the last 10 beats

Likely failure modes from the diff

  • Blueprint registration: routes/heartbeat.py must be imported and app.register_blueprint() called in dashboard.py, otherwise /api/heartbeat returns 404
  • HEARTBEAT_OK ratio with zero beats in the 24h window should return a safe default (0 or null) — check for division-by-zero

Issue link


Generated by Claude Code

Copy link
Copy Markdown
Owner Author

@vivekchand vivekchand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • Adds tests/test_heartbeat.py (261 lines): 24 focused unit tests for _compute_heartbeat_data and _parse_iso_ts in routes/heartbeat.py — covering ok/action classification, 24h window filtering, sparkline ordering & 10-entry cap, and resilience to corrupt JSON / .deleted. / .reset. artefact files.

Smoke commands

  • python3 -m pytest tests/test_heartbeat.py -v — all 24 tests, no server needed
  • python3 -m pytest tests/test_api.py -k heartbeat -v — integration layer
  • make test or make test-api for the full suite
  • python3 dashboard.py --port 8900 then open the dashboard to exercise the live panel

What to look at visually (UI changes in this feature)

  • http://localhost:8900/ → System Health column → Heartbeat liveness panel
  • Pulse dot colour: green (healthy, ≤1× interval), amber (drifting, ≤1.5×), red (missed, >1.5×)
  • Last-beat timestamp age string updates in real time
  • Beat sparkline (last 10 beats, green = quiet, amber = action taken)
  • Cadence and ok-ratio lines refresh on each loadHeartbeat() poll cycle

Likely failure modes from the diff

  • routes/heartbeat.py is not included in this diff — confirm it was already merged or lives on the branch; the tests import from routes.heartbeat import _compute_heartbeat_data, _parse_iso_ts and will fail at collection time if that module is missing or the private symbols are renamed.
  • _iso() helper uses time.time() for "now", so the 24h window tests are sensitive to clock skew in CI — worth confirming they pass reliably across timezones (the UTC anchoring looks correct but worth a quick check on Windows CI).
  • The test_session_classified_action_if_any_turn_is_action test assumes mixed ok+action in one session rolls up to action; make sure _compute_heartbeat_data implements that same per-session (not per-turn) logic.

Issue link


Generated by Claude Code

Copy link
Copy Markdown
Owner Author

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • New routes/heartbeat.py: /api/heartbeat scans *heartbeat* session JSONL files, classifies each reply as HEARTBEAT_OK vs action, returns status/cadence/sparkline over a 24h rolling window; overview panel with green/amber/red pulse dot; 24 unit tests + 19 integration tests already passing

Smoke commands

  • python3 -m pytest tests/test_heartbeat.py -v (24 unit tests, no server needed)
  • python3 -m pytest tests/test_api.py -k heartbeat -v (19 integration tests)
  • python3 dashboard.py --port 8900
  • curl -sS http://localhost:8900/api/heartbeat → expect status, cadence_24h, ok_vs_action_24h, recent_beats

What to look at visually

  • http://localhost:8900 → Overview tab → heartbeat panel in the System Health column; pulse dot colour should reflect beat recency

Likely failure modes from the diff

  • dashboard._heartbeat_interval_sec read from the module at request time — confirm this attribute is set in dashboard.py or the endpoint has a safe default fallback
  • Session files that are empty or tool-only (neither HEARTBEAT_OK nor a user-visible action) should be classified consistently as ok or filtered

Issue link


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant