fix: conversation pipeline hardening (watermark, dead query, trim heuristic, erosion signal) by njbrake · Pull Request #1440 · mozilla-ai/clawbolt

njbrake · 2026-06-10T08:56:58Z

Note: this PR was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.

Description

Fixes #1433. Four small fixes from the conversation-design review, bundled as agreed on the issue.

1. Same-row trim atomicity (correctness). History rebuild expands one outbound DB row into a tool-call AssistantMessage, its ToolResultMessages, and a final-reply AssistantMessage, all sharing the row's seq (context._expand_outbound_with_tools). trim_messages treated the reply as a separate block, so it could drop the tool-call half while keeping the reply; the watermark then advanced over the shared seq and silently filtered the kept reply from the next turn's history, and the reply prose never reached the compactor. The reply now trims atomically with its block. Live-loop messages carry seq=None and are unaffected.

2. Remove the dead cross-session context section (perf/cleanup). Migration 026 collapsed sessions to one per user, so build_cross_session_context excluding the current session always returned empty; every message paid a wasted query. Removed the function, the get_other_session_messages_async store method, and the current_session_id plumbing through the prompt assemblers. All channels share the single session, so channel switches are already covered by ordinary history.

3. Accurate proactive trim (correctness of trigger). A fresh ClawboltAgent is built per message, so self._last_input_tokens was always 0 at the proactive trim and the decision fell back to the chars/4 + flat 10k-overhead heuristic, which ignores tool schemas and the real system prompt size, firing later than configured. A bounded process-local LRU now carries the last API-reported input_tokens per user across agent instances (cleared by reset_stores() for test isolation); the heuristic remains the cold-start fallback after a restart.

4. Memory erosion signal (observability). Compaction full-rewrites MEMORY.md and the compliance audit explicitly encourages deletion, so a valid line can vanish on any cycle with no signal. The compaction.summary log line now carries memory_lines_added / memory_lines_removed (multiset line diff, reorder-insensitive) so a log aggregator can alert on large unexplained removals.

Deliberate deviation from the issue: item 4's acceptance criterion asked for the counts on compaction_events rows. This PR puts them on the structured log line only, because adding a migration here would collide with migration 039 in PR #1438 (independent PRs, same Alembic head). Persisting the counts is a 10-line follow-up once 039 lands.

Type

Checklist

Tests pass (uv run pytest -v) (2884 passed, 2 skipped; 6 dead cross-session tests removed, 5 added)
Lint passes (ruff check backend/ && ruff format --check backend/)
New tests added for new functionality
Bug fixes include regression tests

AI Usage

AI-assisted (describe how): Claude implemented all four fixes and their tests, with direction and review by @njbrake.
No AI used

…ristic, erosion signal) Four small fixes from the conversation-design review, bundled per issue #1433: 1. Same-row trim atomicity. History rebuild expands one outbound DB row into a tool-call AssistantMessage, its ToolResultMessages, and a final-reply AssistantMessage, all sharing the row's seq. trim_messages treated the reply as a separate block, so it could drop the tool-call half while keeping the reply; the watermark then advanced over the shared seq and silently filtered the kept reply from the next turn. The reply now trims atomically with its block (live-loop messages carry seq=None and are unaffected). 2. Remove the dead cross-session context section. Migration 026 collapsed sessions to one per user, so the query excluding the current session always returned empty; every message paid a wasted DB query. Removed build_cross_session_context, the get_other_session_messages_async store method, and the current_session_id plumbing through the prompt assemblers. 3. Accurate proactive trim. A fresh ClawboltAgent is built per message, so the trim decision always used the chars/4 + flat-overhead heuristic, which ignores tool schemas and real system prompt size. A bounded process-local LRU now carries the last API-reported input_tokens per user across agent instances; the heuristic remains the cold-start fallback. 4. Memory erosion signal. Compaction full-rewrites MEMORY.md and the compliance audit encourages deletion, so a valid line can vanish on any cycle with no signal. The compaction.summary log line now carries memory_lines_added / memory_lines_removed (multiset diff, reorder-insensitive) so an aggregator can alert on large unexplained removals. Persisting the counts on compaction_events can follow once migration 039 (PR #1438) lands, avoiding an Alembic head collision. Fixes #1433 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-10T08:57:04Z

Warning

Review limit reached

@njbrake, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 31 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f308ecaa-c543-4e93-912f-b8e75a5ae0f8

📥 Commits

Reviewing files that changed from the base of the PR and between f30e8ce and f24282f.

📒 Files selected for processing (11)

backend/app/agent/compaction.py
backend/app/agent/core.py
backend/app/agent/session_db.py
backend/app/agent/stores.py
backend/app/agent/system_prompt.py
backend/app/agent/trimming.py
backend/app/routers/user_sessions.py
tests/test_agent.py
tests/test_compaction.py
tests/test_session_db_async.py
tests/test_system_prompt.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/conversation-hardening

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch fix/conversation-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…stem_prompt.py) # Conflicts: # backend/app/agent/system_prompt.py

merge main: resolve overlaps with #1436 (compaction.py) and #1439 (sy…

f24282f

…stem_prompt.py) # Conflicts: # backend/app/agent/system_prompt.py

njbrake merged commit 33418a0 into main Jun 10, 2026
10 checks passed

njbrake deleted the fix/conversation-hardening branch June 10, 2026 11:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: conversation pipeline hardening (watermark, dead query, trim heuristic, erosion signal)#1440

fix: conversation pipeline hardening (watermark, dead query, trim heuristic, erosion signal)#1440
njbrake merged 2 commits into
mainfrom
fix/conversation-hardening

njbrake commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Review limit reached

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

njbrake commented Jun 10, 2026

Description

Type

Checklist

AI Usage

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading