fix: conversation pipeline hardening (watermark, dead query, trim heuristic, erosion signal)#1440
Conversation
…ristic, erosion signal) Four small fixes from the conversation-design review, bundled per issue #1433: 1. Same-row trim atomicity. History rebuild expands one outbound DB row into a tool-call AssistantMessage, its ToolResultMessages, and a final-reply AssistantMessage, all sharing the row's seq. trim_messages treated the reply as a separate block, so it could drop the tool-call half while keeping the reply; the watermark then advanced over the shared seq and silently filtered the kept reply from the next turn. The reply now trims atomically with its block (live-loop messages carry seq=None and are unaffected). 2. Remove the dead cross-session context section. Migration 026 collapsed sessions to one per user, so the query excluding the current session always returned empty; every message paid a wasted DB query. Removed build_cross_session_context, the get_other_session_messages_async store method, and the current_session_id plumbing through the prompt assemblers. 3. Accurate proactive trim. A fresh ClawboltAgent is built per message, so the trim decision always used the chars/4 + flat-overhead heuristic, which ignores tool schemas and real system prompt size. A bounded process-local LRU now carries the last API-reported input_tokens per user across agent instances; the heuristic remains the cold-start fallback. 4. Memory erosion signal. Compaction full-rewrites MEMORY.md and the compliance audit encourages deletion, so a valid line can vanish on any cycle with no signal. The compaction.summary log line now carries memory_lines_added / memory_lines_removed (multiset diff, reorder-insensitive) so an aggregator can alert on large unexplained removals. Persisting the counts on compaction_events can follow once migration 039 (PR #1438) lands, avoiding an Alembic head collision. Fixes #1433 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 3 minutes and 31 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…stem_prompt.py) # Conflicts: # backend/app/agent/system_prompt.py
Description
Fixes #1433. Four small fixes from the conversation-design review, bundled as agreed on the issue.
1. Same-row trim atomicity (correctness). History rebuild expands one outbound DB row into a tool-call
AssistantMessage, itsToolResultMessages, and a final-replyAssistantMessage, all sharing the row'sseq(context._expand_outbound_with_tools).trim_messagestreated the reply as a separate block, so it could drop the tool-call half while keeping the reply; the watermark then advanced over the shared seq and silently filtered the kept reply from the next turn's history, and the reply prose never reached the compactor. The reply now trims atomically with its block. Live-loop messages carryseq=Noneand are unaffected.2. Remove the dead cross-session context section (perf/cleanup). Migration 026 collapsed sessions to one per user, so
build_cross_session_contextexcluding the current session always returned empty; every message paid a wasted query. Removed the function, theget_other_session_messages_asyncstore method, and thecurrent_session_idplumbing through the prompt assemblers. All channels share the single session, so channel switches are already covered by ordinary history.3. Accurate proactive trim (correctness of trigger). A fresh
ClawboltAgentis built per message, soself._last_input_tokenswas always 0 at the proactive trim and the decision fell back to the chars/4 + flat 10k-overhead heuristic, which ignores tool schemas and the real system prompt size, firing later than configured. A bounded process-local LRU now carries the last API-reportedinput_tokensper user across agent instances (cleared byreset_stores()for test isolation); the heuristic remains the cold-start fallback after a restart.4. Memory erosion signal (observability). Compaction full-rewrites MEMORY.md and the compliance audit explicitly encourages deletion, so a valid line can vanish on any cycle with no signal. The
compaction.summarylog line now carriesmemory_lines_added/memory_lines_removed(multiset line diff, reorder-insensitive) so a log aggregator can alert on large unexplained removals.Deliberate deviation from the issue: item 4's acceptance criterion asked for the counts on
compaction_eventsrows. This PR puts them on the structured log line only, because adding a migration here would collide with migration 039 in PR #1438 (independent PRs, same Alembic head). Persisting the counts is a 10-line follow-up once 039 lands.Type
Checklist
uv run pytest -v) (2884 passed, 2 skipped; 6 dead cross-session tests removed, 5 added)ruff check backend/ && ruff format --check backend/)AI Usage