fix(cuga_lite): reset task_todos state between invocations #315
Sergey-Zeltyn wants to merge 1 commit into
The closure-scoped `task_todos_ref` (and the parallel `state.task_todos` checkpointer field) outlived a single `agent.invoke()`. With `enable_todos=true`, the previous task's plan leaked into the next task's turn-1 system prompt as `## Current task todos`, biasing the model's first reasoning step toward an unrelated plan. Clear both at the top of `prepare_tools_and_apps`, gated on a fresh conversation (`len(chat_messages) <= 1`) so HITL resume is unaffected.

Fixes #314.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Sergey Zeltyn <sergeyz@il.ibm.com>
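The leak described above can be reproduced in isolation. The sketch below is a toy model, not cuga code: `ToyAgent`, `_make_graph` and `update_todos` are hypothetical names chosen to mirror the shape the PR attributes to `CugaAgent._create_graph` (memoized on `self._graph`, reset only in `add_tool()`), `create_cuga_lite_graph` (a closure-captured list) and the todos tool body (`clear() + extend(new)`).

```python
from typing import Callable, List, Optional

# Hypothetical signature of the compiled graph: (task, plan) -> turn-1 prompt.
Graph = Callable[[str, Optional[List[str]]], str]


class ToyAgent:
    """Hypothetical agent that memoizes its compiled graph, like `self._graph`."""

    def __init__(self) -> None:
        self._graph: Optional[Graph] = None

    def add_tool(self) -> None:
        # The only place the memoized graph is dropped (mirrors `add_tool()`).
        self._graph = None

    def _make_graph(self) -> Graph:
        todos_ref: List[str] = []  # closure-scoped: lives as long as the graph

        def update_todos(new: List[str]) -> None:
            # In-place replace, mirroring `clear() + extend(new)`; no per-task reset.
            todos_ref.clear()
            todos_ref.extend(new)

        def run(task: str, plan: Optional[List[str]]) -> str:
            # The turn-1 prompt is built before this task can update the todos.
            prompt = f"Task: {task}"
            if todos_ref:
                prompt += "\n## Current task todos\n" + "\n".join(todos_ref)
            if plan is not None:
                update_todos(plan)  # stands in for the model planning this task
            return prompt

        return run

    def invoke(self, task: str, plan: Optional[List[str]] = None) -> str:
        if self._graph is None:  # memoized: built once, reused by every invoke
            self._graph = self._make_graph()
        return self._graph(task, plan)


agent = ToyAgent()
agent.invoke("task A", plan=["step A1", "step A2"])
print(agent.invoke("task B"))  # task A's plan appears in task B's turn-1 prompt
```

Calling `agent.add_tool()` between the two invocations drops the memoized graph and hides the leak, which matches the PR's observation that nothing else resets the list.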
📝 Walkthrough
The PR fixes a bug where task plans from a prior …
Changes: Task todos reset on new conversation
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cuga/backend/cuga_graph/nodes/cuga_lite/adapter/prepare_node.py`:
- Around line 73-80: Run ruff formatting and linting on the cuga_graph/nodes
changes: execute "ruff format" then "ruff check" (or use the project's uv
wrapper if available) targeting the src/cuga/backend/cuga_graph/nodes/ tree so
prepare_node.py and related files are formatted and any lint errors are
surfaced; after fixing any reported issues re-run to ensure
adapter._task_todos_ref usage and state.task_todos handling in prepare_node.py
meet ruff rules.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0cae5ba2-d548-4644-857a-63d0aa75c3b8
📒 Files selected for processing (1)
src/cuga/backend/cuga_graph/nodes/cuga_lite/adapter/prepare_node.py
```python
# The adapter's task_todos_ref is closure-scoped per compiled graph and
# outlives a single .invoke(); state.task_todos can also persist via a
# thread-keyed checkpointer. Reset both at the start of a new
# conversation so a previous task's plan doesn't leak into this one's
# turn-1 system prompt.
if len(state.chat_messages or []) <= 1:
    adapter._task_todos_ref.clear()
    state.task_todos = None
```
🧩 Analysis chain
🏁 Script executed (repository: cuga-project/cuga-agent):
```shell
uv run ruff format src/cuga/backend/cuga_graph/nodes/
uv run ruff check src/cuga/backend/cuga_graph/nodes/
```
Length of output: 151
Run Ruff format + Ruff check for cuga_graph/nodes changes
The required `ruff format` and `ruff check` commands haven't been executed in this environment (the `uv run ...` approach requires `uv` to be available). Run formatting and linting for `src/cuga/backend/cuga_graph/nodes/` (via `uv` if available, otherwise `ruff format` + `ruff check` directly).
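The "via `uv` if available, otherwise `ruff` directly" fallback can be scripted. A minimal sketch, assuming only that `uv` and `ruff` are resolved from `PATH`; `ruff_commands` and `run_ruff` are hypothetical helpers, not part of the repo, while the command lines are the ones quoted in the review.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

TARGET = "src/cuga/backend/cuga_graph/nodes/"


def ruff_commands(
    target: str, which: Callable[[str], Optional[str]] = shutil.which
) -> List[List[str]]:
    """Build the format and check command lines, preferring `uv run` when uv is on PATH."""
    prefix = ["uv", "run", "ruff"] if which("uv") else ["ruff"]
    # Format first, then lint the formatted result.
    return [prefix + ["format", target], prefix + ["check", target]]


def run_ruff(target: str = TARGET) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for cmd in ruff_commands(target):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_ruff()` from the repository root formats and then lints `src/cuga/backend/cuga_graph/nodes/`; if neither `uv` nor `ruff` is installed, `subprocess.run` raises `FileNotFoundError`.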
Source: Coding guidelines
Code Review
Verdict: merge with fixes. The primary half of the fix is correct and demonstrably works (42/43 → 0/43 on the eval), but the second half, the …
What's solid
Issues
- 🔴 Critical
- 🟠 Important: guard doesn't cover new-task-on-same-thread.
- 🟠 Important: HITL preservation is incidental, not designed.
- 🟡 Minor
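A toy model of the "new-task-on-same-thread" concern, assuming a thread-keyed store that accumulates `chat_messages` across invocations and that the new user message is appended before the guard runs. `ToyState`, `ToyCheckpointer`, `fresh_conversation` and `start_task` are hypothetical stand-ins, not cuga or LangGraph APIs; only the predicate is copied from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyState:
    # Hypothetical stand-in for the per-thread checkpointed graph state.
    chat_messages: List[str] = field(default_factory=list)
    task_todos: Optional[List[str]] = None


class ToyCheckpointer:
    """Hypothetical thread-keyed store: one persisted state per thread_id."""

    def __init__(self) -> None:
        self._states: Dict[str, ToyState] = {}

    def load(self, thread_id: str) -> ToyState:
        return self._states.setdefault(thread_id, ToyState())


def fresh_conversation(state: ToyState) -> bool:
    # Same predicate as the PR's guard.
    return len(state.chat_messages or []) <= 1


def start_task(cp: ToyCheckpointer, thread_id: str, user_msg: str) -> bool:
    """Append the new user message, apply the guard, and report whether it fired."""
    state = cp.load(thread_id)
    state.chat_messages.append(user_msg)
    fired = fresh_conversation(state)
    if fired:
        state.task_todos = None
    return fired


cp = ToyCheckpointer()
first = start_task(cp, "t1", "task A")  # empty thread: guard fires
cp.load("t1").chat_messages.append("assistant reply for A")
cp.load("t1").task_todos = ["plan step for A"]
second = start_task(cp, "t1", "task B")  # same thread, new task: guard does not fire
print(first, second, cp.load("t1").task_todos)  # → True False ['plan step for A']
```

This only models the predicate; whether cuga's real checkpointer carries `chat_messages` over to a new task this way depends on how callers reuse `thread_id`.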
Recommendation
Bug fix
Fixes #314
Summary
When `advanced_features.enable_todos=true`, the closure-scoped `task_todos_ref` (plus the parallel `state.task_todos` checkpointer field) outlived a single `agent.invoke()`. From the second task onward, the previous task's plan leaked into the next task's turn-1 system prompt as `## Current task todos`, biasing the model's first reasoning step before it had any chance to call `create_update_todos` for the new task.

Root cause: `CugaAgent._create_graph` memoizes on `self._graph` and only resets in `add_tool()` (`sdk.py:2224`), so every `.invoke()` reuses the same compiled graph and the same closure-captured list created in `create_cuga_lite_graph` (`cuga_lite_graph.py:180`). The tool body in `todos.py:123-126` only does `clear() + extend(new)`; it never resets on a new task. Nothing in the SDK, agent loop, or framework cleared these between invocations.

Fix: at the top of `prepare_tools_and_apps` (`prepare_node.py`), detect a fresh conversation via `len(state.chat_messages or []) <= 1` and clear both `adapter._task_todos_ref` and `state.task_todos`. The guard preserves HITL resume (turn 2+ keeps the in-flight plan).

Verification
Reproduced and verified on a 43-task AppWorld eval bundle (gpt-4.1, `enable_todos=true`), checking two leak signals:
- `## Current task todos` in the turn-1 system prompt
- `## Current Plan` in turn 1 (state path)

Concrete example from the pre-fix bundle: a task asking about HR/candidates/`report_template.md` opens with a stale Gmail plan injected as authoritative:

After the fix, every one of the 43 traces starts turn 1 with a clean system prompt.
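One way such a leak signature could be counted across an eval bundle is sketched below, assuming each trace is available as a sequence of per-turn system prompts. The `Trace` shape, `turn1_leaks` and `leak_rate` are hypothetical; the PR does not say how its figures were computed.

```python
from typing import Iterable, Sequence, Tuple

# Markers taken from the two signals listed above.
LEAK_MARKERS: Tuple[str, ...] = ("## Current task todos", "## Current Plan")

# Hypothetical trace shape: one system prompt string per turn.
Trace = Sequence[str]


def turn1_leaks(trace: Trace, marker: str) -> bool:
    """True if the first turn's system prompt already contains the marker."""
    return bool(trace) and marker in trace[0]


def leak_rate(traces: Iterable[Trace], marker: str) -> Tuple[int, int]:
    """Return (leaking traces, total traces) for one marker."""
    items = list(traces)
    hits = sum(turn1_leaks(t, marker) for t in items)
    return hits, len(items)


bundle = [["## Current task todos\n- old step", "turn 2 prompt"], ["clean prompt"]]
for marker in LEAK_MARKERS:
    hits, total = leak_rate(bundle, marker)
    print(f"{marker!r}: {hits}/{total}")
```

Checking only the first turn matters here: later turns legitimately contain the current task's own todos once `create_update_todos` has run.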
Testing
- `enable_todos=true`: leak signature dropped from 42/43 to 0/43
- Tests in `cuga_lite/tests/test_agent_graph_adapter.py` exercise `prepare_system_content` directly (not `prepare_tools_and_apps`), so they're unaffected by the new clear logic
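Because the existing adapter tests bypass `prepare_tools_and_apps`, the new branch has no direct coverage. A minimal pytest-style sketch of what a regression test could assert; `FakeAdapter`, `FakeState` and `reset_todos_if_fresh` are hypothetical stand-ins that copy the PR's guard rather than import cuga code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeAdapter:
    # Stand-in for the adapter that holds the closure-scoped list.
    _task_todos_ref: List[str] = field(default_factory=list)


@dataclass
class FakeState:
    chat_messages: Optional[List[str]] = None
    task_todos: Optional[List[str]] = None


def reset_todos_if_fresh(adapter: FakeAdapter, state: FakeState) -> None:
    # Same logic as the guard added at the top of prepare_tools_and_apps.
    if len(state.chat_messages or []) <= 1:
        adapter._task_todos_ref.clear()
        state.task_todos = None


def test_fresh_conversation_clears_both() -> None:
    adapter = FakeAdapter(["stale step"])
    state = FakeState(chat_messages=["new task"], task_todos=["stale step"])
    reset_todos_if_fresh(adapter, state)
    assert adapter._task_todos_ref == [] and state.task_todos is None


def test_missing_history_counts_as_fresh() -> None:
    adapter = FakeAdapter(["stale step"])
    state = FakeState(chat_messages=None, task_todos=["stale step"])
    reset_todos_if_fresh(adapter, state)
    assert adapter._task_todos_ref == [] and state.task_todos is None


def test_hitl_resume_keeps_in_flight_plan() -> None:
    adapter = FakeAdapter(["step 1"])
    state = FakeState(chat_messages=["task", "reply", "approval"], task_todos=["step 1"])
    reset_todos_if_fresh(adapter, state)
    assert adapter._task_todos_ref == ["step 1"] and state.task_todos == ["step 1"]
```

A real test would replace the helper with a call into `prepare_tools_and_apps` using the suite's existing fixtures; that wiring is not shown in the PR.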