Summary
Document the cuga-agent requirement for full SDK trajectory capture (steps[].prompts, cuga-viz) so contributors do not re-add an eval-side TokenUsageTracker workaround.
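As a rough illustration of what "full" capture means here, the sketch below flags steps whose prompts are missing. It assumes a trajectory is a dict loaded from JSON with a `steps` list whose items may carry a `prompts` list, as the `steps[].prompts` notation suggests; the function name `steps_missing_prompts` and the sample data are hypothetical and not part of cuga-agent or cuga-eval.

```python
def steps_missing_prompts(trajectory: dict) -> list[int]:
    """Return indices of steps whose `prompts` field is absent or empty.

    Assumed shape (not confirmed by the issue): {"steps": [{"prompts": [...]}, ...]}
    """
    missing = []
    for index, step in enumerate(trajectory.get("steps", [])):
        if not step.get("prompts"):
            missing.append(index)
    return missing


# Hypothetical sample data: one rich trajectory, one with empty or absent prompts.
rich = {"steps": [{"prompts": ["You are an agent.", "Book a flight."]}]}
sparse = {"steps": [{"prompts": []}, {}]}

print(steps_missing_prompts(rich))    # prints: []
print(steps_missing_prompts(sparse))  # prints: [0, 1]
```

A non-empty result on a fresh SDK run would point at the cuga-agent checkout rather than at a missing eval-side callback.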
Background
- cuga-agent#71: closed; fixed in cuga-agent#236 (CugaAgent wires TokenUsageTracker in _build_callbacks()).
- cuga-eval PR #5: eval-repo duplicate callback; closed without merge (would double-register tracking with current cuga-agent).
There is no matching open issue in cuga-eval for PR #5 (PR only referenced upstream #71).
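The double-registration concern can be illustrated with a toy model. This is a sketch under stated assumptions, not the real cuga-agent code: `ToyTracker`, `Step`, and `run_llm_call` are hypothetical stand-ins, and it assumes both trackers write into the same step record.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical trajectory step; only the `prompts` list matters here."""
    prompts: list = field(default_factory=list)


class ToyTracker:
    """Stand-in for a tracking callback; NOT the real TokenUsageTracker."""

    def __init__(self, step: Step):
        self.step = step

    def on_llm_start(self, prompt: str) -> None:
        # Each registered tracker appends the prompt it observes.
        self.step.prompts.append(prompt)


def run_llm_call(prompt: str, callbacks: list) -> None:
    """Dispatch one LLM call to every registered callback."""
    for callback in callbacks:
        callback.on_llm_start(prompt)


def record(callback_count: int) -> Step:
    """Record a single LLM call with the given number of trackers attached."""
    step = Step()
    callbacks = [ToyTracker(step) for _ in range(callback_count)]
    run_llm_call("plan the next action", callbacks)
    return step


single = record(1)  # agent-level tracker only
double = record(2)  # agent-level tracker plus a harness-level copy
print(len(single.prompts), len(double.prompts))  # prints: 1 2
```

In this model one LLM call yields two prompt entries once a second tracker is attached, which is the kind of duplication a harness-level copy would risk now that the agent registers its own tracker.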
Tasks
- CONTRIBUTING.md (cuga-agent path dependency section): note that SDK benchmarks need a ../cuga-agent checkout including #71 / #236 for rich trajectories; do not add a harness-level TokenUsageTracker copy.
- setup_agent_with_tools if an old cuga-agent is detected (only if a cheap version/import check exists).
- langchain-core / python-multipart floor pins in pyproject.toml for CVE policy: uv.lock already resolves newer transitive versions; evaluate whether explicit pins add value.
Acceptance criteria
- New contributors running SDK/AppWorld/M3 evals understand that trajectory richness comes from cuga-agent, not from cuga-eval callbacks.
- No duplicate TokenUsageTracker / SDKTokenUsageTrackerCallback in this repo.
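The "no duplicate" criterion could be spot-checked with a small scan like the one below. It is a minimal sketch, assuming that a duplicate means a Python class definition with one of the two names; the helper `find_duplicate_trackers` and the demo file names are hypothetical, and merely mentioning or importing the agent's tracker is deliberately not flagged.

```python
import re
import tempfile
from pathlib import Path

# Class names the acceptance criteria say must not be defined in this repo.
FORBIDDEN = ("TokenUsageTracker", "SDKTokenUsageTrackerCallback")
CLASS_DEF = re.compile(r"^\s*class\s+(\w+)\b", re.MULTILINE)


def find_duplicate_trackers(repo_root: str) -> list[tuple[str, str]]:
    """Return (relative path, class name) for each forbidden class definition."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name in CLASS_DEF.findall(text):
            if name in FORBIDDEN:
                hits.append((path.relative_to(root).as_posix(), name))
    return hits


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "harness.py").write_text(
        "class SDKTokenUsageTrackerCallback:\n    pass\n", encoding="utf-8"
    )
    Path(tmp, "clean.py").write_text(
        "tracker = None  # relies on the agent's TokenUsageTracker\n",
        encoding="utf-8",
    )
    demo_hits = find_duplicate_trackers(tmp)

print(demo_hits)  # prints: [('harness.py', 'SDKTokenUsageTrackerCallback')]
```

Run against the repository root, an empty result would be consistent with the criterion; any hit names the file to clean up.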
References