feat: add TokenUsageTracker callback for SDK mode #5

Closed
sami-marreed wants to merge 4 commits into main from feature/add-token-usage-tracker-to-sdk

@sami-marreed commented May 26, 2026

Contributor

Related to cuga-agent#71

Note: the original description cited Fixes #37, which refers to an internal-tracker issue that was not migrated to this repo. Reference removed.

Summary

Adds TokenUsageTracker-like functionality to SDK mode (CugaAgent) to standardize trajectory capture across all benchmarks. This enables SDK mode to produce trajectory files with the same prompt richness as AgentRunner mode.

Changes

Core Implementation

  • New callback handler (benchmarks/helpers/token_usage_tracker_callback.py)

    • LangChain callback that captures system/user/assistant prompts
    • Mimics TokenUsageTracker behavior from cuga-agent's agent_loop.py
    • Forwards captured data to ActivityTracker via callbacks
  • Updated SDK helper (benchmarks/helpers/sdk_eval_helpers.py)

    • setup_agent_with_tools now automatically includes TokenUsageTracker callback
    • New parameter: enable_token_usage_tracker (default: True)
    • Backward compatible - can be disabled if needed
  • M3 evaluations updated to include callback:

    • benchmarks/m3/eval_m3.py
    • benchmarks/m3/eval_m3_task_1_enterprise_style.py
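The PR's callback file is not shown in this conversation, so the following is only a minimal sketch of how a LangChain callback of this kind could capture prompts and forward them. `PromptRecord`, `PromptCaptureCallback`, and the `sink` argument are hypothetical names, not the PR's actual code. The hook names `on_chat_model_start` and `on_llm_end` are the standard LangChain callback hooks. The import fallback and the stand-in message objects exist only so the sketch runs without LangChain installed.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Callable
from uuid import uuid4

try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:  # fallback keeps the sketch runnable without LangChain
    BaseCallbackHandler = object


@dataclass
class PromptRecord:
    """Prompts sent and responses received for one LLM call (hypothetical shape)."""
    system: list[str] = field(default_factory=list)
    user: list[str] = field(default_factory=list)
    assistant: list[str] = field(default_factory=list)


class PromptCaptureCallback(BaseCallbackHandler):
    """Hypothetical stand-in for the PR's callback, not its implementation."""

    def __init__(self, sink: Callable[[PromptRecord], None]) -> None:
        self._sink = sink  # assumed: something like an activity-tracker method
        self._pending: dict[Any, PromptRecord] = {}

    def on_chat_model_start(self, serialized, messages, *, run_id, **kwargs):
        record = PromptRecord()
        for batch in messages:  # LangChain passes a list of message lists
            for msg in batch:
                text = str(msg.content)
                if msg.type == "system":
                    record.system.append(text)
                elif msg.type == "human":
                    record.user.append(text)
                elif msg.type == "ai":
                    record.assistant.append(text)
        self._pending[run_id] = record  # held until the matching on_llm_end

    def on_llm_end(self, response, *, run_id, **kwargs):
        record = self._pending.pop(run_id, None)
        if record is None:
            return  # no matching start event was seen for this run
        for generations in response.generations:
            record.assistant.extend(g.text for g in generations)
        self._sink(record)


# Demo with stand-in objects shaped like LangChain messages and results.
captured: list[PromptRecord] = []
callback = PromptCaptureCallback(sink=captured.append)
run_id = uuid4()
callback.on_chat_model_start(
    {},
    [[SimpleNamespace(type="system", content="You are a helpful agent."),
      SimpleNamespace(type="human", content="List open claims.")]],
    run_id=run_id,
)
callback.on_llm_end(
    SimpleNamespace(generations=[[SimpleNamespace(text="Found 3 open claims.")]]),
    run_id=run_id,
)
```

Keying pending records by `run_id` lets the start and end events of concurrent LLM calls be paired without mixing their prompts.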

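Security Updates

  • Upgraded langchain-core from 1.3.0 to 1.3.3 (fixes CVE-2026-44843)
  • Upgraded python-multipart from 0.0.26 to 0.0.28 (fixes CVE-2026-42561)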

Impact

Before

SDK mode trajectories had:

  • Empty prompts fields on most steps
  • Minimal step granularity (5 steps)
  • Token tracking via Langfuse only

After

SDK mode trajectories now have:

  • Full LLM conversation history in prompts fields
  • System prompts, user prompts, and assistant responses for every LLM call
  • Same prompt richness as AgentRunner mode
  • Compatible with cuga-viz and trajectory analysis tools

Affected Benchmarks

All 4 benchmarks now capture rich trajectory data:

  • BPO - Automatic via setup_agent_with_tools()
  • M3 - Automatic + manual for direct CugaAgent creation
  • Oak Health Insurance - Automatic via setup_agent_with_tools()
  • AppWorld SDK - Automatic via setup_agent_with_tools()
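As an illustration of the "automatic" path, here is a hedged sketch of how a default-on flag such as enable_token_usage_tracker could gate the callback. The real signature of setup_agent_with_tools() does not appear in this conversation, so `build_callbacks` and `TrackerCallback` are hypothetical names used only to show the opt-out pattern.

```python
from typing import Any, Optional, Sequence


class TrackerCallback:
    """Placeholder for the token-usage tracker callback (hypothetical)."""


def build_callbacks(
    user_callbacks: Optional[Sequence[Any]] = None,
    enable_token_usage_tracker: bool = True,
) -> list[Any]:
    """Return the caller's callbacks, plus the tracker unless it is disabled."""
    callbacks = list(user_callbacks or [])  # copy so the caller's list is untouched
    if enable_token_usage_tracker:
        callbacks.append(TrackerCallback())
    return callbacks


default_callbacks = build_callbacks()                                  # tracker added
opted_out = build_callbacks(enable_token_usage_tracker=False)          # tracker skipped
```

Benchmarks that construct CugaAgent directly, as the M3 evaluations do, bypass a helper like this and would need to add the callback themselves.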

Testing

  • just lint - All checks passed
  • just security - No vulnerabilities found
  • Pre-commit hooks passed

Documentation

See TOKENUSAGETRACKER_IMPLEMENTATION.md for complete implementation details, usage examples, and future work.

Notes

This is a workaround until upstream cuga-agent#71, which would add native TokenUsageTracker support to the CugaAgent SDK, is implemented.

Fixes #37

- Add SDKTokenUsageTrackerCallback for rich trajectory capture
- Enable for all benchmarks (BPO, M3, Oak, AppWorld SDK)
- SDK trajectories now include full LLM conversation prompts
- Upgrade langchain-core from 1.3.0 to 1.3.3 (fixes CVE-2026-44843)
- Upgrade python-multipart from 0.0.26 to 0.0.28 (fixes CVE-2026-42561)
- All security checks now pass (just security ✅)

@haroldship

Collaborator

Update

  • Merged latest main into feature/add-token-usage-tracker-to-sdk (resolved setup_agent_with_tools callback init conflict).
  • Local just ci: lint ✓, 266 tests ✓, security ✓.
  • Pushed to origin; CI should rerun shortly.

@haroldship

Collaborator

Status: Merged main (e3f7438), conflict in setup_agent_with_tools resolved. All CI checks green. Mergeable.

Ready for review when you are.

@haroldship

Collaborator

Closing without merge — superseded by upstream fix.

Upstream: cuga-agent#71 (closed) implemented in cuga-agent#236. CugaAgent now registers TokenUsageTracker internally; merging this PR would duplicate prompt/step capture.
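To make the duplication concern concrete, here is a small sketch under simplified assumptions: if a built-in tracker and a benchmark-side callback both record every LLM call into the same trajectory, each call yields two steps. All names below are hypothetical and do not come from cuga-agent or this repo.

```python
from typing import Callable

trajectory: list[str] = []  # shared trajectory that every recorder writes to


def make_recorder(source: str) -> Callable[[str], None]:
    """Build a handler that appends one step per LLM call."""
    def record(prompt: str) -> None:
        trajectory.append(f"{source}: {prompt}")
    return record


def run_llm_call(prompt: str, handlers: list[Callable[[str], None]]) -> None:
    """Notify every registered handler about a single LLM call."""
    for handler in handlers:
        handler(prompt)


# One tracker registered by the agent itself, one added by the benchmark helper.
handlers = [make_recorder("built-in"), make_recorder("benchmark")]
run_llm_call("plan next action", handlers)
```

After one call, `trajectory` holds two entries for the same prompt, which is the kind of double capture the closing note describes.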

Eval follow-up: Documentation tracked in #42 (CONTRIBUTING note + optional dep pins).

Issues: This PR had no linked cuga-eval issue (only cross-repo cuga-agent#71). Nothing else to close here.

@haroldship closed this Jun 4, 2026
@haroldship deleted the feature/add-token-usage-tracker-to-sdk branch June 4, 2026 12:58