Reduce unnecessary subcalls with early-exit and tighter limits #20

@apenab

Problem

The system may execute unnecessary subcalls, increasing token usage and latency.

Proposal

Reduce subcall volume/length by:

  • Early exit when confidence is already high.
  • Limiting lookahead subcalls.
  • Enforcing tighter stop criteria for iterative subcall loops.
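
The three ideas above could be combined in one loop. The following is a minimal sketch, not the project's actual implementation: `SubcallPolicy`, `run_subcalls`, the `subcall` callable, and every default value are hypothetical names and numbers chosen for illustration, and it assumes each subcall returns an answer together with a confidence score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SubcallPolicy:
    # Hypothetical knobs; the names and defaults are assumptions, not project settings.
    confidence_threshold: float = 0.9  # early exit once confidence reaches this
    max_subcalls: int = 4              # hard budget on subcalls per sample
    min_gain: float = 0.01             # stop when a subcall adds less confidence than this


def run_subcalls(
    subcall: Callable[[int], Tuple[str, float]],
    policy: SubcallPolicy,
) -> Tuple[str, float, int]:
    """Run subcalls until the policy says stop.

    Returns (best_answer, best_confidence, calls_made).
    """
    answer, confidence = "", 0.0
    calls = 0
    while calls < policy.max_subcalls:
        new_answer, new_confidence = subcall(calls)
        calls += 1
        gain = new_confidence - confidence
        if new_confidence > confidence:
            # Keep the best result seen so far.
            answer, confidence = new_answer, new_confidence
        if confidence >= policy.confidence_threshold:
            break  # early exit: confidence is already high
        if calls > 1 and gain < policy.min_gain:
            break  # tighter stop criterion: diminishing returns
    return answer, confidence, calls
```

Here `max_subcalls` stands in for the lookahead limit, and the `min_gain` check is one possible stop criterion among many; which signals actually serve as "confidence" in this system is left open.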

Expected Impact

  • Estimated token reduction: ~20-30%
  • Example target: 2.2M -> 1.6M tokens (~27% reduction)

Risk

  • High: precision can drop if subcalls are cut too aggressively.

Implementation Considerations

  • Guard behind config flags and staged rollout.
  • Add safeguards for ambiguous/complex samples.
  • Log skipped subcalls for regression analysis.
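
One way these considerations might fit together is sketched below, assuming a per-sample confidence score and some ambiguity score are available. `RolloutConfig`, `should_skip_subcall`, the `ambiguity` input, and all thresholds are hypothetical; only the standard-library `logging` and `dataclasses` modules are real APIs here.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("subcall_policy")  # hypothetical logger name


@dataclass(frozen=True)
class RolloutConfig:
    # Hypothetical flags; off by default so the change can be staged.
    early_exit_enabled: bool = False
    confidence_threshold: float = 0.9
    ambiguity_ceiling: float = 0.3  # samples above this are never skipped


def should_skip_subcall(
    sample_id: str, confidence: float, ambiguity: float, cfg: RolloutConfig
) -> bool:
    """Return True if the next subcall can be skipped under the given config."""
    if not cfg.early_exit_enabled:
        return False  # flag off: behave exactly as before (rollback path)
    if ambiguity > cfg.ambiguity_ceiling:
        return False  # safeguard: ambiguous/complex samples keep their subcalls
    if confidence < cfg.confidence_threshold:
        return False
    # Record every skip so regressions can be traced back to specific samples.
    logger.info(
        "skipped subcall sample=%s confidence=%.3f ambiguity=%.3f",
        sample_id, confidence, ambiguity,
    )
    return True
```

Keeping the flag as the first check means that turning it off restores the previous behaviour without touching any other setting.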

Acceptance Criteria

  • Configurable subcall budget/early-exit policy is implemented.
  • Benchmark compares token/latency/quality before vs after.
  • Quality tradeoff is documented and accepted before the new policy is enabled by default.
  • Clear rollback path exists.
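
The before/after comparison could be reduced to a small report like the sketch below. `RunMetrics`, `compare_runs`, the choice of a single scalar `quality` metric, and the `max_quality_drop` tolerance are all assumptions made for illustration; the real benchmark may track different metrics.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class RunMetrics:
    tokens: int       # total tokens consumed by the benchmark run
    latency_s: float  # total or mean latency in seconds
    quality: float    # hypothetical quality score in [0, 1]


def relative_change(before: float, after: float) -> float:
    """Signed relative change; negative means a reduction."""
    return (after - before) / before


def compare_runs(
    baseline: RunMetrics, candidate: RunMetrics, max_quality_drop: float = 0.01
) -> Dict[str, Union[float, bool]]:
    """Summarise token, latency and quality deltas and flag unacceptable quality loss."""
    quality_drop = baseline.quality - candidate.quality
    return {
        "token_change": relative_change(baseline.tokens, candidate.tokens),
        "latency_change": relative_change(baseline.latency_s, candidate.latency_s),
        "quality_drop": quality_drop,
        "acceptable": quality_drop <= max_quality_drop,
    }
```

With the issue's example target, `relative_change(2_200_000, 1_600_000)` is about -0.27, which falls inside the estimated 20-30% reduction.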
