Problem
The system may execute unnecessary subcalls, increasing token usage and latency.
Proposal
Reduce subcall volume/length by:
- Exiting early when confidence is already high.
- Limiting lookahead subcalls.
- Enforcing tighter stop criteria for iterative subcall loops.
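The three measures above could be combined in a single loop policy. The following is a minimal sketch, not the system's actual implementation: `SubcallPolicy`, `run_subcalls`, the default thresholds, and the assumption that each subcall returns an updated confidence score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class SubcallPolicy:
    # All names and defaults are illustrative assumptions, not existing config.
    confidence_threshold: float = 0.9  # exit early at or above this confidence
    max_subcalls: int = 4              # hard cap on subcalls per sample
    min_gain: float = 0.01             # stop when a subcall improves confidence by less


def run_subcalls(
    subcalls: Iterable[Callable[[], float]],
    policy: SubcallPolicy,
    initial_confidence: float = 0.0,
) -> Tuple[float, int, str]:
    """Run subcalls in order; return (confidence, subcalls_used, stop_reason)."""
    confidence = initial_confidence
    used = 0
    for call in subcalls:
        # Early exit: confidence is already high enough.
        if confidence >= policy.confidence_threshold:
            return confidence, used, "early_exit"
        # Budget: never exceed the configured number of subcalls.
        if used >= policy.max_subcalls:
            return confidence, used, "budget_exhausted"
        new_confidence = call()
        used += 1
        gain = new_confidence - confidence
        confidence = max(confidence, new_confidence)
        # Tighter stop criterion: the last subcall barely helped.
        if gain < policy.min_gain:
            return confidence, used, "converged"
    return confidence, used, "completed"


if __name__ == "__main__":
    steps = [lambda: 0.5, lambda: 0.95, lambda: 0.99]
    print(run_subcalls(steps, SubcallPolicy()))
```

Returning the stop reason alongside the result keeps each skipped subcall attributable to one specific rule, which helps when tuning the thresholds.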
Expected Impact
- Estimated token reduction: ~20-30%
- Example target: 2.2M -> 1.6M tokens (~27% reduction)
Risk
- High: precision can drop if subcalls are cut too aggressively.
Implementation Considerations
- Guard behind config flags and staged rollout.
- Add safeguards for ambiguous/complex samples.
- Log skipped subcalls for regression analysis.
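One way these considerations might fit together is sketched below. Every name is an assumption made for illustration: `RolloutConfig`, `may_skip_subcalls`, the `ambiguity` score, and the hash-based percentage rollout are hypothetical and do not describe existing code.

```python
import hashlib
import logging
from dataclasses import dataclass

logger = logging.getLogger("subcall_reduction")


@dataclass(frozen=True)
class RolloutConfig:
    # Hypothetical flags; the feature stays off unless explicitly enabled.
    enabled: bool = False
    rollout_percent: int = 0        # share of samples (0-100) using the new policy
    ambiguity_ceiling: float = 0.5  # samples scoring above this keep all subcalls


def in_rollout(sample_id: str, percent: int) -> bool:
    """Deterministically map a sample to a bucket in [0, 100)."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def may_skip_subcalls(sample_id: str, ambiguity: float, cfg: RolloutConfig) -> bool:
    """Return True only if the sample is eligible for subcall reduction."""
    if not cfg.enabled:
        return False
    if not in_rollout(sample_id, cfg.rollout_percent):
        return False
    # Safeguard: ambiguous or complex samples keep the full subcall path.
    if ambiguity > cfg.ambiguity_ceiling:
        return False
    return True


def log_skipped(sample_id: str, subcall_name: str, reason: str) -> None:
    """Record a skipped subcall so regressions can be traced back later."""
    logger.info(
        "skipped_subcall sample_id=%s subcall=%s reason=%s",
        sample_id, subcall_name, reason,
    )
```

Hashing the sample ID keeps the rollout assignment stable across runs, so before/after comparisons look at the same samples.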
Acceptance Criteria
- Configurable subcall budget/early-exit policy is implemented.
- Benchmark compares token/latency/quality before vs after.
- Quality tradeoff is documented and accepted before defaulting on.
- Clear rollback path exists.
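The before/after benchmark could be reduced to a small comparison report, sketched here under stated assumptions. `RunMetrics`, `compare_runs`, and `max_quality_drop` are hypothetical names; the token counts reuse the example target above, while the latency and quality values are placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class RunMetrics:
    tokens: int
    latency_s: float
    quality: float  # any agreed quality metric, e.g. precision


def relative_change(before: float, after: float) -> float:
    """Signed relative change; negative means the value went down."""
    return (after - before) / before


def compare_runs(
    before: RunMetrics, after: RunMetrics, max_quality_drop: float
) -> Dict[str, Union[float, bool]]:
    """Summarise token, latency, and quality deltas against an accepted quality drop."""
    report: Dict[str, Union[float, bool]] = {
        "tokens_change": relative_change(before.tokens, after.tokens),
        "latency_change": relative_change(before.latency_s, after.latency_s),
        "quality_change": relative_change(before.quality, after.quality),
    }
    # The accepted tradeoff is a team decision; the threshold is an input here.
    report["quality_ok"] = report["quality_change"] >= -max_quality_drop
    return report


if __name__ == "__main__":
    baseline = RunMetrics(tokens=2_200_000, latency_s=10.0, quality=0.90)   # placeholders
    candidate = RunMetrics(tokens=1_600_000, latency_s=8.0, quality=0.89)   # placeholders
    print(compare_runs(baseline, candidate, max_quality_drop=0.02))
```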