Problem
Current max_steps=15 can overrun straightforward samples and inflate latency.
Proposal
Reduce max_steps from 15 to 10 (or make it configurable with a lower default) to cut average runtime.
Expected Impact
- Estimated runtime improvement: ~20%
- Approximate per-sample runtime: 108.5s -> ~95s
Risk
- Yes, visible quality risk on complex samples that need deeper reasoning.
Implementation Considerations
- Keep
max_steps configurable per experiment/profile.
- Consider dataset- or difficulty-aware overrides.
- Pair with evaluation guardrails to detect regressions quickly.
Acceptance Criteria
max_steps is configurable and documented.
- Benchmarks compare
15 vs 10 with:
- latency
- token usage
- correctness/grounding metrics
- Regression analysis identifies which sample types degrade most.
- Decision note documents whether
10 becomes default or remains an opt-in profile.
Problem
Current
max_steps=15can overrun straightforward samples and inflate latency.Proposal
Reduce
max_stepsfrom 15 to 10 (or make it configurable with a lower default) to cut average runtime.Expected Impact
Risk
Implementation Considerations
max_stepsconfigurable per experiment/profile.Acceptance Criteria
max_stepsis configurable and documented.15vs10with:10becomes default or remains an opt-in profile.