feat: Add thinking-level benchmarking support #378

Merged
olearycrew merged 1 commit into main from feat/thinking-levels-rebased on May 5, 2026



@ScuttleBot (Contributor)

Clean rebase of #76 (originally from #12 by @jb510) against current main.

Summary

Adds a --thinking flag to configure reasoning depth for models that support it.

Changes

  • Add VALID_THINKING_LEVELS constant: off, minimal, low, medium, high, xhigh, adaptive
  • Add thinking_level parameter to execute_openclaw_task()
  • Pass --thinking to openclaw agent command
  • Add --thinking argument to benchmark.py with validation
  • Update README command reference
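
The changes above can be sketched roughly as follows. This is a minimal illustration, not the merged code: only the VALID_THINKING_LEVELS values, the --thinking flag, and the `openclaw agent` command name come from this PR. The helper names `parse_args` and `build_agent_command`, the use of argparse `choices` for validation, and the `--model` argument handling are assumptions made for the example.

```python
import argparse

# Levels as listed in this PR; storing them in a tuple is an assumption.
VALID_THINKING_LEVELS = ("off", "minimal", "low", "medium", "high", "xhigh", "adaptive")


def parse_args(argv=None):
    """Parse a hypothetical subset of benchmark.py's CLI arguments."""
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    parser.add_argument("--model", required=True, help="Model identifier")
    parser.add_argument(
        "--thinking",
        default=None,  # None means "do not pass --thinking at all"
        choices=VALID_THINKING_LEVELS,  # argparse rejects anything else
        help="Reasoning depth for models that support it",
    )
    return parser.parse_args(argv)


def build_agent_command(thinking_level=None):
    """Build the argv list for the agent call (hypothetical helper).

    Only the --thinking flag is taken from the PR; any other arguments the
    real command needs are omitted here.
    """
    if thinking_level is not None and thinking_level not in VALID_THINKING_LEVELS:
        raise ValueError(f"Invalid thinking level: {thinking_level!r}")
    cmd = ["openclaw", "agent"]
    if thinking_level is not None:
        cmd += ["--thinking", thinking_level]
    return cmd
```

Validating in both places is a design choice for the sketch: the CLI check gives users an early, readable error, while the check in the helper protects callers that bypass the CLI.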

Usage

# Run with high reasoning depth
./scripts/run.sh --model openrouter/anthropic/claude-sonnet-4 --thinking high

# Run with minimal thinking for speed
./scripts/run.sh --model openrouter/anthropic/claude-sonnet-4 --thinking minimal

Why a new PR?

The original PR #76 had extensive merge conflicts due to main branch evolution (parallel judging, categories, incremental results, etc.). Rather than resolve 6 conflict blocks across 3 files, this is a clean implementation of the same feature on current main.

Supersedes #76.

Co-authored-by: ForceConstant <ForceConstant@users.noreply.github.com>
Co-authored-by: jb510 <jb510@users.noreply.github.com>

Add --thinking flag to benchmark.py to configure reasoning depth for
models that support it. Valid levels: off, minimal, low, medium, high,
xhigh, adaptive.

Changes:
- Add VALID_THINKING_LEVELS constant in lib_agent.py
- Add thinking_level parameter to execute_openclaw_task()
- Pass --thinking to openclaw agent command
- Add --thinking argument to benchmark.py with validation
- Update README command reference

This is a clean rebase of PR #76 (originally from #12 by @jb510)
against the current main branch which has significantly evolved.

Co-authored-by: ForceConstant <ForceConstant@users.noreply.github.com>
Co-authored-by: jb510 <jb510@users.noreply.github.com>

@kilo-code-bot (Bot, Contributor) commented May 5, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Clean implementation of thinking-level support. The thinking_level value is validated against a whitelist before use and passed to subprocess.run as a list argument (not shell-interpolated), so there's no injection risk. The optional parameter flows correctly through the call chain with proper None defaults.
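
The point about list arguments can be illustrated with a small, self-contained sketch. It does not call openclaw; it spawns a child Python process through a hypothetical helper, `echo_argv`, purely to show that a value passed as a list element to subprocess.run reaches the child as one literal argument and is never interpreted by a shell.

```python
import subprocess
import sys


def echo_argv(value):
    """Run a child Python process that prints the single argument it received.

    The command is given as a list and shell=True is not used, so no shell
    parses the value: metacharacters such as ';' stay ordinary text.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# A hostile-looking value arrives in the child unchanged, as plain data.
hostile = "high; echo INJECTED"
received = echo_argv(hostile)
```

With the whitelist check in place such a value would be rejected before the subprocess call; the list form is a second layer of protection rather than the only one.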

Files Reviewed (3 files)
  • scripts/lib_agent.py: VALID_THINKING_LEVELS constant + thinking_level param in execute_openclaw_task()
  • scripts/benchmark.py: --thinking CLI arg with validation
  • README.md: docs update

Reviewed by claude-4.6-sonnet-20260217 · 99,364 tokens

@olearycrew merged commit 28ddf62 into main on May 5, 2026
2 checks passed
@ScuttleBot mentioned this pull request on May 6, 2026