
Add thinking-level benchmarking support (off/minimal/low/medium/high/xhigh/adaptive)#76

Closed
ForceConstant wants to merge 7 commits into pinchbench:main from ForceConstant:continue-thinking-levels

Conversation

@ForceConstant
Contributor

This is a rebase of #12 by @jb510

Note: I haven't been able to test this yet, but it looks valid; I'm not currently set up to run it.

OpenClaw Agent and others added 5 commits March 25, 2026 17:12
- Add --thinking CLI argument to specify comma-separated thinking levels
- Pass thinking level to OpenClaw agent via --thinking flag
- Run each task across all specified thinking levels
- Include thinking_level in task results
- Add thinking_aggregates section with per-level statistics
- Support levels: off, minimal, low, medium, high
- Update SKILL.md and README.md with documentation

Closes pinchbench#9
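The `--thinking` argument described above could be wired up roughly as follows. This is a hypothetical sketch: the function name `parse_thinking_levels` and the argparse wiring are illustrative assumptions, not necessarily the PR's actual code.

```python
import argparse

# Levels supported by this first commit (xhigh/adaptive come later in the PR).
VALID_THINKING_LEVELS = ["off", "minimal", "low", "medium", "high"]

def parse_thinking_levels(raw: str) -> list:
    """Split a comma-separated --thinking value and reject unknown levels."""
    levels = [part.strip() for part in raw.split(",") if part.strip()]
    invalid = [lvl for lvl in levels if lvl not in VALID_THINKING_LEVELS]
    if invalid:
        raise argparse.ArgumentTypeError(
            f"invalid thinking level(s): {', '.join(invalid)}"
        )
    return levels

parser = argparse.ArgumentParser()
parser.add_argument(
    "--thinking",
    type=parse_thinking_levels,
    default=["off"],
    help="comma-separated thinking levels to benchmark",
)
```

Each task would then be run once per parsed level, with `thinking_level` recorded in the result so the per-level aggregates can be computed afterwards.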
- Add xhigh and adaptive to valid thinking levels (matching OpenClaw)
- Add model-aware xhigh validation (only GPT-5.x models support it)
- Validate thinking levels before passing to OpenClaw subprocess
- Document model-specific restrictions in help text and docs
- Follow existing code style (Optional[str] instead of str | None)
- No unnecessary changes to existing code
- Add strict xhigh model matching (provider-aware)
- Add adaptive support detection (Anthropic Claude 4.6 family)
- Deduplicate requested thinking levels while preserving order
- Fail fast when --thinking is provided but no valid levels remain
- Keep subprocess input constrained to validated levels
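The validation rules in the bullets above (order-preserving dedup, model-aware `xhigh`/`adaptive` gating, fail-fast when nothing valid remains) might look something like this sketch. The model-name patterns and the function name are assumptions for illustration, not the PR's exact implementation.

```python
VALID_THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh", "adaptive"]

def validate_thinking_levels(levels, model):
    """Return only levels valid for this model, deduplicated in request order."""
    # Deduplicate while preserving the order the user requested.
    seen = set()
    ordered = [l for l in levels if not (l in seen or seen.add(l))]

    valid = []
    for level in ordered:
        if level not in VALID_THINKING_LEVELS:
            continue
        # Per the PR: xhigh is restricted to GPT-5.x models.
        if level == "xhigh" and not (model or "").startswith("gpt-5"):
            continue
        # Per the PR: adaptive is restricted to the Anthropic Claude 4.6 family.
        if level == "adaptive" and "claude-4.6" not in (model or ""):
            continue
        valid.append(level)

    # Fail fast when --thinking was provided but nothing valid remains.
    if not valid:
        raise SystemExit("--thinking was provided but no valid levels remain")
    return valid
```

Only the returned list would ever be passed to the OpenClaw subprocess, keeping its input constrained to validated levels.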
@jb510
Contributor

jb510 commented Mar 25, 2026

That's why I didn't do the rebase: I uninstalled PinchBench, so I couldn't test :D. It was tested and working when I originally PR'd it. I'm dealing with other OpenClaw issues at the moment so I can't test; I hope someone can.

Member

@olearycrew olearycrew left a comment


@ForceConstant can I trouble you for one more rebase? We have a lot incoming with the pending v2 release this week.

@ForceConstant
Contributor Author

@olearycrew OK, I've updated the branch.

@ScuttleBot
Contributor

Superseded by #378 — a clean implementation on current main. The original had 6 conflict blocks across 3 files due to main branch evolution (parallel judging, categories, incremental results, etc.).

@ScuttleBot ScuttleBot closed this May 5, 2026
pull Bot pushed a commit to Stars1233/skill that referenced this pull request May 5, 2026
Add --thinking flag to benchmark.py to configure reasoning depth for
models that support it. Valid levels: off, minimal, low, medium, high,
xhigh, adaptive.

Changes:
- Add VALID_THINKING_LEVELS constant in lib_agent.py
- Add thinking_level parameter to execute_openclaw_task()
- Pass --thinking to openclaw agent command
- Add --thinking argument to benchmark.py with validation
- Update README command reference
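The `thinking_level` parameter on `execute_openclaw_task()` described above would presumably translate into an extra flag on the agent command. A minimal sketch, assuming the command shape (the real invocation in the PR may differ):

```python
def execute_openclaw_task(task, thinking_level=None):
    """Build the openclaw agent command, adding --thinking when requested."""
    cmd = ["openclaw", "agent", "run", task]
    if thinking_level and thinking_level != "off":
        cmd += ["--thinking", thinking_level]
    # In the real script this list would be handed to subprocess.run(...)
    return cmd
```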

This is a clean rebase of PR pinchbench#76 (originally from #12 by @jb510)
against the current main branch which has significantly evolved.

Co-authored-by: ForceConstant <ForceConstant@users.noreply.github.com>
Co-authored-by: jb510 <jb510@users.noreply.github.com>
@ScuttleBot ScuttleBot mentioned this pull request May 6, 2026
