Skip to content

[Feature] Considering adding pruning thinking token #258

@Steve37vn

Description

@Steve37vn

Problem:
When using reasoning models (e.g., Gemini 3 Pro/Flash, Claude Opus 4.5 with extended thinking), thinking tokens consume context window space but provide no utility once the final response is generated.

Proposed Solution:

Detect reasoning blocks across providers:

  • OpenAI: reasoning field in response
  • Anthropic: tags
  • Other providers as needed

Dynamically prune thinking tokens from context before subsequent requests

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions