Add token-aware completion guidance for concise outputs #21

@apenab

Description

Problem

Completion responses can be longer than needed, causing avoidable token spend.

Proposal

Introduce token-aware completion guidance:

  • Instruct the model to keep responses concise (e.g., a maximum word count or a compact output format).
  • Keep factual completeness and citation/grounding requirements.

Note: the current RLM config already sets max_tokens=4096 per subcall. That is a hard cap on length; this change targets the actual verbosity of responses that stay under it.
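
One way the configurable guidance could look is sketched below. This is only an illustration: `ConcisenessConfig`, `build_guidance`, `apply_guidance`, and the default of 150 words are hypothetical names and values, not part of the existing RLM config. The only value taken from this issue is the max_tokens=4096 cap, which the sketch leaves untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConcisenessConfig:
    """Hypothetical settings; none of these field names come from the RLM config."""
    enabled: bool = True
    max_words: Optional[int] = 150  # soft target stated in the prompt (assumed default)
    compact_format: bool = True     # ask for no preamble and no restated question
    max_tokens: int = 4096          # existing hard cap per subcall, left unchanged


def build_guidance(cfg: ConcisenessConfig) -> str:
    """Return prompt text asking for concise output, or "" when disabled."""
    if not cfg.enabled:
        return ""
    parts = ["Answer concisely."]
    if cfg.max_words is not None:
        parts.append(f"Use at most {cfg.max_words} words.")
    if cfg.compact_format:
        parts.append("Use a compact format: no preamble, no restating the question.")
    # Keep the grounding requirement explicit so brevity does not drop citations.
    parts.append("Do not omit required facts or citations.")
    return " ".join(parts)


def apply_guidance(system_prompt: str, cfg: ConcisenessConfig) -> str:
    """Append the guidance to an existing prompt; return it unchanged when disabled."""
    guidance = build_guidance(cfg)
    return f"{system_prompt}\n\n{guidance}" if guidance else system_prompt
```

Keeping the guidance behind an `enabled` flag makes it straightforward to run the same workload with and without it when comparing against the baseline.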

Expected Impact

  • Estimated token reduction: ~5-10% (likely marginal in practice)

Risk

  • Low

Acceptance Criteria

  • Conciseness guidance is configurable and documented.
  • Measurable reduction in average completion length/tokens.
  • No meaningful regression in correctness/grounding.
  • Report includes token savings vs baseline.
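
The token-savings part of the report could be computed roughly as follows. This is a minimal sketch that assumes per-completion token counts have already been collected for a baseline run and for a run with the guidance enabled; the function name `token_savings_report` and the returned keys are hypothetical. It covers only length, so correctness and grounding would still need a separate evaluation.

```python
from statistics import mean
from typing import Dict, Sequence


def token_savings_report(
    baseline_tokens: Sequence[int],
    candidate_tokens: Sequence[int],
) -> Dict[str, float]:
    """Compare average completion tokens of two runs.

    baseline_tokens: completion token counts without conciseness guidance.
    candidate_tokens: completion token counts with the guidance enabled.
    """
    if not baseline_tokens or not candidate_tokens:
        raise ValueError("both runs need at least one completion")
    base_avg = mean(baseline_tokens)
    cand_avg = mean(candidate_tokens)
    if base_avg == 0:
        raise ValueError("baseline average is zero; reduction is undefined")
    # Positive value means the candidate run used fewer tokens on average.
    reduction_pct = 100 * (base_avg - cand_avg) / base_avg
    return {
        "baseline_avg_tokens": float(base_avg),
        "candidate_avg_tokens": float(cand_avg),
        "reduction_pct": float(reduction_pct),
    }
```

A negative `reduction_pct` would indicate that the guidance made completions longer on average, which is worth surfacing in the report rather than hiding.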
