Add token-aware completion guidance for concise outputs

## Problem
Completion responses can be longer than needed, causing avoidable token spend.

## Proposal
Introduce token-aware completion guidance:
- Instruct the model to keep responses concise (e.g., a maximum word count or a compact output format).
- Preserve factual completeness and the existing citation/grounding requirements.

Note: current RLM config already sets `max_tokens=4096` per subcall; this change targets actual verbosity, not just hard caps.
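
A minimal sketch of how the guidance could be made configurable, assuming it is appended to the system prompt. `ConcisenessConfig`, `build_guidance`, `apply_guidance`, and the default of 150 words are all hypothetical names and values for illustration; none of them come from the existing RLM config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConcisenessConfig:
    """Hypothetical settings for the conciseness guidance."""
    enabled: bool = True
    max_words: Optional[int] = 150   # None disables the word limit
    compact_format: bool = True      # ask for a compact layout


def build_guidance(cfg: ConcisenessConfig) -> str:
    """Return the guidance text, or an empty string when disabled."""
    if not cfg.enabled:
        return ""
    parts = ["Keep the response concise."]
    if cfg.max_words is not None:
        parts.append(f"Use at most {cfg.max_words} words.")
    if cfg.compact_format:
        parts.append("Prefer a compact format such as short bullet points.")
    # Conciseness must not override completeness or grounding.
    parts.append("Do not omit required facts, citations, or grounding.")
    return " ".join(parts)


def apply_guidance(system_prompt: str, cfg: ConcisenessConfig) -> str:
    """Append the guidance to an existing system prompt."""
    guidance = build_guidance(cfg)
    return f"{system_prompt}\n\n{guidance}" if guidance else system_prompt
```

Setting `enabled=False` leaves the prompt unchanged, which makes it straightforward to run a baseline against the same code path.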

## Expected Impact
- Estimated token reduction: **~5-10%** (likely marginal in practice)

## Risk
- **Low**

## Acceptance Criteria
- Conciseness guidance is configurable and documented.
- Measurable reduction in average completion length/tokens.
- No meaningful regression in correctness/grounding.
- Report includes token savings vs baseline.
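
The savings-versus-baseline figure could be computed along these lines. This is a sketch under the assumption that per-response completion token counts are already logged for a baseline run and a guided run; `token_savings` is a hypothetical helper, not part of any existing tooling.

```python
from statistics import mean
from typing import Dict, Sequence


def token_savings(baseline_tokens: Sequence[int],
                  guided_tokens: Sequence[int]) -> Dict[str, float]:
    """Compare mean completion tokens between a baseline and a guided run."""
    if not baseline_tokens or not guided_tokens:
        raise ValueError("both runs need at least one sample")
    base = mean(baseline_tokens)
    guided = mean(guided_tokens)
    if base == 0:
        raise ValueError("baseline mean is zero; reduction is undefined")
    return {
        "baseline_mean": float(base),
        "guided_mean": float(guided),
        # Positive values mean the guided run used fewer tokens.
        "reduction_pct": 100.0 * (base - guided) / base,
    }
```

Comparing means over the same set of prompts keeps the two runs comparable; correctness and grounding would still need a separate evaluation.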

Add token-aware completion guidance for concise outputs #21