Compress evidence extraction to reduce token usage by 15-20% #18

@apenab

Description

Problem

Evidence extraction currently pulls near-full page chunks (~2000-4000 chars), which inflates token usage.

Proposal

Compress evidence extraction to include only the relevant section plus minimal surrounding context (~800-1200 chars), while preserving traceable grounding.
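A minimal sketch of the windowing step in Python. It assumes the relevant span has already been located upstream as character offsets (`match_start`/`match_end`) and that 1200 characters is the budget. `Evidence` and `extract_evidence` are hypothetical names, not part of any existing codebase. The function keeps the match, splits the remaining budget around it, and records the source offsets so grounding stays traceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """A compressed snippet plus the character span it was cut from (hypothetical schema)."""
    text: str
    start: int  # offset of the first character in the source page
    end: int    # offset one past the last character


def extract_evidence(page: str, match_start: int, match_end: int,
                     max_chars: int = 1200) -> Evidence:
    """Return the matched span plus surrounding context, at most max_chars long."""
    match_len = match_end - match_start
    if match_len >= max_chars:
        # The match alone fills the budget: keep its first max_chars characters.
        end = match_start + max_chars
        return Evidence(page[match_start:end], match_start, end)

    budget = max_chars - match_len
    # Split the remaining budget roughly evenly before and after the match;
    # if the left side is clipped at 0, the right side gets the leftover.
    start = max(0, match_start - budget // 2)
    end = min(len(page), match_end + budget - (match_start - start))
    # If the page ended early on the right, give the unused budget to the left.
    start = max(0, min(start, end - max_chars))
    return Evidence(page[start:end], start, end)
```

Snapping the start and end offsets to sentence or paragraph boundaries would be a natural refinement; it is omitted here to keep the sketch short.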

Expected Impact

  • Estimated token reduction: ~15-20%
  • Example target: 2.2M -> 1.8M tokens (~18% reduction)
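A back-of-the-envelope check of the example target in Python. The evidence share of total tokens is not stated in this issue; the sketch derives an implied share only under two assumptions: snippet sizes shrink from the midpoint of today's range (~3000 chars) to the midpoint of the target range (~1000 chars), and token count scales roughly linearly with character count.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (0.2 means 20% smaller)."""
    return (before - after) / before


# Overall example target from the issue: 2.2M -> 1.8M tokens.
overall = relative_reduction(2.2e6, 1.8e6)

# Per-snippet shrink, ASSUMING the midpoints of the two size ranges
# (~3000 chars today, ~1000 chars after compression).
per_snippet = relative_reduction(3000, 1000)

# Only the evidence portion of each prompt shrinks, so
# overall = evidence_share * per_snippet  =>  evidence_share = overall / per_snippet.
implied_evidence_share = overall / per_snippet

print(f"overall reduction:      {overall:.1%}")
print(f"per-snippet reduction:  {per_snippet:.1%}")
print(f"implied evidence share: {implied_evidence_share:.1%}")
```

Under those assumptions this prints 18.2%, 66.7% and 27.3%: an ~18% overall saving would be consistent with evidence making up a little over a quarter of all tokens.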

Risk

  • Moderate: missing key context may hurt grounding if extraction is too aggressive.

Implementation Considerations

  • Keep extraction deterministic and auditable (store source span references).
  • Make the context window around matched evidence configurable.
  • Add fallback to larger context when confidence is low.
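One way the three considerations could fit together, sketched in Python. The `confidence` score (e.g. from the matcher or retriever), the threshold `min_confidence=0.7`, and the per-side context sizes are all hypothetical placeholders; the defaults were picked only so that a ~200-char match yields an ~800-char snippet. Because the function depends only on its inputs, extraction is deterministic, and `verify` shows how stored span references could be audited against the source page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    """Snippet plus the source span it came from, so it can be re-checked later."""
    doc_id: str
    start: int
    end: int
    text: str


def extract_with_fallback(doc_id: str, page: str, match_start: int, match_end: int,
                          confidence: float, context_chars: int = 300,
                          fallback_context_chars: int = 1000,
                          min_confidence: float = 0.7) -> EvidenceRef:
    """Cut a snippet around a match, widening the context when confidence is low.

    Pure function of its inputs: the same page and match always give the same span.
    """
    context = context_chars if confidence >= min_confidence else fallback_context_chars
    start = max(0, match_start - context)
    end = min(len(page), match_end + context)
    return EvidenceRef(doc_id, start, end, page[start:end])


def verify(ref: EvidenceRef, page: str) -> bool:
    """Audit check: the stored text must equal the stored span of the source page."""
    return page[ref.start:ref.end] == ref.text
```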

Acceptance Criteria

  • Evidence payload size falls within the target range (~800-1200 chars) for most samples.
  • Grounding metrics do not regress beyond agreed tolerance.
  • Benchmark report includes before/after token use and quality.
  • Failure analysis documents any context-loss errors.
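A sketch of how the acceptance checks might be computed from benchmark output, in Python. It assumes per-sample evidence sizes (in chars), per-sample grounding scores where higher is better, and total token counts for each run. `min_in_range=0.8` stands in for "most samples" and `tolerance=0.01` for the "agreed tolerance"; both are placeholders, not agreed values. Samples whose grounding drops by more than the tolerance are listed so they can feed the failure analysis.

```python
from statistics import mean


def acceptance_report(sizes_after, grounding_before, grounding_after,
                      tokens_before, tokens_after, target=(800, 1200),
                      min_in_range=0.8, tolerance=0.01):
    """Summarise before/after benchmark numbers against the acceptance criteria.

    min_in_range and tolerance are placeholder values, not agreed thresholds.
    """
    lo, hi = target
    in_range = sum(lo <= s <= hi for s in sizes_after) / len(sizes_after)
    grounding_delta = mean(grounding_after) - mean(grounding_before)
    # Per-sample drops larger than the tolerance: candidates for context-loss analysis.
    regressions = [i for i, (b, a) in enumerate(zip(grounding_before, grounding_after))
                   if a < b - tolerance]
    return {
        "fraction_in_target_range": in_range,
        "token_change": (tokens_after - tokens_before) / tokens_before,
        "grounding_delta": grounding_delta,
        "regressed_samples": regressions,
        # Pass/fail uses the aggregate grounding change; regressions are reported separately.
        "passes": in_range >= min_in_range and grounding_delta >= -tolerance,
    }
```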

Metadata

Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests