Context
From code review of #21 (multi-spec comparison feature).
Current Behavior
The estimatePromptTokens function in internal/comparison/prompt.go:87-89 uses a simple heuristic of 4 characters per token:
```go
func estimatePromptTokens(prompt string) int {
	return len(prompt) / 4
}
```
Issue
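This is a rough estimate. Claude's tokenizer typically uses ~3.5-4 chars/token for English prose, but code diffs may have different characteristics (more symbols, varied line lengths). Because the heuristic divides by 4, any content that averages fewer than 4 characters per token is underestimated.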
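A quick arithmetic sketch shows how much the ratio matters near the limit. It assumes a 600,000-character prompt, chosen because `len(prompt) / 4` puts it exactly at 150,000 tokens. The 3.0 ratio is hypothetical, standing in for diff-heavy content, and `tokensAt` is an illustrative helper, not part of the codebase.

```go
package main

import "fmt"

// tokensAt is a hypothetical helper: it returns the token count implied by a
// chars-per-token ratio, truncated toward zero like the integer division in
// estimatePromptTokens.
func tokensAt(chars int, charsPerToken float64) int {
	return int(float64(chars) / charsPerToken)
}

func main() {
	// A prompt length that the current heuristic places exactly at 150k tokens.
	const chars = 600_000

	fmt.Println("heuristic, 4.0 chars/token:", chars/4)
	fmt.Println("prose low end, 3.5 chars/token:", tokensAt(chars, 3.5))
	fmt.Println("hypothetical code ratio, 3.0 chars/token:", tokensAt(chars, 3.0))
}
```

This prints 150000, 171428, and 200000 respectively: at the low end of the prose range the real count would be roughly 14% above the estimate, and at 3 chars/token about 33% above it.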
Suggested Improvement
Options:
- Use a more conservative estimate (e.g., len(prompt) / 3) for safety margin near the 150k token limit
- Use a proper tokenizer library for accurate estimates
- Add different heuristics for code vs prose content
Priority
Low. The current implementation works, but an underestimate could let a prompt that is close to the context limit exceed it.