Lari dev by LariTesserae · Pull Request #85 · anima-research/animachat

LariTesserae · 2026-03-19T04:15:15Z

- Fixed the cost estimation formula.
- Removed prompt caching markers: Anthropic now detects prompt caching boundaries automatically, and that works better than our previous code.
- Added a TTL choice (no prompt caching, 5 min, 1 hour) to chat settings. This makes sense for expensive models and long contexts, depending on the chat dynamics.

… metadata

The cost formula was fiction, showing $0.50 when Anthropic billed $3.00.

Root cause: a single CACHE_DISCOUNT = 0.9 constant treated all cache interactions as 90% savings. In reality, Anthropic has 4 price tiers:
- Fresh input: 1x base
- Cache write (5min): 1.25x base
- Cache write (1h): 2x base
- Cache read: 0.1x base

Cache MISSES (writes) cost MORE than base, not less. The old formula applied a discount to writes, inverting reality.

Changes:
- Replace CACHE_DISCOUNT with proper multipliers per tier
- calculateCostBreakdown now prices fresh/write/read/output separately
- Cost = sum of tiers (no more "total minus savings" subtraction game)
- cacheSavings is informational only, never subtracted from cost
- Add cacheTTL conversation setting: off / 5min / 1h
  - Off: no cache_control sent, all tokens at 1x base
  - 5min: cheap writes (1.25x), for regeneration bursts
  - 1h: expensive writes (2x), survives between turns
- TTL flows through entire chain: enhanced-inference → inference → anthropic.ts (system prompt, message-level, prefill breakpoints)
- Debug metadata tab in right-click menu showing API usage breakdown, cache hit/miss/off status, token counts per category
- Metadata captured for chat, regenerate, and edit paths

Verified: UI cost estimates now match Anthropic dashboard deltas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
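The four-tier pricing above can be sketched as a pure function where the total is a plain sum of tiers and savings are reported separately. This is a minimal sketch, not the PR's `calculateCostBreakdown`: the names `estimateCost`, `TokenCounts`, `CostBreakdown` and the dollars-per-million-token price parameters are hypothetical; only the multipliers (1x, 1.25x, 2x, 0.1x) come from the commit message.

```typescript
type CacheTTL = 'off' | '5m' | '1h';

// Hypothetical shape: token counts per billing category.
interface TokenCounts {
  freshInput: number; // uncached input tokens
  cacheWrite: number; // tokens written to the cache (cache miss)
  cacheRead: number;  // tokens served from the cache (cache hit)
  output: number;
}

interface CostBreakdown {
  fresh: number;
  write: number;
  read: number;
  output: number;
  total: number;
  cacheSavings: number; // informational only; never subtracted from total
}

// Multipliers on the base input price, as listed in the commit message.
const CACHE_WRITE_MULTIPLIER: Record<'5m' | '1h', number> = { '5m': 1.25, '1h': 2 };
const CACHE_READ_MULTIPLIER = 0.1;

// Assumption: prices are given in dollars per million tokens.
function estimateCost(
  tokens: TokenCounts,
  inputPricePerM: number,
  outputPricePerM: number,
  ttl: CacheTTL,
): CostBreakdown {
  const base = inputPricePerM / 1_000_000;
  // "Off" sends no cache_control, so all input is 1x base; any write tokens
  // reported anyway are priced at 1x here (a choice made for this sketch).
  const writeMultiplier = ttl === 'off' ? 1 : CACHE_WRITE_MULTIPLIER[ttl];

  const fresh = tokens.freshInput * base;
  const write = tokens.cacheWrite * base * writeMultiplier;
  const read = tokens.cacheRead * base * CACHE_READ_MULTIPLIER;
  const output = tokens.output * (outputPricePerM / 1_000_000);

  // What the cache-read tokens would have cost as fresh input, minus what they did cost.
  const cacheSavings = tokens.cacheRead * base * (1 - CACHE_READ_MULTIPLIER);

  return { fresh, write, read, output, total: fresh + write + read + output, cacheSavings };
}
```

Because `total` only ever adds tiers, a cache miss can never come out cheaper than sending the same tokens uncached, which is the inversion the commit describes.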

Removed ~200 lines of manual cache management:
- 4-point cache marker arithmetic in context strategies
- Chapter II text breakpoint insertion (<|cache_breakpoint|>) in prefill
- splitAtCacheBreakpoints in anthropic.ts
- addCacheControlToMessages in enhanced-inference.ts
- _cacheControl / _hasCacheBreakpoints / _cacheTTL branch flags
- System prompt cache mirroring from first message
- Dead _cacheControl marker code in OpenRouter formatter

Replaced with top-level cache_control on the API request:
- Anthropic direct: cache_control in request params
- OpenRouter: cache_control in request body (replaces transforms: ['prompt-caching'])
- Both providers handle breakpoint placement automatically

Also fixed:
- OpenRouter double-counting on cache miss: prompt_tokens includes write tokens, but normalization only subtracted reads. Now subtracts both.
- cacheTTL now flows through to OpenRouter service
- Debug metadata added to continue path (was missing)

Tested: Anthropic direct + OpenRouter, cache hit/miss/off all correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
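The OpenRouter double-counting fix boils down to subtracting both cache categories from the reported prompt total. A minimal sketch, assuming a usage object whose `prompt_tokens` already includes cache reads and cache writes; `RawUsage`, `normalizeUsage`, and the fields `cachedReadTokens` / `cacheWriteTokens` are hypothetical placeholders, not OpenRouter's documented schema.

```typescript
// Hypothetical raw usage shape; real OpenRouter field names may differ.
interface RawUsage {
  prompt_tokens: number; // assumed to include cache reads AND cache writes
  completion_tokens: number;
  cachedReadTokens?: number;
  cacheWriteTokens?: number;
}

interface NormalizedUsage {
  freshInput: number;
  cacheWrite: number;
  cacheRead: number;
  output: number;
}

function normalizeUsage(raw: RawUsage): NormalizedUsage {
  const cacheRead = raw.cachedReadTokens ?? 0;
  const cacheWrite = raw.cacheWriteTokens ?? 0;
  // Subtracting only cacheRead would leave the written tokens counted twice
  // on a miss: once as fresh input and once as a cache write.
  // Math.max guards against inconsistent provider numbers going negative.
  const freshInput = Math.max(0, raw.prompt_tokens - cacheRead - cacheWrite);
  return { freshInput, cacheWrite, cacheRead, output: raw.completion_tokens };
}
```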

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-03-19T04:20:07Z

Greptile Summary

This PR makes three related changes to prompt caching: fixes the cost estimation formula to use correct per-TTL write multipliers (1.25x for 5 min, 2x for 1 h), removes manual per-message cache_control markers in favour of Anthropic's automatic placement via a top-level cache_control field, and adds a TTL selector (Off / 5 min / 1 h) to the group-chat Conversation Settings dialog. It also introduces a debugMetadata blob — persisted analogously to debugRequest/debugResponse — and surfaces it in a new "Metadata" tab in the debug dialog.

Key observations:

- The cacheTTL field is added to ConversationSchema with a default of '5m', meaning automatic caching will be active for all conversations (including standard/one-on-one chats) unless explicitly turned off, but the UI control to change this is only exposed for group chats.
- save() in ConversationSettingsDialog always emits cacheTTL regardless of conversation format, so standard-chat users are silently locked to the '5m' default with no way to opt out from the UI.
- The JSDoc on calculateCostBreakdown still states "Arc uses 1h TTL everywhere" even though the default is now 5m.
- enhancedCallback captures cacheTTL via closure, but cacheTTL is declared ~100 lines after the callback definition. This is valid at runtime but fragile against future refactors.
- debugMetadata follows the same pattern as pre-existing debug blobs: it is silently discarded (not stored inline as a fallback) if the blob save fails.

Confidence Score: 3/5

Mostly safe to merge, but the silent caching opt-in for standard chats and the stale TTL comment warrant attention before shipping.
The core caching refactor is sound and the cost formula fix is correct. However, the cacheTTL setting defaulting to '5m' for all conversation types, including standard chats where there is no UI control, is an unintended side-effect that could silently change cost behaviour for users who have no visibility into this setting. The stale JSDoc and closure-ordering issues are lower priority but add maintainability risk.
Files needing attention: ConversationSettingsDialog.vue (TTL control visibility vs. save payload), enhanced-inference.ts (stale comment, closure ordering of cacheTTL), and types.ts (schema default of '5m' affects all conversations).

Important Files Changed

| Filename | Overview |
|---|---|
| deprecated-claude-app/backend/src/services/enhanced-inference.ts | Core cost calculation refactored to use per-TTL cache write multipliers (1.25x for 5 min, 2x for 1 h). Stale JSDoc comment still claims "Arc uses 1h TTL everywhere", and `cacheTTL` is captured by the `enhancedCallback` closure before it is declared, which is fragile. |
| deprecated-claude-app/frontend/src/components/ConversationSettingsDialog.vue | New TTL dropdown added to the group-chat (prefill) settings panel, but `cacheTTL` is always included in the save payload for all conversation formats. Standard-chat users can't see or control the caching setting, yet the backend will apply the `'5m'` default to them silently. |
| deprecated-claude-app/backend/src/services/anthropic.ts | Manual per-message `cache_control` markers removed; now uses a top-level `cache_control` field with TTL passed from the caller. Clean implementation of automatic prompt caching. Accepts `cacheTTL` param and maps `'1h'` to `{ type: 'ephemeral', ttl: '1h' }` and `'5m'` to `{ type: 'ephemeral' }`. |
| deprecated-claude-app/backend/src/services/openrouter.ts | Same caching refactor as anthropic.ts; for Anthropic-provider requests the top-level `cache_control` is conditionally added based on `cacheTTL`. Non-Anthropic providers are unaffected. No new issues. |
| deprecated-claude-app/backend/src/database/index.ts | New `debugMetadata` blob storage follows the same pattern as the pre-existing `debugRequest`/`debugResponse` handling. Minor: metadata is silently discarded on blob-save failure because `delete updatesForMemory.debugMetadata` is outside the try block. |
| deprecated-claude-app/backend/src/routes/conversations.ts | Debug endpoint extended to load and return `debugMetadata` from blob storage or inline, mirroring the existing pattern for `debugRequest`/`debugResponse`. No issues. |
| deprecated-claude-app/frontend/src/components/DebugMessageDialog.vue | New "Metadata" tab added to the debug dialog, displaying cache hit/miss status, provider, and full JSON of `debugMetadata`. Straightforward and well-structured change. |
| deprecated-claude-app/shared/src/types.ts | `cacheTTL: z.enum(['off', '5m', '1h']).default('5m').optional()` added to `ConversationSchema`. Schema default of `'5m'` means caching is enabled by default for all conversations, including standard chats that have no UI control for this field. |
| deprecated-claude-app/backend/src/services/inference.ts | `cacheTTL` parameter added to `streamCompletion` and correctly threaded through to provider-specific calls. No issues. |
| deprecated-claude-app/backend/src/websocket/handler.ts | No direct changes to caching logic in the handler; TTL is read from the conversation object inside `EnhancedInferenceService`. File appears clean and unchanged relative to the PR scope. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant UI as ConversationSettingsDialog
    participant Conv as Conversation (DB)
    participant EI as EnhancedInferenceService
    participant IS as InferenceService
    participant Ant as AnthropicService / OpenRouterService

    UI->>Conv: save({ cacheTTL: 'off' | '5m' | '1h' })
    Conv-->>EI: conversation.cacheTTL

    EI->>EI: const cacheTTL = conversation.cacheTTL || '5m'
    EI->>EI: effectiveCacheTTL = cacheTTL !== 'off' ? cacheTTL : undefined

    EI->>IS: streamCompletion(..., effectiveCacheTTL)
    IS->>Ant: streamCompletion(..., cacheTTL)

    alt cacheTTL === '1h'
        Ant->>Ant: cache_control = { type:'ephemeral', ttl:'1h' } (2× write cost)
    else cacheTTL === '5m'
        Ant->>Ant: cache_control = { type:'ephemeral' } (1.25× write cost)
    else cacheTTL === undefined (off)
        Ant->>Ant: no cache_control field sent
    end

    Ant-->>EI: usage { freshInput, cacheWrite, cacheRead, output }
    EI->>EI: calculateCostBreakdown(model, tokens, cacheTTL)
    EI-->>UI: metrics { cost, cacheSavings, ... }
```
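The flow in the diagram reduces to two small steps: resolve the conversation's TTL (falling back to `'5m'` and treating `'off'` as no caching), then map it to the request's top-level `cache_control`. A sketch under those assumptions; `resolveCacheTTL`, `toCacheControl`, and `buildRequest` are hypothetical helpers, and only the `{ type: 'ephemeral' }` / `{ type: 'ephemeral', ttl: '1h' }` values come from the review above.

```typescript
type CacheTTL = 'off' | '5m' | '1h';

interface CacheControl {
  type: 'ephemeral';
  ttl?: '1h';
}

// Mirrors the diagram: cacheTTL = conversation.cacheTTL || '5m',
// then effectiveCacheTTL = cacheTTL !== 'off' ? cacheTTL : undefined.
function resolveCacheTTL(conversationTTL: CacheTTL | undefined): '5m' | '1h' | undefined {
  const cacheTTL = conversationTTL || '5m';
  return cacheTTL !== 'off' ? cacheTTL : undefined;
}

// '1h' needs an explicit ttl; 5 minutes is the default, so only the type is sent.
function toCacheControl(ttl: '5m' | '1h' | undefined): CacheControl | undefined {
  if (ttl === '1h') return { type: 'ephemeral', ttl: '1h' };
  if (ttl === '5m') return { type: 'ephemeral' };
  return undefined; // off: the field is not sent at all
}

// Hypothetical request builder: the key is omitted, not set to undefined, when off.
function buildRequest(model: string, conversationTTL?: CacheTTL): Record<string, unknown> {
  const cacheControl = toCacheControl(resolveCacheTTL(conversationTTL));
  return {
    model,
    ...(cacheControl ? { cache_control: cacheControl } : {}),
  };
}
```

Keeping the fallback in one helper also makes the review's main concern visible: a conversation with no stored `cacheTTL` resolves to `'5m'`, not to "off".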

Comments Outside Diff (1)

deprecated-claude-app/frontend/src/components/ConversationSettingsDialog.vue, lines 1086-1092

cacheTTL is persisted for all conversation formats but the UI control only exists for group chats

cacheTTL: cacheTTL.value is always included in the emitted update, regardless of whether settings.format is 'standard' or 'prefill'. The cacheTTL dropdown only renders in the v-else (prefill/group-chat) branch of the template, so users of standard chats have no visibility or control over the TTL setting. Because the schema default is '5m', standard chats will silently have automatic prompt caching enabled at 5-min TTL — and there's no UI to opt out.

If caching is intentionally enabled for standard chats too, the control should also appear in the standard-chat settings panel (the v-if="settings.format === 'standard'" section). If standard chats should not use caching at all, the save() payload should conditionally omit or hard-code cacheTTL for 'standard' format:

```ts
// Only propagate cacheTTL for prefill conversations (where the control is visible)
...(settings.value.format === 'prefill' && { cacheTTL: cacheTTL.value }),
```
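The conditional spread works because spreading `false` into an object literal adds no properties. A small sketch of the two save paths; `SettingsFormat`, `SavePayload`, and `buildSavePayload` are hypothetical stand-ins for the dialog's `settings` and `save()`, not the component's actual code.

```typescript
type SettingsFormat = 'standard' | 'prefill';
type CacheTTL = 'off' | '5m' | '1h';

interface SavePayload {
  format: SettingsFormat;
  cacheTTL?: CacheTTL;
}

// Hypothetical stand-in for save(): cacheTTL is only emitted when the
// dropdown that controls it is rendered (prefill / group chat).
function buildSavePayload(format: SettingsFormat, cacheTTL: CacheTTL): SavePayload {
  return {
    format,
    // `false` spreads to nothing, so standard chats get no cacheTTL key.
    ...(format === 'prefill' && { cacheTTL }),
  };
}
```

Note that omitting the key leaves the backend fallback (`conversation.cacheTTL || '5m'`, per the sequence diagram) in effect, so standard chats would still cache at 5 min. Actually disabling caching for them needs the hard-coded variant, e.g. `cacheTTL: format === 'prefill' ? cacheTTL : 'off'`.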

Path: deprecated-claude-app/backend/src/services/enhanced-inference.ts
Line: 570-572

Comment:
**Stale JSDoc comment contradicts the new per-TTL multiplier logic**

The comment still says "Arc uses 1h TTL everywhere, so cache writes = 2x base", but this PR introduces configurable TTL (`5m` or `1h`). The new default is `5m` (1.25x multiplier), so the statement is now wrong and could mislead future developers into assuming 2x cache write costs always apply.

```suggestion
   * Calculate the actual cost of a request using Anthropic's tiered pricing.
   *
   * Anthropic bills four token categories at different rates:
   * - Fresh input (uncached): base input price
   * - Cache write: 1.25x base (5min TTL) or 2x base (1h TTL)
   * - Cache read: 0.1x base
   * - Output: output price
   *
   * Default TTL is 5min (1.25x write cost). Use 1h for slower interactive
   * sessions where the longer TTL benefit justifies the higher write price.
```
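To make the 5 min vs 1 h tradeoff in the suggested comment concrete, here is a back-of-envelope comparison using only the multipliers listed there. The assumptions are not from the review: a fixed prefix of $P$ tokens at base price $b$, $N$ turns, every gap between turns longer than 5 min but shorter than 1 h, and new per-turn tokens ignored.

$$
\begin{aligned}
C_{\text{5m}} &= 1.25\,N P b && \text{(every turn misses and re-writes)}\\
C_{\text{1h}} &= 2 P b + 0.1\,(N-1) P b && \text{(one write, then reads)}\\
C_{\text{off}} &= N P b\\
C_{\text{1h}} < C_{\text{5m}} &\iff 2 + 0.1(N-1) < 1.25N \iff N > \tfrac{1.9}{1.15} \approx 1.65\\
C_{\text{1h}} < C_{\text{off}} &\iff 2 + 0.1(N-1) < N \iff N > \tfrac{1.9}{0.9} \approx 2.11
\end{aligned}
$$

Under these assumptions 1 h beats 5 min from the second turn and beats no caching from the third. If turns arrive within 5 min of each other, the 5 min TTL hits as well and its cheaper write wins.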

---

Path: deprecated-claude-app/backend/src/services/enhanced-inference.ts
Line: 428-498

Comment:
**`cacheTTL` captured by closure before its declaration**

`enhancedCallback` (defined here at ~line 430) references `cacheTTL` at line 498, but `cacheTTL` is not declared until line 528 — roughly 100 lines later. This works at runtime because the callback is only invoked after the `await this.inferenceService.streamCompletion(...)` call at line 534 (which comes after the `const cacheTTL = ...` at line 528). However, the reverse ordering makes the code fragile and hard to reason about. A future refactor that calls or hoists `enhancedCallback` earlier would hit a `ReferenceError` in the temporal dead zone.

Consider declaring `cacheTTL` before `enhancedCallback`, or passing it as a parameter to the inner callback to make the dependency explicit.
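One way to make the dependency explicit is to build the callback from a factory that takes the TTL as an argument, so the value is bound at creation time and there is no `const` for the closure to reach before initialization. A minimal sketch; `makeUsageCallback`, `UsageEvent`, and the returned status strings are hypothetical, not the code in `enhanced-inference.ts`.

```typescript
type CacheTTL = 'off' | '5m' | '1h';

// Hypothetical usage event passed to the streaming callback.
interface UsageEvent {
  cacheRead: number;
  cacheWrite: number;
}

type UsageCallback = (usage: UsageEvent) => string;

// The TTL is a parameter, so the returned callback can never observe an
// uninitialized binding, regardless of where it is created or invoked.
function makeUsageCallback(cacheTTL: CacheTTL): UsageCallback {
  return (usage) => {
    if (cacheTTL === 'off') return 'cache: off';
    const status = usage.cacheRead > 0 ? 'hit' : 'miss';
    return `cache: ${status} (ttl ${cacheTTL})`;
  };
}

// Resolve the TTL first, then create the callback from it.
const cacheTTL: CacheTTL = '1h';
const enhancedCallback = makeUsageCallback(cacheTTL);
```

The factory also makes the callback testable in isolation, since its only input besides the usage event is the TTL it was built with.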


---

Path: deprecated-claude-app/backend/src/database/index.ts
Line: 9-17

Comment:
**`debugMetadata` is silently discarded when blob save fails**

`delete updatesForMemory.debugMetadata` runs outside the `try` block, so even when `blobStore.saveJsonBlob` throws (and `debugMetadataBlobId` is never assigned), the metadata is removed from the in-memory update. The result is that on a blob-storage failure the Debug → Metadata tab will show "No metadata available" with no indication that data was lost — the `console.warn` is the only signal.

This matches the pre-existing pattern used for `debugRequest`/`debugResponse`, so it's consistent, but it's worth making the tradeoff explicit. If you'd like the metadata to be preserved on failure (at the cost of memory pressure), move the `delete` inside the `try`:

```ts
if (updatesForMemory.debugMetadata) {
  try {
    const debugMetadataBlobId = await blobStore.saveJsonBlob(updatesForMemory.debugMetadata);
    updatesForMemory.debugMetadataBlobId = debugMetadataBlobId;
    delete updatesForMemory.debugMetadata; // Remove only after successful save
  } catch (err) {
    console.warn('[Database] Failed to save debugMetadata as blob:', err);
    // debugMetadata remains in the object; it will be stored inline (possibly large)
  }
}
```
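The suggested change can be exercised with a stub blob store that fails. A self-contained sketch; `BlobStore`, `MessageUpdates`, `persistDebugMetadata`, and the two stub stores are hypothetical stand-ins for the code in `database/index.ts`, kept only to show that the inline copy survives a failed save.

```typescript
// Hypothetical minimal interface matching the saveJsonBlob call in the snippet above.
interface BlobStore {
  saveJsonBlob(data: unknown): Promise<string>;
}

interface MessageUpdates {
  debugMetadata?: Record<string, unknown>;
  debugMetadataBlobId?: string;
}

async function persistDebugMetadata(updates: MessageUpdates, blobStore: BlobStore): Promise<MessageUpdates> {
  if (updates.debugMetadata) {
    try {
      updates.debugMetadataBlobId = await blobStore.saveJsonBlob(updates.debugMetadata);
      delete updates.debugMetadata; // only after a successful save
    } catch (err) {
      console.warn('[Database] Failed to save debugMetadata as blob:', err);
      // debugMetadata stays inline as a fallback
    }
  }
  return updates;
}

// Stub stores for the success and failure paths.
const okStore: BlobStore = { saveJsonBlob: async () => 'blob-123' };
const failingStore: BlobStore = {
  saveJsonBlob: async () => {
    throw new Error('disk full');
  },
};
```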


Last reviewed commit: "ui: simplify cache T..."

LariTesserae and others added 3 commits March 18, 2026 18:26

ui: simplify cache TTL option labels

f717f6c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LariTesserae wants to merge 3 commits into anima-research:main from LariTesserae:lari-dev
