
Lari dev #85

Open
LariTesserae wants to merge 3 commits into anima-research:main from LariTesserae:lari-dev


@LariTesserae
Contributor

  1. fixed cost estimation formula
  2. removed prompt caching markers, since Anthropic now has automatic prompt caching boundary detection, and it works better than our previous code
  3. added a TTL choice (no prompt caching, 5 min, 1 hour) to chat settings; it matters most for expensive models and long context, and the right option depends on the chat dynamics

LariTesserae and others added 3 commits March 18, 2026 18:26
… metadata

The cost formula was fiction — showing $0.50 when Anthropic billed $3.00.

Root cause: a single CACHE_DISCOUNT = 0.9 constant treated all cache
interactions as 90% savings. In reality, Anthropic has 4 price tiers:
- Fresh input: 1x base
- Cache write (5min): 1.25x base
- Cache write (1h): 2x base
- Cache read: 0.1x base

Cache MISSES (writes) cost MORE than base, not less. The old formula
applied a discount to writes, inverting reality.
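
To make the four tiers concrete, here is a minimal TypeScript sketch of tiered input pricing. The multipliers are the ones listed above; the function name `estimateInputCost`, the `TokenUsage` shape, and the $3 per million token base price are hypothetical illustrations, not the code in this PR.

```typescript
// Hypothetical sketch: multipliers come from the commit message, all names are illustrative.
type CacheTTL = '5m' | '1h';

interface TokenUsage {
  freshInput: number; // uncached input tokens, billed at 1x base
  cacheWrite: number; // tokens written to the cache (a cache miss)
  cacheRead: number;  // tokens served from the cache (a cache hit)
}

const CACHE_WRITE_MULTIPLIER: Record<CacheTTL, number> = { '5m': 1.25, '1h': 2 };
const CACHE_READ_MULTIPLIER = 0.1;

// Cost is the sum of the tiers; nothing is subtracted as "savings".
function estimateInputCost(usage: TokenUsage, basePricePerMTok: number, ttl: CacheTTL): number {
  const perToken = basePricePerMTok / 1e6;
  const fresh = usage.freshInput * perToken;
  const write = usage.cacheWrite * perToken * CACHE_WRITE_MULTIPLIER[ttl];
  const read = usage.cacheRead * perToken * CACHE_READ_MULTIPLIER;
  return fresh + write + read;
}

// Assumed example: 100k-token prompt, $3 per million input tokens, 1h TTL.
const miss = estimateInputCost({ freshInput: 0, cacheWrite: 100000, cacheRead: 0 }, 3, '1h');
const hit = estimateInputCost({ freshInput: 0, cacheWrite: 0, cacheRead: 100000 }, 3, '1h');
console.log(miss.toFixed(2), hit.toFixed(2));
```

Under these assumed numbers the same prompt costs about $0.60 on a miss and about $0.03 on a hit, which is why a single flat discount cannot describe both cases.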

Changes:
- Replace CACHE_DISCOUNT with proper multipliers per tier
- calculateCostBreakdown now prices fresh/write/read/output separately
- Cost = sum of tiers (no more "total minus savings" subtraction game)
- cacheSavings is informational only, never subtracted from cost
- Add cacheTTL conversation setting: off / 5min / 1h
  - Off: no cache_control sent, all tokens at 1x base
  - 5min: cheap writes (1.25x), for regeneration bursts
  - 1h: expensive writes (2x), survives between turns
- TTL flows through entire chain: enhanced-inference → inference →
  anthropic.ts (system prompt, message-level, prefill breakpoints)
- Debug metadata tab in right-click menu showing API usage breakdown,
  cache hit/miss/off status, token counts per category
- Metadata captured for chat, regenerate, and edit paths

Verified: UI cost estimates now match Anthropic dashboard deltas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed ~200 lines of manual cache management:
- 4-point cache marker arithmetic in context strategies
- Chapter II text breakpoint insertion (<|cache_breakpoint|>) in prefill
- splitAtCacheBreakpoints in anthropic.ts
- addCacheControlToMessages in enhanced-inference.ts
- _cacheControl / _hasCacheBreakpoints / _cacheTTL branch flags
- System prompt cache mirroring from first message
- Dead _cacheControl marker code in OpenRouter formatter

Replaced with top-level cache_control on the API request:
- Anthropic direct: cache_control in request params
- OpenRouter: cache_control in request body (replaces transforms: ['prompt-caching'])
- Both providers handle breakpoint placement automatically
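
As a rough illustration of the top-level approach, the sketch below maps a TTL to a single `cache_control` value and attaches it to the request parameters. The `'1h'` and `'5m'` mappings follow the review summary of anthropic.ts; the helper names `buildCacheControl` and `withCacheControl` and the `example-model` value are hypothetical.

```typescript
// Hypothetical helpers; only the TTL -> cache_control mapping is taken from the PR description.
type CacheTTL = '5m' | '1h';
type CacheControl = { type: 'ephemeral'; ttl?: '1h' };

// undefined means caching is off, so no cache_control is produced.
function buildCacheControl(ttl: CacheTTL | undefined): CacheControl | undefined {
  if (ttl === undefined) return undefined;
  return ttl === '1h' ? { type: 'ephemeral', ttl: '1h' } : { type: 'ephemeral' };
}

// Returns a copy of the request params, with cache_control added only when caching is on.
function withCacheControl(
  params: Record<string, unknown>,
  ttl?: CacheTTL,
): Record<string, unknown> {
  const cacheControl = buildCacheControl(ttl);
  return cacheControl ? { ...params, cache_control: cacheControl } : { ...params };
}

const cached = withCacheControl({ model: 'example-model' }, '1h');
const uncached = withCacheControl({ model: 'example-model' });
console.log(JSON.stringify(cached), JSON.stringify(uncached));
```

Keeping the decision in one place means the "off" case is simply the absence of the field, rather than a flag that every message formatter has to check.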

Also fixed:
- OpenRouter double-counting on cache miss: prompt_tokens includes write tokens,
  but normalization only subtracted reads. Now subtracts both.
- cacheTTL now flows through to OpenRouter service
- Debug metadata added to continue path (was missing)
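
The double-counting fix can be sketched as follows, assuming the reported prompt total includes both cache reads and cache writes. The field names (`promptTokens`, `cacheReadTokens`, `cacheWriteTokens`) and the function `normalizePromptUsage` are hypothetical stand-ins, not the provider's actual response fields.

```typescript
// Hypothetical field names; the rule (subtract reads AND writes) is the one described above.
interface RawPromptUsage {
  promptTokens: number;     // total prompt tokens, assumed to include cached reads and writes
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface NormalizedUsage {
  freshInput: number;
  cacheWrite: number;
  cacheRead: number;
}

function normalizePromptUsage(raw: RawPromptUsage): NormalizedUsage {
  // Subtracting only reads would leave write tokens counted as fresh input too.
  const freshInput = Math.max(0, raw.promptTokens - raw.cacheReadTokens - raw.cacheWriteTokens);
  return { freshInput, cacheWrite: raw.cacheWriteTokens, cacheRead: raw.cacheReadTokens };
}

// Cache miss: 10,000 prompt tokens, of which 9,000 were written to the cache.
const missUsage = normalizePromptUsage({ promptTokens: 10000, cacheReadTokens: 0, cacheWriteTokens: 9000 });
console.log(missUsage);
```

In this example only 1,000 tokens are billed as fresh input; under the old subtract-reads-only rule all 10,000 would have been counted as fresh in addition to the 9,000 writes.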

Tested: Anthropic direct + OpenRouter, cache hit/miss/off all correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Mar 19, 2026

Greptile Summary

This PR makes three related changes to prompt caching: fixes the cost estimation formula to use correct per-TTL write multipliers (1.25x for 5 min, 2x for 1 h), removes manual per-message cache_control markers in favour of Anthropic's automatic placement via a top-level cache_control field, and adds a TTL selector (Off / 5 min / 1 h) to the group-chat Conversation Settings dialog. It also introduces a debugMetadata blob — persisted analogously to debugRequest/debugResponse — and surfaces it in a new "Metadata" tab in the debug dialog.

Key observations:

  • The cacheTTL field is added to ConversationSchema with a default of '5m', meaning automatic caching will be active for all conversations (including standard/one-on-one chats) unless explicitly turned off, but the UI control to change this is only exposed for group chats.
  • save() in ConversationSettingsDialog always emits cacheTTL regardless of conversation format, so standard-chat users are silently locked to the '5m' default with no way to opt out from the UI.
  • The JSDoc on calculateCostBreakdown still states "Arc uses 1h TTL everywhere" even though the default is now 5m.
  • enhancedCallback captures cacheTTL via closure but cacheTTL is declared ~100 lines after the callback definition — valid at runtime but fragile against future refactors.
  • debugMetadata follows the same pattern as pre-existing debug blobs: it is silently discarded (not stored inline as a fallback) if the blob save fails.
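
To illustrate the first two observations, here is a small sketch of how the effective TTL could be resolved, based on the two expressions shown in the sequence diagram (`conversation.cacheTTL || '5m'`, then mapping `'off'` to `undefined`). The function name `resolveEffectiveCacheTTL` is hypothetical.

```typescript
// Hypothetical helper wrapping the two expressions from the sequence diagram.
type CacheTTLSetting = 'off' | '5m' | '1h';

function resolveEffectiveCacheTTL(setting?: CacheTTLSetting): '5m' | '1h' | undefined {
  const cacheTTL = setting || '5m';                 // missing setting falls back to '5m'
  return cacheTTL !== 'off' ? cacheTTL : undefined; // 'off' means no cache_control is sent
}

console.log(resolveEffectiveCacheTTL());      // conversation that never stored a setting
console.log(resolveEffectiveCacheTTL('off')); // explicit opt-out
console.log(resolveEffectiveCacheTTL('1h'));
```

A conversation that never stored a value resolves to `'5m'`, so the only way to disable caching is to store `'off'` explicitly, which standard chats currently cannot do from the UI.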

Confidence Score: 3/5

  • Mostly safe to merge but the silent caching opt-in for standard chats and the stale TTL comment warrant attention before shipping.
  • The core caching refactor is sound and the cost formula fix is correct. However, the cacheTTL setting defaulting to '5m' for all conversation types — including standard chats where there is no UI control — is an unintended side-effect that could silently change cost behaviour for users who have no visibility into this setting. The stale JSDoc and closure-ordering issues are lower priority but add maintainability risk.
  • Files that need the most attention: ConversationSettingsDialog.vue (TTL control visibility vs. save payload), enhanced-inference.ts (stale comment, closure ordering of cacheTTL), and types.ts (schema default of '5m' affects all conversations).

Important Files Changed

| Filename | Overview |
| --- | --- |
| deprecated-claude-app/backend/src/services/enhanced-inference.ts | Core cost calculation refactored to use per-TTL cache write multipliers (1.25x for 5 min, 2x for 1 h). Stale JSDoc comment still claims "Arc uses 1h TTL everywhere", and cacheTTL is captured by the enhancedCallback closure before it is declared, which is fragile. |
| deprecated-claude-app/frontend/src/components/ConversationSettingsDialog.vue | New TTL dropdown added to the group-chat (prefill) settings panel, but cacheTTL is always included in the save payload for all conversation formats. Standard-chat users can't see or control the caching setting, yet the backend will apply the '5m' default to them silently. |
| deprecated-claude-app/backend/src/services/anthropic.ts | Manual per-message cache_control markers removed; now uses a top-level cache_control field with TTL passed from the caller. Clean implementation of automatic prompt caching. Accepts cacheTTL param and maps '1h' to { type: 'ephemeral', ttl: '1h' } and '5m' to { type: 'ephemeral' }. |
| deprecated-claude-app/backend/src/services/openrouter.ts | Same caching refactor as anthropic.ts; for Anthropic-provider requests the top-level cache_control is conditionally added based on cacheTTL. Non-Anthropic providers are unaffected. No new issues. |
| deprecated-claude-app/backend/src/database/index.ts | New debugMetadata blob storage follows the same pattern as the pre-existing debugRequest/debugResponse handling. Minor: metadata is silently discarded on blob-save failure because delete updatesForMemory.debugMetadata is outside the try block. |
| deprecated-claude-app/backend/src/routes/conversations.ts | Debug endpoint extended to load and return debugMetadata from blob storage or inline, mirroring the existing pattern for debugRequest/debugResponse. No issues. |
| deprecated-claude-app/frontend/src/components/DebugMessageDialog.vue | New "Metadata" tab added to the debug dialog, displaying cache hit/miss status, provider, and full JSON of debugMetadata. Straightforward and well-structured change. |
| deprecated-claude-app/shared/src/types.ts | cacheTTL: z.enum(['off', '5m', '1h']).default('5m').optional() added to ConversationSchema. Schema default of '5m' means caching is enabled by default for all conversations, including standard chats that have no UI control for this field. |
| deprecated-claude-app/backend/src/services/inference.ts | cacheTTL parameter added to streamCompletion and correctly threaded through to provider-specific calls. No issues. |
| deprecated-claude-app/backend/src/websocket/handler.ts | No direct changes to caching logic in the handler; TTL is read from the conversation object inside EnhancedInferenceService. File appears clean and unchanged relative to the PR scope. |

Sequence Diagram

sequenceDiagram
    participant UI as ConversationSettingsDialog
    participant Conv as Conversation (DB)
    participant EI as EnhancedInferenceService
    participant IS as InferenceService
    participant Ant as AnthropicService / OpenRouterService

    UI->>Conv: save({ cacheTTL: 'off' | '5m' | '1h' })
    Conv-->>EI: conversation.cacheTTL

    EI->>EI: const cacheTTL = conversation.cacheTTL || '5m'
    EI->>EI: effectiveCacheTTL = cacheTTL !== 'off' ? cacheTTL : undefined

    EI->>IS: streamCompletion(..., effectiveCacheTTL)
    IS->>Ant: streamCompletion(..., cacheTTL)

    alt cacheTTL === '1h'
        Ant->>Ant: cache_control = { type:'ephemeral', ttl:'1h' } (2× write cost)
    else cacheTTL === '5m'
        Ant->>Ant: cache_control = { type:'ephemeral' } (1.25× write cost)
    else cacheTTL === undefined (off)
        Ant->>Ant: no cache_control field sent
    end

    Ant-->>EI: usage { freshInput, cacheWrite, cacheRead, output }
    EI->>EI: calculateCostBreakdown(model, tokens, cacheTTL)
    EI-->>UI: metrics { cost, cacheSavings, ... }

Comments Outside Diff (1)

  1. deprecated-claude-app/frontend/src/components/ConversationSettingsDialog.vue, line 1086-1092 (link)

    P1 cacheTTL is persisted for all conversation formats but the UI control only exists for group chats

    cacheTTL: cacheTTL.value is always included in the emitted update, regardless of whether settings.format is 'standard' or 'prefill'. The cacheTTL dropdown only renders in the v-else (prefill/group-chat) branch of the template, so users of standard chats have no visibility or control over the TTL setting. Because the schema default is '5m', standard chats will silently have automatic prompt caching enabled at 5-min TTL — and there's no UI to opt out.

    If caching is intentionally enabled for standard chats too, the control should also appear in the standard-chat settings panel (the v-if="settings.format === 'standard'" section). If standard chats should not use caching at all, the save() payload should conditionally omit or hard-code cacheTTL for 'standard' format:

    // Only propagate cacheTTL for prefill conversations (where the control is visible)
    ...(settings.value.format === 'prefill' && { cacheTTL: cacheTTL.value }),
Prompt To Fix All With AI
This is a comment left during a code review.
Path: deprecated-claude-app/backend/src/services/enhanced-inference.ts
Line: 570-572

Comment:
**Stale JSDoc comment contradicts the new per-TTL multiplier logic**

The comment still says "Arc uses 1h TTL everywhere, so cache writes = 2x base", but this PR introduces configurable TTL (`5m` or `1h`). The new default is `5m` (1.25x multiplier), so the statement is now wrong and could mislead future developers into assuming 2x cache write costs always apply.

```suggestion
   * Calculate the actual cost of a request using Anthropic's tiered pricing.
   *
   * Anthropic bills four token categories at different rates:
   * - Fresh input (uncached): base input price
   * - Cache write: 1.25x base (5min TTL) or 2x base (1h TTL)
   * - Cache read: 0.1x base
   * - Output: output price
   *
   * Default TTL is 5min (1.25x write cost). Use 1h for slower interactive
   * sessions where the longer TTL benefit justifies the higher write price.
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: deprecated-claude-app/backend/src/services/enhanced-inference.ts
Line: 428-498

Comment:
**`cacheTTL` captured by closure before its declaration**

`enhancedCallback` (defined here at ~line 430) references `cacheTTL` at line 498, but `cacheTTL` is not declared until line 528 — roughly 100 lines later. This works at runtime because the callback is only invoked after the `await this.inferenceService.streamCompletion(...)` call at line 534 (which comes after the `const cacheTTL = ...` at line 528). However, the reverse ordering makes the code fragile and hard to reason about. A future refactor that calls or hoists `enhancedCallback` earlier would hit a `ReferenceError` in the temporal dead zone.

Consider declaring `cacheTTL` before `enhancedCallback`, or passing it as a parameter to the inner callback to make the dependency explicit.
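
One way to make the dependency explicit, sketched here with hypothetical names (`makeEnhancedCallback`, the `log` sink, and the callback signature are illustrative, not the code in enhanced-inference.ts), is to build the callback from a factory that receives `cacheTTL` as an argument:

```typescript
// Hypothetical sketch: the callback receives cacheTTL as an argument instead of
// closing over a const that is declared later in the enclosing function.
type CacheTTL = '5m' | '1h';

function makeEnhancedCallback(cacheTTL: CacheTTL | undefined, log: string[]) {
  return (_chunk: string, isComplete: boolean): void => {
    if (isComplete) {
      // cacheTTL is guaranteed to be initialized here, whenever the callback runs.
      log.push(`done (cacheTTL=${cacheTTL ?? 'off'})`);
    }
  };
}

const log: string[] = [];
const cacheTTL: CacheTTL | undefined = '1h'; // declared before the callback is created
const enhancedCallback = makeEnhancedCallback(cacheTTL, log);

enhancedCallback('partial text', false);
enhancedCallback('', true);
console.log(log);
```

With this shape the callback can be created, hoisted, or invoked at any point after the factory call without depending on statement order in the surrounding function.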

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: deprecated-claude-app/backend/src/database/index.ts
Line: 9-17

Comment:
**`debugMetadata` is silently discarded when blob save fails**

`delete updatesForMemory.debugMetadata` runs outside the `try` block, so even when `blobStore.saveJsonBlob` throws (and `debugMetadataBlobId` is never assigned), the metadata is removed from the in-memory update. The result is that on a blob-storage failure the Debug → Metadata tab will show "No metadata available" with no indication that data was lost — the `console.warn` is the only signal.

This matches the pre-existing pattern used for `debugRequest`/`debugResponse`, so it's consistent, but it's worth making the tradeoff explicit. If you'd like the metadata to be preserved on failure (at the cost of memory pressure), move the `delete` inside the `try`:

```ts
if (updatesForMemory.debugMetadata) {
  try {
    const debugMetadataBlobId = await blobStore.saveJsonBlob(updatesForMemory.debugMetadata);
    updatesForMemory.debugMetadataBlobId = debugMetadataBlobId;
    delete updatesForMemory.debugMetadata; // Remove only after successful save
  } catch (err) {
    console.warn('[Database] Failed to save debugMetadata as blob:', err);
    // debugMetadata remains in the object; it will be stored inline (possibly large)
  }
}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "ui: simplify cache T..."
