
designs: inference cache (#012) #163

Open
jamesmt-aws wants to merge 1 commit into ellistarn:main from jamesmt-aws:inference-cache

Conversation

@jamesmt-aws
Contributor

Summary

Design doc and implementation for a disk-backed inference cache. The cache wraps inference.Client and stores responses keyed on the SHA-256 hash of the JSON-serialized call parameters (model, system prompt, messages, options). Cache hits return the stored response with no API call.
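
The key derivation can be sketched as follows. The struct fields here are illustrative stand-ins; the actual `Key` embeds `ConverseOptions` and the message types from the `inference` package, which may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ConverseOptions is a hypothetical subset of the real options struct.
type ConverseOptions struct {
	MaxTokens   int     `json:"max_tokens"`
	Temperature float64 `json:"temperature"`
}

// Key collects everything that affects an inference response. Embedding
// the options struct means any field added to it changes the key too.
type Key struct {
	Model    string          `json:"model"`
	System   string          `json:"system"`
	Messages []string        `json:"messages"`
	Options  ConverseOptions `json:"options"`
}

// cacheKey returns the hex SHA-256 of the JSON-serialized call parameters.
func cacheKey(k Key) (string, error) {
	b, err := json.Marshal(k)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	k := Key{Model: "model-a", System: "You are helpful.", Messages: []string{"hi"}}
	h, _ := cacheKey(k)
	fmt.Println(len(h)) // 64 hex characters
}
```

Because `encoding/json` serializes struct fields in declaration order, the same parameters always hash to the same key, which is what makes the lookup stable across runs.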

Motivated by observation strategy research: experimenting with different presentation strategies (windowed, owner-only, adaptive) means running the observe pipeline many times on the same conversations. Without caching, each run pays full API costs for calls whose inputs have not changed.

Design highlights

  • Key struct embeds ConverseOptions directly, so new option fields are automatically included in the key
  • Local filesystem or S3 backend (follows --bucket configuration)
  • Successful responses and truncations are cached; transient errors are not
  • Streaming calls (compose) are not cached
  • 1 GiB default cap with LRU eviction (design target, initial implementation has no cap)
  • --skip-cache to bypass reads and writes
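
The hit/miss and skip behavior above can be sketched with an in-memory store standing in for the disk/S3 backend. All names here are illustrative, not the actual `inference` package API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// countingClient stands in for the real inference.Client.
type countingClient struct{ calls int }

func (c *countingClient) Converse(params any) (string, error) {
	c.calls++
	return "response", nil
}

// cachingClient wraps an inner client; the map stands in for the
// filesystem/S3 backend, and skip mirrors --skip-cache.
type cachingClient struct {
	inner *countingClient
	store map[string]string
	skip  bool
}

func (c *cachingClient) Converse(params any) (string, error) {
	b, err := json.Marshal(params)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	id := hex.EncodeToString(sum[:])
	if !c.skip {
		if resp, ok := c.store[id]; ok {
			return resp, nil // hit: no API call
		}
	}
	resp, err := c.inner.Converse(params)
	if err != nil {
		return "", err // transient errors are never cached
	}
	if !c.skip {
		c.store[id] = resp // successful responses are cached
	}
	return resp, nil
}

func main() {
	inner := &countingClient{}
	c := &cachingClient{inner: inner, store: map[string]string{}}
	c.Converse(map[string]string{"model": "m", "prompt": "hi"})
	c.Converse(map[string]string{"model": "m", "prompt": "hi"})
	fmt.Println(inner.calls) // second call hits the cache, so this prints 1
}
```

Returning early on error means a failed call leaves no cache entry, so a retry goes back to the API rather than replaying the failure.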

Design doc at designs/012-inference-cache.md.

Test plan

  • go test ./... passes
  • Run muse compose --limit 2, note cost. Run again, verify near-zero observe cost
  • ls ~/.muse/cache/inference/ shows sharded entries

