## Summary
Design doc and implementation for a disk-backed inference cache. The cache wraps `inference.Client` and stores responses keyed on the SHA-256 hash of the JSON-serialized call parameters (model, system prompt, messages, options). Cache hits return the stored response with no API call. Successful responses and truncations are cached; transient errors are not.

Motivated by observation strategy research: experimenting with different presentation strategies (windowed, owner-only, adaptive) means running the observe pipeline many times on the same conversations. Without caching, each run pays full API costs for calls whose inputs have not changed; with the cache, repeated calls are free.
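To make the read-through flow concrete, here is a minimal sketch of the wrapper. `inference.Client` is the real type in this repo; the `Converser` interface, the stand-in structs, the `Truncated` field, and the `entryPath` helper (sketched under Design highlights below) are illustrative assumptions, not the repo's actual definitions.

```go
package cache

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// Stand-ins for the real inference package types (field names assumed).
type ConverseOptions struct {
	Model    string         `json:"model"`
	System   string         `json:"system"`
	Messages []Message      `json:"messages"`
	Options  map[string]any `json:"options"`
}

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Response struct {
	Text      string `json:"text"`
	Truncated bool   `json:"truncated"` // truncated responses are cached too
}

// Converser is a stand-in for the wrapped inference.Client.
type Converser interface {
	Converse(opts ConverseOptions) (*Response, error)
}

// Client wraps an upstream client with a disk-backed cache.
type Client struct {
	Upstream Converser
	Dir      string // cache root, e.g. ~/.muse/cache/inference
	Skip     bool   // --skip-cache: bypass both reads and writes
}

func (c *Client) Converse(opts ConverseOptions) (*Response, error) {
	path, err := entryPath(c.Dir, opts) // key derivation sketched below
	if err != nil {
		return nil, err
	}
	if !c.Skip {
		if data, err := os.ReadFile(path); err == nil {
			var resp Response
			if json.Unmarshal(data, &resp) == nil {
				return &resp, nil // hit: stored response, no API call
			}
		}
	}
	resp, err := c.Upstream.Converse(opts)
	if err != nil {
		return nil, err // errors, including transient ones, are never written
	}
	// Successful responses and truncations both reach this point and are cached.
	if !c.Skip {
		if data, merr := json.Marshal(resp); merr == nil {
			_ = os.MkdirAll(filepath.Dir(path), 0o755)
			_ = os.WriteFile(path, data, 0o644)
		}
	}
	return resp, nil
}
```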
## Design highlights
- The key hashes `ConverseOptions` directly, so new fields are automatically in the key (see the key-derivation sketch below)
- Sharded on-disk storage (`--bucket` configuration)
- `--skip-cache` to bypass reads and writes

Design doc at `designs/012-inference-cache.md`.
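Continuing the sketch above (same package, with `crypto/sha256` and `encoding/hex` added to the imports): the key is the SHA-256 of the JSON-serialized options, and entries are sharded into subdirectories by a prefix of the hex digest. The two-character prefix and `.json` suffix are assumptions; the test plan's `ls ~/.muse/cache/inference/` only confirms that entries are sharded somehow.

```go
// entryPath derives the on-disk location for a call: SHA-256 over the
// JSON-serialized ConverseOptions, sharded by the first two hex characters
// of the digest (prefix width is an assumption).
func entryPath(dir string, opts ConverseOptions) (string, error) {
	// Hashing the options struct directly means any new field changes
	// the serialized form and therefore the key, with no extra wiring.
	data, err := json.Marshal(opts)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	return filepath.Join(dir, key[:2], key+".json"), nil
}
```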
## Test plan

- `go test ./...` passes
- Run `muse compose --limit 2` and note the cost; run again and verify near-zero observe cost
- `ls ~/.muse/cache/inference/` shows sharded entries
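For the `go test` step, a test along these lines can pin down the core property, that a repeated identical call never reaches the upstream client. The fake and the test name are illustrative, not taken from the repo:

```go
package cache

import "testing"

// fakeClient counts upstream calls so the test can prove the second
// Converse is served from disk rather than the API.
type fakeClient struct{ calls int }

func (f *fakeClient) Converse(opts ConverseOptions) (*Response, error) {
	f.calls++
	return &Response{Text: "ok"}, nil
}

func TestCacheHitSkipsUpstream(t *testing.T) {
	fake := &fakeClient{}
	c := &Client{Upstream: fake, Dir: t.TempDir()}
	opts := ConverseOptions{
		Model:    "m",
		Messages: []Message{{Role: "user", Content: "hi"}},
	}

	if _, err := c.Converse(opts); err != nil {
		t.Fatal(err)
	}
	if _, err := c.Converse(opts); err != nil {
		t.Fatal(err)
	}
	if fake.calls != 1 {
		t.Fatalf("upstream called %d times, want 1", fake.calls)
	}
}
```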