refactor: split caching_unified.py into modular caching/ package#16
refactor: split caching_unified.py into modular caching/ package#16dr-gareth-roberts wants to merge 1 commit intomainfrom
Conversation
Split the 3,488-line caching_unified.py into a well-organized caching/ package: New module structure: - caching/types.py: Enums (CacheStrategy, CacheStatus, CacheScope), dataclasses (CacheConfig, CacheEntry, CacheStats, CacheLookupResult), and CacheEntryMixin - caching/base.py: BaseCacheABC abstract base class and key generation functions - caching/backends/memory.py: InMemoryCache implementation - caching/backends/disk.py: DiskCache (SQLite-backed) implementation - caching/backends/strategy.py: StrategyCache with configurable eviction strategies - caching/prompt_cache.py: PromptCache for LLM response caching - caching/model_wrapper.py: CachedModel wrapper for transparent caching - caching/utilities.py: CacheWarmer, MemoizedFunction, memoize decorator - caching/namespace.py: CacheNamespace for organizing multiple caches - caching/deduplicator.py: ResponseDeduplicator for identifying duplicate responses - caching/async_adapter.py: AsyncCacheAdapter for async/await compatibility - caching/factory.py: Convenience functions (create_cache, create_prompt_cache, etc.), global cache management, and cached decorator Backward compatibility: - caching_unified.py is now a re-export shim that imports and re-exports all 40+ public symbols - All existing imports continue to work unchanged - New code can import from insideLLMs.caching directly Benefits: - Improved code organization and maintainability - Easier navigation and understanding of caching functionality - Clear separation of concerns between different cache types - Reduced cognitive load when working with specific cache features
|
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
Sorry @dr-gareth-roberts, your pull request is larger than the review limit of 150000 diff characters
📝 WalkthroughWalkthroughIntroduces a comprehensive new caching subsystem for insideLLMs with multiple cache backends (in-memory, disk-based, strategy-driven), factory functions for cache creation and management, specialized features like prompt caching and response deduplication, and utilities for cache warming and function memoization, all unified through a single package initialization. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| self._stats.evictions += cursor.rowcount | ||
|
|
||
| conn.commit() | ||
| conn.execute("VACUUM") |
There was a problem hiding this comment.
CRITICAL: Performance issue - VACUUM is extremely expensive.
Running VACUUM blocks the database and copies the entire file. Doing this during cache eviction (which happens during set) will cause massive latency spikes, especially for large caches. Consider using PRAGMA auto_vacuum = INCREMENTAL or removing this automatic vacuum.
| def import_from_file(self, path: Union[str, Path]) -> int: | ||
| """Import cache from a JSON file.""" | ||
| with open(path) as f: | ||
| entries = json.load(f) |
There was a problem hiding this comment.
CRITICAL: Memory safety - json.load reads entire file into memory.
For a disk cache designed to handle data larger than memory, loading the entire export file into RAM will cause OOM crashes. Use a streaming parser or a line-based format (JSONL) for exports.
| key_parts.append(f"model:{model}") | ||
|
|
||
| if params: | ||
| sorted_params = json.dumps(params, sort_keys=True) |
There was a problem hiding this comment.
WARNING: Runtime error - json.dumps fails on non-serializable objects.
If params contains any non-JSON-serializable types (sets, custom objects, etc.), this will raise a TypeError. Use default=str to handle arbitrary objects safely.
| sorted_params = json.dumps(params, sort_keys=True) | |
| sorted_params = json.dumps(params, sort_keys=True, default=str) |
| for k, v in sorted(kwargs.items()): | ||
| key_parts[k] = v | ||
|
|
||
| key_str = json.dumps(key_parts, sort_keys=True, ensure_ascii=True) |
There was a problem hiding this comment.
WARNING: Runtime error - json.dumps fails on non-serializable objects.
Same issue as above. kwargs values are included in key_parts and serialized.
| key_str = json.dumps(key_parts, sort_keys=True, ensure_ascii=True) | |
| key_str = json.dumps(key_parts, sort_keys=True, ensure_ascii=True, default=str) |
Insecure Use of Crypto (4)
More info on how to fix Insecure Use of Crypto in Python. 👉 Go to the dashboard for detailed results. 📥 Happy? Share your feedback with us. |
Code Review SummaryStatus: 4 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICAL
WARNING
Other Observations (not in diff)Issues found in unchanged code or general observations:
Files Reviewed (15 files)
|
| if name not in self._caches or not isinstance(self._caches[name], PromptCache): | ||
| self._caches[name] = PromptCache(config or self.default_config) |
There was a problem hiding this comment.
🔴 CacheNamespace.get_prompt_cache silently replaces existing non-PromptCache, causing data loss
The get_prompt_cache method silently replaces an existing cache with a new PromptCache if the existing cache is not a PromptCache, destroying all cached data.
Click to expand
How the bug is triggered
If a user first creates a regular cache with get_cache("foo"), populates it with data, and then later calls get_prompt_cache("foo"), the existing cache is silently replaced:
namespace = CacheNamespace()
cache = namespace.get_cache("shared")
cache.set("important_data", "value") # Data stored
# Later in code, perhaps in a different module:
prompt_cache = namespace.get_prompt_cache("shared") # Silently replaces!
# All data in the original cache is now LOSTThe condition at line 144 if name not in self._caches or not isinstance(self._caches[name], PromptCache) causes the replacement when the cache exists but isn't a PromptCache.
Actual vs Expected
- Actual: Existing cache is silently replaced with a new empty
PromptCache - Expected: Either raise an error indicating type mismatch, or return the existing cache if compatible, or at minimum log a warning
Impact
Silent data loss in multi-module applications where different parts of the codebase might access the same named cache with different methods.
Recommendation: Either raise a ValueError when trying to get a PromptCache for a name that already has a non-PromptCache, or only check if name not in self._caches (creating new caches but returning existing ones regardless of type if PromptCache is a subclass of StrategyCache).
Was this helpful? React with 👍 or 👎 to provide feedback.
❌ 1 Tests Failed:
View the top 1 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
Pull request overview
This refactor splits the previous 3,400+ line caching_unified.py into a modular insideLLMs.caching package, and turns caching_unified.py into a backward-compatibility shim that re-exports all public caching APIs from the new package.
Changes:
- Introduces a new
insideLLMs.cachingpackage with focused modules for types, backends, utilities, prompt caching, async adapter, factory functions, namespaces, deduplication, and a model wrapper. - Replaces the implementation in
caching_unified.pywith a thin re-export shim that forwards all public symbols to the new package. - Aligns
insideLLMs.caching.__all__andinsideLLMs.caching_unified.__all__so existing imports continue to work while new code can use the cleanerinsideLLMs.cachingpackage.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
insideLLMs/caching_unified.py |
Replaces the monolithic implementation with a compatibility shim that re-exports all caching symbols from the new insideLLMs.caching package. |
insideLLMs/caching/__init__.py |
Defines the new primary caching entry point, aggregating and exporting all public caching APIs (types, backends, utilities, factory functions, async adapter, etc.). |
insideLLMs/caching/base.py |
Extracts BaseCacheABC and key generation utilities (generate_cache_key, generate_model_cache_key) into a dedicated base module. |
insideLLMs/caching/types.py |
Houses all core enums and dataclasses (CacheStrategy, CacheStatus, CacheScope, CacheConfig, CacheEntryMixin, CacheEntry, CacheStats, CacheLookupResult) used across the caching system. |
insideLLMs/caching/backends/memory.py |
Provides the InMemoryCache backend (LRU + TTL), now as a dedicated backend module implementing BaseCacheABC. |
insideLLMs/caching/backends/disk.py |
Provides the DiskCache backend (SQLite, size-managed, TTL, export/import), separated into its own backend module. |
insideLLMs/caching/backends/strategy.py |
Implements StrategyCache (and BaseCache alias) using the new shared types module; note: its __init__ signature has been simplified and currently breaks the previous strategy/max_size keyword API. |
insideLLMs/caching/backends/__init__.py |
Re-exports InMemoryCache, DiskCache, and StrategyCache/BaseCache from the backend submodules as a convenience. |
insideLLMs/caching/prompt_cache.py |
Extracts PromptCache into its own module, still extending StrategyCache and using semantic similarity via word_overlap_similarity. |
insideLLMs/caching/namespace.py |
Extracts CacheNamespace into a separate module for managing multiple named caches backed by StrategyCache/PromptCache. |
insideLLMs/caching/model_wrapper.py |
Extracts CachedModel, providing a thin caching wrapper around models with deterministic-only caching by default. |
insideLLMs/caching/utilities.py |
Extracts CacheWarmer, MemoizedFunction, and memoize into a utilities module, with BaseCache aliased to StrategyCache. |
insideLLMs/caching/factory.py |
Centralizes factory and convenience functions (create_cache, create_prompt_cache, create_cache_warmer, create_namespace, get_cache_key, cached_response, global default cache helpers, and cached decorator). |
insideLLMs/caching/deduplicator.py |
Extracts ResponseDeduplicator into its own module with the same API as before (add, get_unique_responses, get_duplicate_count, clear). |
insideLLMs/caching/async_adapter.py |
Extracts AsyncCacheAdapter into a separate module, wrapping StrategyCache (BaseCache alias) with async get/set/delete/clear methods. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__(self, config: Optional[CacheConfig] = None): | ||
| self.config = config or CacheConfig() |
There was a problem hiding this comment.
The new StrategyCache.__init__ signature only accepts an optional CacheConfig parameter, but the previous implementation also supported strategy and max_size keyword overrides (e.g., StrategyCache(strategy=CacheStrategy.LRU, max_size=100)), which are still used in the test suite and likely by external callers. This is a breaking API change that will cause TypeError at call sites expecting the old signature. To preserve backward compatibility and keep the refactor non-functional, please reintroduce the strategy and max_size optional keyword parameters and apply them on top of the provided/configured CacheConfig as before.
| def __init__(self, config: Optional[CacheConfig] = None): | |
| self.config = config or CacheConfig() | |
| def __init__( | |
| self, | |
| config: Optional[CacheConfig] = None, | |
| strategy: Optional[CacheStrategy] = None, | |
| max_size: Optional[int] = None, | |
| ): | |
| """ | |
| Initialize the strategy cache. | |
| Parameters | |
| ---------- | |
| config : | |
| Optional CacheConfig instance. If not provided, a default | |
| CacheConfig is created. | |
| strategy : | |
| Optional override for the cache strategy. If provided, this | |
| value takes precedence over the strategy defined in ``config``. | |
| max_size : | |
| Optional override for the maximum cache size. If provided, this | |
| value takes precedence over the max_size defined in ``config``. | |
| """ | |
| self.config = config or CacheConfig() | |
| # Backward-compatible keyword overrides | |
| if strategy is not None: | |
| self.config.strategy = strategy | |
| if max_size is not None: | |
| self.config.max_size = max_size |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@insideLLMs/caching/backends/disk.py`:
- Around line 274-277: Replace the non-deterministic json.dump call that writes
entries with one that produces canonical, stable output: when writing in the
block using open(path, "w") and json.dump(entries, f, indent=2), add
sort_keys=True and a stable separators argument (e.g., separators=(',', ': '))
so keys are consistently ordered and spacing is canonical across exports; keep
indent for readability if desired and preserve the same file write path and
context.
- Around line 160-164: The TTL-expiry branch in the disk cache (where it checks
row["expires_at"] and deletes the row via conn.execute("DELETE FROM cache WHERE
key = ?", (key,))) fails to increment the expiration metric; update that branch
to also increment self._stats.expirations (similarly to InMemoryCache) before
returning None so expired disk entries are counted; locate this logic in the
disk backend method that reads rows (uses conn, row, key) and add
self._stats.expirations += 1 right after conn.commit() and before returning.
In `@insideLLMs/caching/backends/memory.py`:
- Around line 119-123: In the expiration branch inside the memory cache get
logic (the block that checks entry.get("expires_at") and compares time.time() >
entry["expires_at"]), increment the CacheStats expirations counter in addition
to the existing misses increment: update self._stats.expirations (and keep
self._stats.misses) before deleting from self._cache so TTL expirations are
tracked; locate this logic around the expiration check that references entry,
self._cache, and self._stats to apply the change.
In `@insideLLMs/caching/base.py`:
- Around line 55-57: The current base method has() calls get(), causing
hits/misses to increment; change has() in the base class
(insideLLMs.caching.base.Cache) to be abstract (raise NotImplementedError or use
`@abstractmethod`) so concrete caches implement a stats-free existence check, and
update InMemoryCache.has, DiskCache.has and StrategyCache.has to perform a
non-mutating existence test (e.g., check the internal dict, key presence on
disk, or delegate to underlying caches) without touching hit/miss counters or
calling get().
In `@insideLLMs/caching/deduplicator.py`:
- Around line 104-107: ResponseDeduplicator is not thread-safe: add a
threading.RLock to the class (import threading and set self._lock =
threading.RLock() in __init__), and wrap any access/modification of
self._responses and self._duplicate_count (e.g., inside the add() method and any
getters/readers) with self._lock (use with self._lock: ...). This ensures all
mutations and reads of _responses and _duplicate_count in class
ResponseDeduplicator are synchronized.
In `@insideLLMs/caching/factory.py`:
- Around line 422-424: The cached decorator currently treats a returned None as
a cache miss by using "cached_result = _cache.get(key)" and "if cached_result is
not None", so change the lookup to distinguish absent keys (e.g. use a unique
sentinel object or check key membership with "if key in _cache") and return the
cached value when the key exists even if it is None; update the logic around
_cache.get(key) / cached_result in the cached decorator to use that sentinel or
membership check so legitimate None results are stored and reused.
In `@insideLLMs/caching/namespace.py`:
- Around line 137-146: The get_prompt_cache method currently overwrites any
existing entry in self._caches when the stored value is not a PromptCache,
risking silent data loss; change it to detect a type mismatch and raise a clear
error instead of replacing: inside get_prompt_cache (while holding self._lock)
if name exists in self._caches and not isinstance(self._caches[name],
PromptCache) raise a TypeError (or custom exception) explaining the name/type
conflict and suggesting using get_cache or deleting the existing cache; keep the
current creation path (self._caches[name] = PromptCache(config or
self.default_config)) only when the name is absent, and mirror analogous
behavior in get_cache if needed so both getters are symmetric.
In `@insideLLMs/caching/prompt_cache.py`:
- Around line 124-148: The _prompt_keys mapping in cache_response stores
prompt_hash -> cache key but is never cleaned on eviction; update the cache to
remove stale mappings either by (A) hooking into the cache eviction/expire path
(or the set/evict function) to delete any _prompt_keys entries whose value
equals the evicted key, or (B) making get_by_prompt defensive: compute
prompt_hash, look up key in _prompt_keys, then verify the key exists in the
cache via the cache lookup (e.g., call self.get(key) or self._has(key)); if the
key is missing, remove the prompt_hash from self._prompt_keys and return a cache
miss. Modify methods named cache_response, get_by_prompt, and the cache
eviction/eject path (or set/evict helpers) to implement one of these cleanup
strategies so _prompt_keys cannot grow with stale entries.
In `@insideLLMs/caching/types.py`:
- Around line 141-158: The docstring iteration example for CacheStatus is
inconsistent with the enum because CacheStatus also defines backward-compatible
values HIT and MISS; update the docstring for CacheStatus to either include
'hit' and 'miss' in the printed iteration example or add a note that HIT and
MISS are present for backward compatibility and will appear when iterating.
Locate the CacheStatus enum and its members (HIT, MISS, ACTIVE, EXPIRED,
EVICTED, WARMING) and adjust the example text accordingly so the docstring
matches the actual enum contents.
In `@insideLLMs/caching/utilities.py`:
- Around line 337-342: MemoizedFunction assumes cache.get(key) returns an object
with .hit and .value; update MemoizedFunction to handle both return types by
checking the result from self.cache.get(key): if it has attributes .hit/.value
(as returned by StrategyCache/CacheLookupResult) use them, otherwise treat a
truthy/non-None return as a hit and the return value as the cached value; adjust
the increment of self._cache_calls and the return accordingly. Reference
symbols: MemoizedFunction, key_generator, self.cache.get, _cache_calls,
CacheLookupResult, StrategyCache, InMemoryCache.get, BaseCacheABC.
🧹 Nitpick comments (10)
insideLLMs/caching/types.py (1)
31-40: Consider using timezone-aware datetimes for consistency.
datetime.now()returns naive (timezone-unaware) datetimes. In distributed systems or when serializing/deserializing across timezones, this can cause subtle bugs. Consider usingdatetime.now(timezone.utc)ordatetime.utcnow()for consistency.This affects both
is_expired()andtouch()methods.insideLLMs/caching/base.py (1)
151-154: Non-primitive kwargs values may produce inconsistent cache keys.The string interpolation
f"{k}:{v}"for kwargs relies on__str__which may be inconsistent for complex objects (e.g., dicts, lists). Consider JSON-serializing non-primitive values for determinism.♻️ Proposed fix for deterministic kwargs handling
# Add any additional kwargs (sorted for determinism) for k, v in sorted(kwargs.items()): - key_parts.append(f"{k}:{v}") + if isinstance(v, (dict, list)): + key_parts.append(f"{k}:{json.dumps(v, sort_keys=True)}") + else: + key_parts.append(f"{k}:{v}")insideLLMs/caching/backends/strategy.py (1)
259-300: Eviction implementations for LFU, TTL, and SIZE have O(n) complexity.The
_evict_one()method iterates all entries for LFU (min access_count), TTL (soonest expiration), and SIZE (max size_bytes) strategies. For large caches, this could be a performance bottleneck.This is acceptable for initial implementation, but consider using heap data structures for O(log n) eviction if performance becomes critical.
insideLLMs/caching/deduplicator.py (1)
109-122: O(n) duplicate detection may be slow for large datasets.Each
add()scans all existing responses linearly. For use cases with thousands of responses, consider using a hash-based approach for exact matches (threshold=1.0) as an optimization.insideLLMs/caching/backends/disk.py (1)
230-257: VACUUM after every eviction batch may degrade performance.
VACUUMrewrites the entire database and can be slow for large caches. Running it in the eviction loop (which could execute multiple times in oneset()call) may cause noticeable latency spikes.Consider:
- Running VACUUM only when significant space can be reclaimed
- Running VACUUM in a background thread
- Using
PRAGMA auto_vacuum = INCREMENTALinsteadinsideLLMs/caching/utilities.py (2)
155-201: Batch removal includes all items regardless of actual processing status.The
warm()method removesbatch_sizeitems from the queue (line 201) even though some items may have been skipped due toskip_existing=True. This means if you callwarm(batch_size=10, skip_existing=True)and 5 items are skipped, you still lose all 10 items from the queue but only effectively warmed 5.This might be intentional (treating "skipped" as "handled"), but it could be surprising if users expect only successfully-warmed items to be removed.
💡 Alternative: Only remove actually processed items
- batch = self._warming_queue[:batch_size] + batch = self._warming_queue[:batch_size] + processed_count = 0 for item in batch: # ... processing logic ... + processed_count += 1 - self._warming_queue = self._warming_queue[batch_size:] + self._warming_queue = self._warming_queue[processed_count:]
331-331: Non-standard usage offunctools.wraps.
wraps(func)(self)is an unusual pattern.wrapsreturns a decorator that updates__name__,__doc__, etc., but calling it onselfdoesn't properly assign these attributes to the instance in a way that makesselfcallable with the expected metadata.Consider using a more explicit approach:
♻️ Proposed fix for wraps usage
- wraps(func)(self) + self.__name__ = func.__name__ + self.__doc__ = func.__doc__ + self.__wrapped__ = funcinsideLLMs/caching/factory.py (1)
376-398: Global default cache management lacks thread-safety.The
_default_cacheglobal and its accessors (get_default_cache,set_default_cache,clear_default_cache) are not thread-safe. Concurrent calls toget_default_cache()could result in multipleInMemoryCacheinstances being created due to a race condition.🔒 Proposed fix using a lock
+import threading + +_default_cache_lock = threading.Lock() _default_cache: Optional[BaseCacheABC] = None def get_default_cache() -> BaseCacheABC: """Get the global default cache instance.""" global _default_cache - if _default_cache is None: - _default_cache = InMemoryCache() + if _default_cache is None: + with _default_cache_lock: + if _default_cache is None: # Double-check locking + _default_cache = InMemoryCache() return _default_cache def set_default_cache(cache: BaseCacheABC) -> None: """Set the global default cache instance.""" global _default_cache - _default_cache = cache + with _default_cache_lock: + _default_cache = cacheinsideLLMs/caching/model_wrapper.py (1)
55-85: PotentialUnboundLocalErrorifshould_cachelogic is modified.
cache_keyandmodel_idare only defined inside the firstif should_cacheblock (lines 57-65), but they're used again in the secondif should_cacheblock (lines 78-83). Ifshould_cacheisTrueinitially but somehow becomesFalsebefore line 78 (which can't happen in current code but could after refactoring), anUnboundLocalErrorwould occur.Consider restructuring to make the data flow clearer:
♻️ Clearer structure
def generate( self, prompt: str, temperature: float = 0.0, max_tokens: Optional[int] = None, **kwargs: Any, ) -> ModelResponse: """Generate a response, using cache if available.""" should_cache = not self._cache_only_deterministic or temperature == 0 + cache_key = None + model_id = None if should_cache: model_id = getattr(self._model, "model_id", str(type(self._model).__name__)) cache_key = generate_model_cache_key( model_id=model_id, prompt=prompt, temperature=temperature, max_tokens=max_tokens, **kwargs, ) cached = self._cache.get(cache_key) if cached is not None: return self._deserialize_response(cached) response = self._model.generate( prompt, temperature=temperature, max_tokens=max_tokens, **kwargs, ) - if should_cache: + if should_cache and cache_key is not None: self._cache.set( cache_key, self._serialize_response(response), metadata={"model_id": model_id, "prompt_preview": prompt[:100]}, ) return responseinsideLLMs/caching/__init__.py (1)
16-29: Docstring example assumesStrategyCachereturn type.The quick start example shows
result.hitandresult.value, which is specific toStrategyCache'sCacheLookupResultreturn type. If users substituteInMemoryCache(also exported), the example will fail sinceInMemoryCache.get()returnsOptional[T]directly.Consider clarifying this in the docstring or ensuring consistency across cache implementations.
| if row["expires_at"] is not None and time.time() > row["expires_at"]: | ||
| conn.execute("DELETE FROM cache WHERE key = ?", (key,)) | ||
| conn.commit() | ||
| self._stats.misses += 1 | ||
| return None |
There was a problem hiding this comment.
Missing expirations stat increment when TTL expires.
Same issue as InMemoryCache - expired entries are deleted but _stats.expirations is not incremented.
🔧 Proposed fix
if row["expires_at"] is not None and time.time() > row["expires_at"]:
conn.execute("DELETE FROM cache WHERE key = ?", (key,))
conn.commit()
self._stats.misses += 1
+ self._stats.expirations += 1
return None🤖 Prompt for AI Agents
In `@insideLLMs/caching/backends/disk.py` around lines 160 - 164, The TTL-expiry
branch in the disk cache (where it checks row["expires_at"] and deletes the row
via conn.execute("DELETE FROM cache WHERE key = ?", (key,))) fails to increment
the expiration metric; update that branch to also increment
self._stats.expirations (similarly to InMemoryCache) before returning None so
expired disk entries are counted; locate this logic in the disk backend method
that reads rows (uses conn, row, key) and add self._stats.expirations += 1 right
after conn.commit() and before returning.
|
|
||
| with open(path, "w") as f: | ||
| json.dump(entries, f, indent=2) | ||
|
|
There was a problem hiding this comment.
Export may produce non-deterministic JSON output.
json.dump(entries, f, indent=2) doesn't use sort_keys=True. If entries contain dicts with varying key orders, exports of the same data could differ, complicating diffs and reproducibility.
As per coding guidelines: "Maintain deterministic JSON/JSONL output with stable key ordering and separators for canonical artifacts."
🔧 Proposed fix
with open(path, "w") as f:
- json.dump(entries, f, indent=2)
+ json.dump(entries, f, indent=2, sort_keys=True)🤖 Prompt for AI Agents
In `@insideLLMs/caching/backends/disk.py` around lines 274 - 277, Replace the
non-deterministic json.dump call that writes entries with one that produces
canonical, stable output: when writing in the block using open(path, "w") and
json.dump(entries, f, indent=2), add sort_keys=True and a stable separators
argument (e.g., separators=(',', ': ')) so keys are consistently ordered and
spacing is canonical across exports; keep indent for readability if desired and
preserve the same file write path and context.
| # Check expiration | ||
| if entry.get("expires_at") and time.time() > entry["expires_at"]: | ||
| del self._cache[key] | ||
| self._stats.misses += 1 | ||
| return None |
There was a problem hiding this comment.
Missing expirations stat increment when TTL expires.
When an expired entry is found and deleted, only misses is incremented. The expirations field in CacheStats is never updated, making it impossible to distinguish between true misses and TTL expirations.
🔧 Proposed fix
# Check expiration
if entry.get("expires_at") and time.time() > entry["expires_at"]:
del self._cache[key]
self._stats.misses += 1
+ self._stats.expirations += 1
return None🤖 Prompt for AI Agents
In `@insideLLMs/caching/backends/memory.py` around lines 119 - 123, In the
expiration branch inside the memory cache get logic (the block that checks
entry.get("expires_at") and compares time.time() > entry["expires_at"]),
increment the CacheStats expirations counter in addition to the existing misses
increment: update self._stats.expirations (and keep self._stats.misses) before
deleting from self._cache so TTL expirations are tracked; locate this logic
around the expiration check that references entry, self._cache, and self._stats
to apply the change.
| def has(self, key: str) -> bool: | ||
| """Check if a key exists in the cache.""" | ||
| return self.get(key) is not None |
There was a problem hiding this comment.
has() calling get() causes unintended stat side effects.
The has() implementation calls get(), which in concrete implementations (InMemoryCache, DiskCache, StrategyCache) increments hit/miss counters. This means checking if a key exists will affect cache statistics, which may not be the expected behavior.
Consider making has() abstract or documenting this behavior clearly so implementations can provide a stats-free existence check.
🤖 Prompt for AI Agents
In `@insideLLMs/caching/base.py` around lines 55 - 57, The current base method
has() calls get(), causing hits/misses to increment; change has() in the base
class (insideLLMs.caching.base.Cache) to be abstract (raise NotImplementedError
or use `@abstractmethod`) so concrete caches implement a stats-free existence
check, and update InMemoryCache.has, DiskCache.has and StrategyCache.has to
perform a non-mutating existence test (e.g., check the internal dict, key
presence on disk, or delegate to underlying caches) without touching hit/miss
counters or calling get().
| def __init__(self, similarity_threshold: float = 1.0): | ||
| self.similarity_threshold = similarity_threshold | ||
| self._responses: list[tuple[str, str, Any]] = [] | ||
| self._duplicate_count = 0 |
There was a problem hiding this comment.
Missing thread safety for concurrent usage.
Unlike other cache classes in this PR that use threading.RLock, ResponseDeduplicator has no synchronization. If used from multiple threads (e.g., in an async web app), concurrent add() calls could cause race conditions on _responses and _duplicate_count.
🔒 Proposed fix to add thread safety
+import threading
+
class ResponseDeduplicator:
...
def __init__(self, similarity_threshold: float = 1.0):
self.similarity_threshold = similarity_threshold
self._responses: list[tuple[str, str, Any]] = []
self._duplicate_count = 0
+ self._lock = threading.RLock()
def add(
self,
prompt: str,
response: str,
metadata: Optional[Any] = None,
) -> tuple[bool, Optional[int]]:
"""Add response, returning whether it's a duplicate."""
+ with self._lock:
- for i, (_, existing_response, _) in enumerate(self._responses):
+ for i, (_, existing_response, _) in enumerate(self._responses):
# ... rest of method indented🤖 Prompt for AI Agents
In `@insideLLMs/caching/deduplicator.py` around lines 104 - 107,
ResponseDeduplicator is not thread-safe: add a threading.RLock to the class
(import threading and set self._lock = threading.RLock() in __init__), and wrap
any access/modification of self._responses and self._duplicate_count (e.g.,
inside the add() method and any getters/readers) with self._lock (use with
self._lock: ...). This ensures all mutations and reads of _responses and
_duplicate_count in class ResponseDeduplicator are synchronized.
| cached_result = _cache.get(key) | ||
| if cached_result is not None: | ||
| return cached_result |
There was a problem hiding this comment.
cached decorator cannot distinguish between cached None values and cache misses.
The check if cached_result is not None (line 423) treats None as a cache miss. If the wrapped function legitimately returns None, that value won't be cached effectively—subsequent calls will re-execute the function.
This is a common memoization pitfall. Consider using a sentinel value or checking if the key exists in the cache.
💡 Suggested fix using a sentinel
+_CACHE_MISS = object() # Sentinel for cache miss
def cached(
cache: Optional[BaseCacheABC] = None,
ttl: Optional[int] = None,
key_fn: Optional[Callable[..., str]] = None,
) -> Callable:
"""Decorator to cache function results using simple cache."""
_cache = cache or InMemoryCache()
def decorator(func: Callable) -> Callable:
def wrapper(*args: Any, **kwargs: Any) -> Any:
# ... key generation ...
- cached_result = _cache.get(key)
- if cached_result is not None:
- return cached_result
+ cached_result = _cache.get(key)
+ # InMemoryCache returns None on miss, check has() for existence
+ if _cache.has(key):
+ return cached_result
result = func(*args, **kwargs)
_cache.set(key, result, ttl=ttl)
return result
wrapper.__wrapped__ = func
return wrapper
return decorator🤖 Prompt for AI Agents
In `@insideLLMs/caching/factory.py` around lines 422 - 424, The cached decorator
currently treats a returned None as a cache miss by using "cached_result =
_cache.get(key)" and "if cached_result is not None", so change the lookup to
distinguish absent keys (e.g. use a unique sentinel object or check key
membership with "if key in _cache") and return the cached value when the key
exists even if it is None; update the logic around _cache.get(key) /
cached_result in the cached decorator to use that sentinel or membership check
so legitimate None results are stored and reused.
| def get_prompt_cache( | ||
| self, | ||
| name: str, | ||
| config: Optional[CacheConfig] = None, | ||
| ) -> PromptCache: | ||
| """Get or create a named prompt cache.""" | ||
| with self._lock: | ||
| if name not in self._caches or not isinstance(self._caches[name], PromptCache): | ||
| self._caches[name] = PromptCache(config or self.default_config) | ||
| return self._caches[name] |
There was a problem hiding this comment.
get_prompt_cache silently replaces existing cache, potentially losing data.
If a cache named "foo" was previously created via get_cache("foo"), calling get_prompt_cache("foo") will silently replace it with a new PromptCache, discarding all existing cached data. This asymmetric behavior (compared to get_cache which doesn't replace) could lead to surprising data loss.
Consider either:
- Raising an error when the existing cache is the wrong type
- Documenting this behavior prominently
- Making
get_cachebehave symmetrically
💡 Alternative: Raise error on type mismatch
def get_prompt_cache(
self,
name: str,
config: Optional[CacheConfig] = None,
) -> PromptCache:
"""Get or create a named prompt cache."""
with self._lock:
- if name not in self._caches or not isinstance(self._caches[name], PromptCache):
+ if name not in self._caches:
self._caches[name] = PromptCache(config or self.default_config)
+ elif not isinstance(self._caches[name], PromptCache):
+ raise TypeError(
+ f"Cache '{name}' exists but is not a PromptCache. "
+ "Use delete_cache() first if you want to replace it."
+ )
return self._caches[name]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_prompt_cache( | |
| self, | |
| name: str, | |
| config: Optional[CacheConfig] = None, | |
| ) -> PromptCache: | |
| """Get or create a named prompt cache.""" | |
| with self._lock: | |
| if name not in self._caches or not isinstance(self._caches[name], PromptCache): | |
| self._caches[name] = PromptCache(config or self.default_config) | |
| return self._caches[name] | |
| def get_prompt_cache( | |
| self, | |
| name: str, | |
| config: Optional[CacheConfig] = None, | |
| ) -> PromptCache: | |
| """Get or create a named prompt cache.""" | |
| with self._lock: | |
| if name not in self._caches: | |
| self._caches[name] = PromptCache(config or self.default_config) | |
| elif not isinstance(self._caches[name], PromptCache): | |
| raise TypeError( | |
| f"Cache '{name}' exists but is not a PromptCache. " | |
| "Use delete_cache() first if you want to replace it." | |
| ) | |
| return self._caches[name] |
🤖 Prompt for AI Agents
In `@insideLLMs/caching/namespace.py` around lines 137 - 146, The get_prompt_cache
method currently overwrites any existing entry in self._caches when the stored
value is not a PromptCache, risking silent data loss; change it to detect a type
mismatch and raise a clear error instead of replacing: inside get_prompt_cache
(while holding self._lock) if name exists in self._caches and not
isinstance(self._caches[name], PromptCache) raise a TypeError (or custom
exception) explaining the name/type conflict and suggesting using get_cache or
deleting the existing cache; keep the current creation path (self._caches[name]
= PromptCache(config or self.default_config)) only when the name is absent, and
mirror analogous behavior in get_cache if needed so both getters are symmetric.
| self._prompt_keys: dict[str, str] = {} # prompt hash -> cache key | ||
|
|
||
| def cache_response( | ||
| self, | ||
| prompt: str, | ||
| response: str, | ||
| model: Optional[str] = None, | ||
| params: Optional[dict] = None, | ||
| metadata: Optional[dict] = None, | ||
| ) -> CacheEntry: | ||
| """Cache an LLM response.""" | ||
| key = generate_cache_key(prompt, model, params, self.config.hash_algorithm) | ||
|
|
||
| entry_metadata = { | ||
| "prompt": prompt, | ||
| "model": model, | ||
| "params": params, | ||
| **(metadata or {}), | ||
| } | ||
|
|
||
| entry = self.set(key, response, metadata=entry_metadata) | ||
|
|
||
| # Track prompt -> key mapping | ||
| prompt_hash = hashlib.md5(prompt.encode()).hexdigest() | ||
| self._prompt_keys[prompt_hash] = key |
There was a problem hiding this comment.
_prompt_keys mapping is not synchronized with cache evictions.
When entries are evicted from the cache (due to max_size limits or TTL expiration), the corresponding entries in _prompt_keys are not removed. Over time, this mapping will grow unbounded and contain stale references to keys no longer in the cache.
Consider cleaning up _prompt_keys when entries are evicted, or lazily removing stale entries during lookups.
💡 Suggestion: Clean up stale mappings in get_by_prompt
def get_by_prompt(self, prompt: str) -> CacheLookupResult:
"""Get cached response by prompt alone (ignoring model/params)."""
prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
key = self._prompt_keys.get(prompt_hash)
if key:
- return self.get(key)
+ result = self.get(key)
+ if not result.hit:
+ # Clean up stale mapping
+ del self._prompt_keys[prompt_hash]
+ return result
return CacheLookupResult(hit=False, value=None, key="")🤖 Prompt for AI Agents
In `@insideLLMs/caching/prompt_cache.py` around lines 124 - 148, The _prompt_keys
mapping in cache_response stores prompt_hash -> cache key but is never cleaned
on eviction; update the cache to remove stale mappings either by (A) hooking
into the cache eviction/expire path (or the set/evict function) to delete any
_prompt_keys entries whose value equals the evicted key, or (B) making
get_by_prompt defensive: compute prompt_hash, look up key in _prompt_keys, then
verify the key exists in the cache via the cache lookup (e.g., call
self.get(key) or self._has(key)); if the key is missing, remove the prompt_hash
from self._prompt_keys and return a cache miss. Modify methods named
cache_response, get_by_prompt, and the cache eviction/eject path (or set/evict
helpers) to implement one of these cleanup strategies so _prompt_keys cannot
grow with stale entries.
| Iterating over all statuses: | ||
|
|
||
| >>> for status in CacheStatus: | ||
| ... print(status.value) | ||
| active | ||
| expired | ||
| evicted | ||
| warming | ||
| """ | ||
|
|
||
| # Backward-compatible lookup statuses (used by older tests/consumers). | ||
| HIT = "hit" | ||
| MISS = "miss" | ||
|
|
||
| ACTIVE = "active" | ||
| EXPIRED = "expired" | ||
| EVICTED = "evicted" | ||
| WARMING = "warming" |
There was a problem hiding this comment.
Docstring example inconsistent with backward-compatible enum values.
The docstring's iteration example (lines 143-148) shows only active, expired, evicted, warming but the enum also includes HIT and MISS for backward compatibility. This could confuse users when they iterate and see 6 values instead of 4.
📝 Suggested docstring fix
Iterating over all statuses:
>>> for status in CacheStatus:
... print(status.value)
+ hit
+ miss
active
expired
evicted
warming📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Iterating over all statuses: | |
| >>> for status in CacheStatus: | |
| ... print(status.value) | |
| active | |
| expired | |
| evicted | |
| warming | |
| """ | |
| # Backward-compatible lookup statuses (used by older tests/consumers). | |
| HIT = "hit" | |
| MISS = "miss" | |
| ACTIVE = "active" | |
| EXPIRED = "expired" | |
| EVICTED = "evicted" | |
| WARMING = "warming" | |
| Iterating over all statuses: | |
| >>> for status in CacheStatus: | |
| ... print(status.value) | |
| hit | |
| miss | |
| active | |
| expired | |
| evicted | |
| warming | |
| """ | |
| # Backward-compatible lookup statuses (used by older tests/consumers). | |
| HIT = "hit" | |
| MISS = "miss" | |
| ACTIVE = "active" | |
| EXPIRED = "expired" | |
| EVICTED = "evicted" | |
| WARMING = "warming" |
🤖 Prompt for AI Agents
In `@insideLLMs/caching/types.py` around lines 141 - 158, The docstring iteration
example for CacheStatus is inconsistent with the enum because CacheStatus also
defines backward-compatible values HIT and MISS; update the docstring for
CacheStatus to either include 'hit' and 'miss' in the printed iteration example
or add a note that HIT and MISS are present for backward compatibility and will
appear when iterating. Locate the CacheStatus enum and its members (HIT, MISS,
ACTIVE, EXPIRED, EVICTED, WARMING) and adjust the example text accordingly so
the docstring matches the actual enum contents.
| key = self.key_generator(*args, **kwargs) | ||
| result = self.cache.get(key) | ||
|
|
||
| if result.hit: | ||
| self._cache_calls += 1 | ||
| return result.value |
There was a problem hiding this comment.
Cache interface assumption may cause issues with different backends.
MemoizedFunction assumes cache.get(key) returns a CacheLookupResult with .hit and .value attributes. However, InMemoryCache.get() returns Optional[T] directly (per the relevant snippet from memory.py), not a CacheLookupResult.
This class will only work correctly with StrategyCache (which returns CacheLookupResult), but the type hint suggests any BaseCache (aliased to StrategyCache) should work. If someone passes an InMemoryCache or a cache implementing BaseCacheABC, this will fail.
🐛 Proposed fix to handle both interfaces
key = self.key_generator(*args, **kwargs)
result = self.cache.get(key)
- if result.hit:
+ # Handle both CacheLookupResult (StrategyCache) and Optional[T] (InMemoryCache)
+ if hasattr(result, 'hit'):
+ if result.hit:
+ self._cache_calls += 1
+ return result.value
+ elif result is not None:
self._cache_calls += 1
- return result.value
+ return result
value = self.func(*args, **kwargs)
self.cache.set(key, value)
return value📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| key = self.key_generator(*args, **kwargs) | |
| result = self.cache.get(key) | |
| if result.hit: | |
| self._cache_calls += 1 | |
| return result.value | |
| key = self.key_generator(*args, **kwargs) | |
| result = self.cache.get(key) | |
| # Handle both CacheLookupResult (StrategyCache) and Optional[T] (InMemoryCache) | |
| if hasattr(result, 'hit'): | |
| if result.hit: | |
| self._cache_calls += 1 | |
| return result.value | |
| elif result is not None: | |
| self._cache_calls += 1 | |
| return result | |
| value = self.func(*args, **kwargs) | |
| self.cache.set(key, value) | |
| return value |
🤖 Prompt for AI Agents
In `@insideLLMs/caching/utilities.py` around lines 337 - 342, MemoizedFunction
assumes cache.get(key) returns an object with .hit and .value; update
MemoizedFunction to handle both return types by checking the result from
self.cache.get(key): if it has attributes .hit/.value (as returned by
StrategyCache/CacheLookupResult) use them, otherwise treat a truthy/non-None
return as a hit and the return value as the cached value; adjust the increment
of self._cache_calls and the return accordingly. Reference symbols:
MemoizedFunction, key_generator, self.cache.get, _cache_calls,
CacheLookupResult, StrategyCache, InMemoryCache.get, BaseCacheABC.
Code ReviewI found 2 issues that need attention: 1. CLAUDE.md Documentation Update RequiredFile: CLAUDE.md, line 28 CLAUDE.md line 28 documents After this refactoring, Reference: CLAUDE.md line 27-29 Suggested fix: 2. StrategyCache Backward Compatibility BreakFile: insideLLMs/caching/backends/strategy.py, line 135 The new Original signature (caching_unified.py lines 5390-5396): def __init__(
self,
config: Optional[CacheConfig] = None,
*,
strategy: Optional[CacheStrategy] = None,
max_size: Optional[int] = None,
):New signature (strategy.py line 135): def __init__(self, config: Optional[CacheConfig] = None):Impact:
Reference: strategy.py line 133-137 Suggested fix: |
Description
Splits the 3,488-line
caching_unified.pymonolith into a well-organizedcaching/package with 12 focused modules:types.pybase.pybackends/memory.pybackends/disk.pybackends/strategy.pyprompt_cache.pymodel_wrapper.pyutilities.pynamespace.pydeduplicator.pyasync_adapter.pyfactory.pyBackward compatibility:
caching_unified.pyis now a re-export shim - all existing imports continue to work unchanged.Related Issue
Part of large file splitting initiative identified in codebase deep dive.
Type of Change
Checklist
ruff checkandruff format)pytest)Testing
make lintpasses (ruff check)pytest-asyncioplugin)Human Review Checklist
ResponseDeduplicatorAPI matches original (add(),get_unique_responses(),get_duplicate_count(),clear()methods withsimilarity_thresholdparameter)AsyncCacheAdapter.set()returnsCacheEntry(notNone)caching_unified.pyAdditional Notes
pytest-asyncio- not related to this refactoringinsideLLMs.cachingfor cleaner importsLink to Devin run: https://app.devin.ai/sessions/4ea4b20132954ccbb473719daedd219e
Requested by: Dr Gareth Roberts (@dr-gareth-roberts)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.