Content-addressable storage for Swift, built on actors.
```swift
let store = await CompositeCASWorker(
    workers: ["memory": MemoryCASWorker(), "disk": try DiskCASWorker(directory: cacheURL)],
    order: ["memory", "disk"]
)
let cid = await store.store(data: Data("hello, acorn".utf8))  // SHA-256 content identifier
let data = await store.get(cid: cid)                           // Data("hello, acorn")
```

Store data once, retrieve by hash. Workers chain from fast to slow. When a far worker has data that nearer workers lack, it pushes the data toward the fast end via `store`, so the next read is faster. No manual cache warming, no TTLs, no invalidation -- content addresses don't go stale.
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
Content-addressable storage eliminates both. You don't name things -- the content names itself via its SHA-256 hash. And there is nothing to invalidate -- sha256:2cf24dba... is "hello" today, tomorrow, and forever.
This is not a theoretical pattern. It is the foundation of the most reliable infrastructure in production:
- Git stores every blob, tree, and commit by hash. Renaming a file stores no new data. Branching is free because it's just pointer manipulation over a CAS object graph.
- Docker/OCI registries deduplicate image layers by digest. Hundreds of images sharing the same `alpine` base transfer and store that layer exactly once.
- Bazel and Buck use a content-addressed remote cache, so any worker in a distributed build can trust artifacts from any other worker -- if the hash matches, the content is correct.
- Nix is moving to content-addressed derivations, enabling "early cutoff" (a security patch to openssl skips rebuilding downstream packages whose outputs are byte-identical) and multi-user stores without mutual trust.
- IPFS makes every block addressable by CID, letting any peer serve any content with client-side integrity verification. No central authority required.
- Restic and Borg achieve 60-85% deduplication on typical backup workloads because unchanged files produce identical content chunks.
The alternative -- name-addressed caches keyed by URL, path, or ID -- requires answering "when does this entry become stale?" TTL too short, and the cache is useless. TTL too long, and users see stale data. Event-driven invalidation requires distributed pub/sub with exactly-once delivery, and a single missed event means permanent staleness. Meta spent years engineering TAO's cache from six nines to ten nines of consistency; at their scale, six nines still meant millions of inconsistent reads per day. Facebook's 2010 outage -- 2.5 hours of complete downtime -- was caused by a cache invalidation cascade.
CAS makes the question disappear. The address is the content. It cannot become stale, collide with another entry, or fail an integrity check silently.
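Why an address cannot go stale or fail silently becomes clear once you derive the key from the bytes. The sketch below is dependency-free, so it swaps SHA-256 for FNV-1a, a small non-cryptographic hash. That means it shows the mechanics (identical bytes deduplicate, reads re-verify the address) but not SHA-256's collision resistance. `fnv1a64` and `ContentStore` are illustrative names, not part of Acorn.

```swift
import Foundation

/// FNV-1a 64-bit: a deterministic but NON-cryptographic stand-in for SHA-256,
/// used only to keep this sketch self-contained. Not suitable for real content addressing.
func fnv1a64(_ data: Data) -> String {
    var hash: UInt64 = 0xcbf29ce484222325        // FNV offset basis
    for byte in data {
        hash ^= UInt64(byte)
        hash &*= 0x100000001b3                   // FNV prime, wrapping multiply
    }
    let hex = String(hash, radix: 16)
    return String(repeating: "0", count: 16 - hex.count) + hex
}

/// A content-addressed map: the key is computed from the value, so there is nothing to invalidate.
struct ContentStore {
    private var blobs: [String: Data] = [:]

    /// Identical bytes always produce the same address, so they are stored once.
    mutating func put(_ data: Data) -> String {
        let address = fnv1a64(data)
        blobs[address] = data
        return address
    }

    /// Recompute the address on read; if the stored bytes no longer match, report a miss.
    func get(_ address: String) -> Data? {
        guard let data = blobs[address], fnv1a64(data) == address else { return nil }
        return data
    }

    var count: Int { blobs.count }
}
```

Storing `"hello"` twice yields one address and one stored copy; storing `"world"` yields a different address.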
Why Swift, why now. There is no lightweight, composable CAS library in the Swift ecosystem. The options are massive systems (IPFS daemon), tool-specific internals (Git's object store), or rolling your own [String: Data] dictionary without eviction, chaining, or timeout support. Swift's actor model is a natural fit -- a CAS worker is inherently a stateful concurrent service, and actors give you isolated mutable state with zero data races by construction. Emerging Swift-relevant use cases -- on-device ML model caching, offline-first mobile sync, edge container distribution -- all benefit from content-addressing.
```
near ←――――――――――――――――――――――――――――――――――――→ far
┌──────────────┐    ┌──────────┐    ┌──────────┐
│    Memory    │◄──►│   Disk   │◄──►│ Network  │
└──────────────┘    └──────────┘    └──────────┘
    fastest               ↕            slowest
    volatile           durable        complete
```
Add Acorn to the `dependencies` in your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/treehauslabs/Acorn.git", from: "1.0.0"),
]
```

Requirements: Swift 6.0+, macOS 13+ / iOS 16+. Depends on swift-crypto for SHA-256.
Data is addressed by its SHA-256 hash. The content is the key.
```swift
let cid = ContentIdentifier(for: Data("hello".utf8))
// 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

ContentIdentifier(for: Data("hello".utf8)) == cid  // true, deterministic
ContentIdentifier(for: Data("world".utf8)) == cid  // false
```

`ContentIdentifier` is `Hashable`, `Sendable`, and `CustomStringConvertible`. Construct one from a known hash with `ContentIdentifier(rawValue:)`.
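For orientation, here is one shape such an identifier type can take. `DigestID` is a hypothetical stand-in, not Acorn's `ContentIdentifier`. It does no hashing, and its failable `init?(rawValue:)`, which rejects anything other than 64 lowercase hex characters, is an assumption made for illustration. This README does not say whether `ContentIdentifier(rawValue:)` validates its input.

```swift
/// Hypothetical stand-in for a SHA-256 content identifier (not Acorn's type).
/// It stores a hex digest and checks its format; it never computes a hash.
struct DigestID: RawRepresentable, Hashable, Sendable, CustomStringConvertible {
    let rawValue: String

    /// Accepts exactly 64 lowercase hex characters (256 bits) and rejects anything else.
    init?(rawValue: String) {
        let hexDigits = Set("0123456789abcdef")
        guard rawValue.count == 64, rawValue.allSatisfy({ hexDigits.contains($0) }) else {
            return nil
        }
        self.rawValue = rawValue
    }

    /// Printing an identifier shows its hex digest.
    var description: String { rawValue }
}
```

Because equality and hashing follow the raw digest, two identifiers built from the same hash collapse to one entry in a `Set` or dictionary.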
CompositeCASWorker takes named workers ordered from near (fastest) to far (slowest) and links their near/far references automatically:
```swift
let memory = MemoryCASWorker(capacity: 1_000)
let disk = try DiskCASWorker(directory: cacheURL, capacity: 50_000)
let store = await CompositeCASWorker(
    workers: ["memory": memory, "disk": disk],
    order: ["memory", "disk"]
)
let cid = await store.store(data: imageData)  // writes to both layers
let data = await store.get(cid: cid)          // checks memory first, falls back to disk
```

`get` recurses toward near, so the nearest worker holding the data wins. If a farther worker finds the data locally, it calls `store` toward near to cache it closer, and the next read hits memory.

`store` writes locally, then propagates toward near.
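The traversal can be modelled without any dependencies. This is a sketch under stated assumptions, not Acorn's code: `Tier`, `DictTier`, and `linkChain` are hypothetical names, keys are plain `String`s instead of `ContentIdentifier`, and only the `near` link is modelled because it is the only direction `get` and `store` follow here. In this sketch, reads and writes enter at the farthest tier, which is one way to get the "memory first, then disk" behaviour described above.

```swift
import Foundation

// Hypothetical miniature of a near/far worker chain; names are illustrative only.
protocol Tier: Actor {
    var near: (any Tier)? { get }
    func getLocal(_ key: String) -> Data?
    func storeLocal(_ key: String, _ data: Data)
}

extension Tier {
    /// Near-first traversal: ask the nearer tier, fall back to local storage,
    /// and promote a local hit toward the near end.
    func get(_ key: String) async -> Data? {
        if let hit = await near?.get(key) { return hit }
        guard let local = getLocal(key) else { return nil }
        await near?.store(key, local)
        return local
    }

    /// Write locally, then propagate toward the near end.
    func store(_ key: String, _ data: Data) async {
        storeLocal(key, data)
        await near?.store(key, data)
    }
}

/// A dictionary-backed tier standing in for memory, disk, or any other backend.
actor DictTier: Tier {
    private(set) var near: (any Tier)?
    private var entries: [String: Data] = [:]

    func link(near: (any Tier)?) { self.near = near }
    func getLocal(_ key: String) -> Data? { entries[key] }
    func storeLocal(_ key: String, _ data: Data) { entries[key] = data }
}

/// Resolve names in near-to-far order and point each tier at its nearer neighbour.
/// Returns the farthest tier, where reads and writes enter the chain.
func linkChain(_ workers: [String: DictTier], order: [String]) async -> DictTier? {
    let tiers = order.compactMap { workers[$0] }
    for (nearer, farther) in zip(tiers, tiers.dropFirst()) {
        await farther.link(near: nearer)
    }
    return tiers.last
}
```

A read that misses in memory but hits on disk leaves a copy in memory, so the next read for that key is served by the near tier.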
Access individual workers by name: `await store["memory"]?.has(cid: cid)`
Composites conform to AcornCASWorker, so they nest inside other composites.
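To write a custom worker, implement three methods. The protocol provides `get`, `store`, and chaining for free.

```swift
import Foundation
import Acorn  // module name assumed from the package name

public actor RedisCASWorker: AcornCASWorker {
    public var near: (any AcornCASWorker)?
    public var far: (any AcornCASWorker)?
    public let timeout: Duration?

    public init(timeout: Duration? = nil) { self.timeout = timeout }

    // Placeholders: replace each body with your Redis client's EXISTS / GET / SET call.
    public func has(cid: ContentIdentifier) -> Bool { fatalError("redis EXISTS") }
    public func getLocal(cid: ContentIdentifier) -> Data? { fatalError("redis GET") }
    public func storeLocal(cid: ContentIdentifier, data: Data) { fatalError("redis SET") }
}
```

Concrete workers live in separate packages to keep dependencies minimal: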
| Package | Storage |
|---|---|
| AcornMemoryWorker | In-process dictionary |
| AcornDiskWorker | File per CID |
Timeouts apply at three levels; the shortest one wins.

```swift
// Per-worker: caps getLocal (NetworkWorker stands in for a custom worker)
let network = NetworkWorker(timeout: .seconds(5))
// Per-composite: caps the entire chain
let store = await CompositeCASWorker(workers: ..., order: ..., timeout: .seconds(10))
// Per-call: caps a single invocation
let data = await store.get(cid: cid, timeout: .milliseconds(500))
```

A worker's local timeout doesn't abort the chain -- `get` checks near first, so a slow disk doesn't block a fast memory hit.
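The "shortest wins" rule and a per-call cap can be sketched with plain Swift concurrency. This is a minimal illustration that assumes a timeout is enforced by racing the operation against `Task.sleep` in a task group. `effectiveTimeout` and `withTimeout` are hypothetical helpers, not Acorn's API, and a timed-out call is modelled as a miss (`nil`).

```swift
/// Hypothetical helper: the tightest configured limit wins; `nil` means "no limit".
func effectiveTimeout(_ limits: Duration?...) -> Duration? {
    limits.compactMap { $0 }.min()
}

/// Hypothetical helper: race `operation` against a sleep. Whichever child finishes
/// first decides the result and the other is cancelled. A timeout is reported as `nil`.
func withTimeout<T: Sendable>(
    _ limit: Duration?,
    _ operation: @escaping @Sendable () async -> T?
) async -> T? {
    guard let limit else { return await operation() }
    return await withTaskGroup(of: T?.self) { group in
        group.addTask { await operation() }
        group.addTask {
            try? await Task.sleep(for: limit)  // throws (and is ignored) if cancelled
            return nil
        }
        let first = await group.next() ?? nil
        group.cancelAll()                     // stop the loser; the group waits for it to finish
        return first
    }
}
```

For example, `effectiveTimeout(.seconds(10), nil, .milliseconds(500))` picks the 500 ms per-call cap. The racing operation should honour cancellation, otherwise the group still waits for it before returning.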
LFUDecayCache tracks access frequency with exponential time decay for workers that need bounded storage.
```swift
var cache = LFUDecayCache(capacity: 10_000, halfLife: .seconds(300), sampleSize: 5)
cache.recordAccess(cid)           // O(1) -- bump score
cache.evictionCandidate()         // O(k) -- sample random entries, pick the lowest
cache.needsEviction(for: newCID)  // O(1) -- at capacity for an unknown CID?
```

- O(1) decay -- a global multiplier shrinks over time instead of touching every score. New accesses add `1 / globalMultiplier`, preserving relative ordering.
- Sampled eviction -- picks the lowest-scored entry among `sampleSize` random entries, like Redis `allkeys-lfu`. Small caches are scanned in full.
- Background renormalization -- when the multiplier risks underflow (its inverse exceeds 1e100), workers renormalize scores in batches of 64, yielding between batches.
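The global-multiplier trick and sampled eviction fit in a few lines. This is a simplified model, not `LFUDecayCache` itself: `DecayScores` is a hypothetical name, decay is applied in discrete `tick()` steps rather than from a wall-clock `halfLife`, and renormalization runs in a single pass instead of in yielding background batches.

```swift
/// Hypothetical model of LFU scoring with O(1) global decay and sampled eviction.
struct DecayScores<Key: Hashable> {
    private var keys: [Key] = []          // dense array enables O(1) random sampling
    private var slot: [Key: Int] = [:]    // key -> index into `keys` / `raw`
    private var raw: [Double] = []        // stored scores; effective = raw * multiplier
    private var multiplier = 1.0          // shrinks on every tick instead of touching each score
    let decayPerTick: Double              // e.g. 0.5 = one half-life per tick

    init(decayPerTick: Double) { self.decayPerTick = decayPerTick }

    /// Decay every score at once by shrinking the shared multiplier.
    mutating func tick() { multiplier *= decayPerTick }

    /// Add 1 / multiplier so the access is worth exactly +1 in effective terms.
    mutating func recordAccess(_ key: Key) {
        let bump = 1 / multiplier
        if let i = slot[key] {
            raw[i] += bump
        } else {
            slot[key] = keys.count
            keys.append(key)
            raw.append(bump)
        }
    }

    func effectiveScore(_ key: Key) -> Double {
        slot[key].map { raw[$0] * multiplier } ?? 0
    }

    /// Lowest score among `sampleSize` random entries; small tables are scanned in full.
    /// Comparing raw scores is enough because every entry shares the same multiplier.
    func evictionCandidate(sampleSize: Int) -> Key? {
        guard !keys.isEmpty else { return nil }
        let candidates = keys.count <= sampleSize
            ? Array(keys.indices)
            : (0..<sampleSize).map { _ in Int.random(in: 0..<keys.count) }
        return candidates.min { raw[$0] < raw[$1] }.map { keys[$0] }
    }

    /// Fold the multiplier back into stored scores before it underflows.
    mutating func renormalize() {
        for i in raw.indices { raw[i] *= multiplier }
        multiplier = 1
    }
}
```

Two accesses followed by one tick leave a key with the same effective score as a key accessed once after the tick, which is the "preserving relative ordering" property described above.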
```swift
public protocol AcornCASWorker: Actor {
    var timeout: Duration? { get }
    var near: (any AcornCASWorker)? { get set }
    var far: (any AcornCASWorker)? { get set }
    func has(cid: ContentIdentifier) async -> Bool
    func getLocal(cid: ContentIdentifier) async -> Data?
    func get(cid: ContentIdentifier) async -> Data?
    func storeLocal(cid: ContentIdentifier, data: Data) async
    func store(cid: ContentIdentifier, data: Data) async
}
```

You implement: `has`, `getLocal`, `storeLocal`.

The protocol provides: `get` (near-first traversal, local fallback, pushes to near on a local hit), `store` (local write, then propagate to near), `link(near:far:)`, and a default `timeout` of `nil`.
```swift
public actor CompositeCASWorker: AcornCASWorker {
    public init(workers: [String: any AcornCASWorker], order: [String], timeout: Duration? = nil) async
    public subscript(name: String) -> (any AcornCASWorker)?
    public func store(data: Data) async -> ContentIdentifier
    public func get(cid: ContentIdentifier, timeout: Duration?) async -> Data?
}
```

`LFUDecayCache` methods:

| Method | Complexity | Description |
|---|---|---|
| `recordAccess(_:)` | O(1) | Bump frequency score, apply global decay |
| `evictionCandidate()` | O(k) | Lowest-scored among `sampleSize` random samples |
| `needsEviction(for:)` | O(1) | At capacity for an unknown CID? |
| `remove(_:)` | O(1) | Remove from tracking |
| `effectiveScore(for:)` | O(1) | Current score with decay applied |
| `claimRenormalization()` | O(1) | Claim pending renormalization work |
| `applyRenormFactor(_:factor:)` | O(1) | Apply factor to a single entry |
```sh
swift test  # 30 tests across ContentIdentifier, CompositeCASWorker, LFUDecayCache
```

[Include your license here]