Acorn

Content-addressable storage for Swift, built on actors.

let store = await CompositeCASWorker(
    workers: ["memory": MemoryCASWorker(), "disk": try DiskCASWorker(directory: cacheURL)],
    order: ["memory", "disk"]
)

let cid = await store.store(data: Data("hello, acorn".utf8))  // SHA-256 content identifier
let data = await store.get(cid: cid)                          // Data("hello, acorn")

Store data once, retrieve by hash. Workers chain from fast to slow. When a far worker has data that nearer workers lack, it pushes toward the fast end via store so the next read is faster. No manual cache warming, no TTLs, no invalidation -- content addresses don't go stale.
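The promotion behavior can be seen end to end. A sketch against the API above; it assumes the memory layer has evicted the entry between the store and the first get:

```swift
let cid = await store.store(data: Data("hello, acorn".utf8))  // written to memory and disk

// ...suppose the memory worker later evicts the entry under pressure...

_ = await store.get(cid: cid)  // memory miss, disk hit; disk pushes the blob toward memory
_ = await store.get(cid: cid)  // served from memory
```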

Why Content-Addressable Storage

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

Content-addressable storage eliminates both. You don't name things -- the content names itself via its SHA-256 hash. And there is nothing to invalidate -- sha256:2cf24dba... is "hello" today, tomorrow, and forever.

This is not a theoretical pattern. It is the foundation of the most reliable infrastructure in production:

  • Git stores every blob, tree, and commit by hash. Renaming a file stores no new data. Branching is free because it's just pointer manipulation over a CAS object graph.
  • Docker/OCI registries deduplicate image layers by digest. Hundreds of images sharing the same alpine base transfer and store that layer exactly once.
  • Bazel and Buck use a content-addressed remote cache so any worker in a distributed build can trust artifacts from any other worker -- if the hash matches, the content is correct.
  • Nix is moving to content-addressed derivations, enabling "early cutoff" (a security patch to openssl skips rebuilding downstream packages whose outputs are byte-identical) and multi-user stores without mutual trust.
  • IPFS makes every block addressable by CID, letting any peer serve any content with client-side integrity verification. No central authority required.
  • Restic and Borg achieve 60-85% deduplication on typical backup workloads because unchanged files produce identical content chunks.

The alternative -- name-addressed caches keyed by URL, path, or ID -- requires answering "when does this entry become stale?" TTL too short, and the cache is useless. TTL too long, and users see stale data. Event-driven invalidation requires distributed pub/sub with exactly-once delivery, and a single missed event means permanent staleness. Meta spent years engineering TAO's cache from six nines to ten nines of consistency; at their scale, six nines still meant millions of inconsistent reads per day. Facebook's 2010 outage -- 2.5 hours of complete downtime -- was caused by a cache invalidation cascade.

CAS makes the question disappear. The address is the content. It cannot become stale, collide with another entry, or fail an integrity check silently.

Why Swift, why now. There is no lightweight, composable CAS library in the Swift ecosystem. The options are massive systems (IPFS daemon), tool-specific internals (Git's object store), or rolling your own [String: Data] dictionary without eviction, chaining, or timeout support. Swift's actor model is a natural fit -- a CAS worker is inherently a stateful concurrent service, and actors give you isolated mutable state with zero data races by construction. Emerging Swift-relevant use cases -- on-device ML model caching, offline-first mobile sync, edge container distribution -- all benefit from content-addressing.

  near ←――――――――――――――――――――――――――――――――――――→ far

  ┌──────────────┐    ┌──────────┐    ┌──────────┐
  │    Memory    │◄──►│   Disk   │◄──►│ Network  │
  └──────────────┘    └──────────┘    └──────────┘
       fastest             ↕             slowest
      volatile          durable         complete

Installation

dependencies: [
    .package(url: "https://github.com/treehauslabs/Acorn.git", from: "1.0.0"),
]

Requirements: Swift 6.0+, macOS 13+ / iOS 16+. Depends on swift-crypto for SHA-256.

Usage

Content Identifiers

Data is addressed by its SHA-256 hash. The content is the key.

let cid = ContentIdentifier(for: Data("hello".utf8))
// 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

ContentIdentifier(for: Data("hello".utf8)) == cid  // true, deterministic
ContentIdentifier(for: Data("world".utf8)) == cid  // false

ContentIdentifier is Hashable, Sendable, and CustomStringConvertible. Construct from a known hash via ContentIdentifier(rawValue:).
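The core idea is small enough to sketch standalone. This stand-in type (not Acorn's actual implementation) hex-encodes a SHA-256 digest using CryptoKit, which mirrors the swift-crypto API Acorn depends on:

```swift
import CryptoKit
import Foundation

// Minimal stand-in for ContentIdentifier: a hex-encoded SHA-256 digest.
struct CID: Hashable, CustomStringConvertible {
    let rawValue: String
    init(for data: Data) {
        rawValue = SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
    }
    var description: String { rawValue }
}

let cid = CID(for: Data("hello".utf8))
print(cid)  // 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```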

Chaining Workers

CompositeCASWorker takes named workers ordered from near (fastest) to far (slowest) and links their near/far references automatically:

let memory = MemoryCASWorker(capacity: 1_000)
let disk = try DiskCASWorker(directory: cacheURL, capacity: 50_000)

let store = await CompositeCASWorker(
    workers: ["memory": memory, "disk": disk],
    order: ["memory", "disk"]
)

let cid = await store.store(data: imageData)  // writes to both layers
let data = await store.get(cid: cid)          // checks memory first, falls back to disk

get recurses toward near. The nearest worker with the data wins. If a farther worker finds data locally, it calls store toward near to cache it closer. Next read hits memory.

store writes locally, then propagates toward near.

Access individual workers by name: await store["memory"]?.has(cid: cid)

Composites conform to AcornCASWorker, so they nest inside other composites.
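For example, a local composite can sit in front of a remote tier. A sketch assuming the API above; NetworkCASWorker and mirrorURL are illustrative, not part of Acorn:

```swift
// Inner composite: the fast, local tiers.
let local = await CompositeCASWorker(
    workers: ["memory": MemoryCASWorker(), "disk": try DiskCASWorker(directory: cacheURL)],
    order: ["memory", "disk"]
)

// Outer composite: local tiers near, a hypothetical network worker far.
let store = await CompositeCASWorker(
    workers: ["local": local, "network": NetworkCASWorker(baseURL: mirrorURL)],
    order: ["local", "network"]
)
```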

Writing a Custom Worker

Implement three methods. The protocol provides get, store, and chaining for free.

public actor RedisCASWorker: AcornCASWorker {
    public var near: (any AcornCASWorker)?
    public var far: (any AcornCASWorker)?
    public let timeout: Duration?

    public func has(cid: ContentIdentifier) -> Bool { /* redis.exists */ }
    public func getLocal(cid: ContentIdentifier) -> Data? { /* redis.get */ }
    public func storeLocal(cid: ContentIdentifier, data: Data) { /* redis.set */ }
}

Concrete workers live in separate packages to keep dependencies minimal:

Package             Storage
AcornMemoryWorker   In-process dictionary
AcornDiskWorker     File per CID

Timeouts

Three levels; the shortest wins.

// Per-worker: caps getLocal
let network = NetworkWorker(timeout: .seconds(5))

// Per-composite: caps the entire chain
let store = await CompositeCASWorker(workers: ..., order: ..., timeout: .seconds(10))

// Per-call: caps a single invocation
let data = await store.get(cid: cid, timeout: .milliseconds(500))

A worker's local timeout doesn't abort the chain -- get checks near first, so a slow disk doesn't block a fast memory hit.
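One way such a per-call cap can be layered over any async lookup is to race it against a deadline with structured concurrency. This is a standalone sketch, not Acorn's implementation:

```swift
// Race an async lookup against a deadline: whichever finishes first wins,
// and the loser is cancelled. A timeout yields nil, same as a miss.
func withTimeout<T: Sendable>(
    _ limit: Duration,
    _ body: @escaping @Sendable () async -> T?
) async -> T? {
    await withTaskGroup(of: T?.self) { group in
        group.addTask { await body() }
        group.addTask {
            try? await Task.sleep(for: limit)
            return nil
        }
        let first = await group.next() ?? nil  // first task to finish
        group.cancelAll()
        return first
    }
}

// Simulated slow lookup: completes after 2s, so a 500ms cap yields nil.
let slow: @Sendable () async -> Int? = {
    try? await Task.sleep(for: .seconds(2))
    return 42
}
let result = await withTimeout(.milliseconds(500), slow)  // nil: the deadline fired first
```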

LFU Decay Cache

LFUDecayCache tracks access frequency with exponential time decay for workers that need bounded storage.

var cache = LFUDecayCache(capacity: 10_000, halfLife: .seconds(300), sampleSize: 5)

cache.recordAccess(cid)                     // O(1) -- bump score
cache.evictionCandidate()                   // O(k) -- sample random entries, pick lowest
cache.needsEviction(for: newCID)            // O(1) -- at capacity for unknown CID?

O(1) decay -- a global multiplier shrinks over time instead of touching every score. New accesses add 1 / globalMultiplier, preserving relative ordering.

Sampled eviction -- picks the lowest-scored among sampleSize random entries, like Redis allkeys-lfu. Full scan for small caches.

Background renormalization -- when the global multiplier shrinks far enough that its inverse exceeds 1e100 and new increments risk overflow, workers renormalize scores in batches of 64, yielding between batches.
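The global-multiplier trick is worth seeing in isolation. The following is an illustrative sketch, not Acorn's exact implementation:

```swift
// O(1) exponential decay via a global multiplier. Instead of multiplying
// every stored score by d < 1 on each tick, track one global multiplier
// and divide *new* increments by it. Stored scores never move; an entry's
// effective (decayed) value is score * multiplier, so relative ordering
// between old and new accesses is preserved.
struct DecayedScores<Key: Hashable> {
    private var scores: [Key: Double] = [:]
    private var multiplier: Double = 1.0
    let decayPerTick: Double  // e.g. 0.5 for one half-life per tick

    init(decayPerTick: Double) { self.decayPerTick = decayPerTick }

    mutating func tick() { multiplier *= decayPerTick }  // O(1): decays every entry at once
    mutating func recordAccess(_ key: Key) {
        scores[key, default: 0] += 1.0 / multiplier      // newer hits carry more weight
    }
    func effectiveScore(for key: Key) -> Double {
        (scores[key] ?? 0) * multiplier
    }
}

var s = DecayedScores<String>(decayPerTick: 0.5)
s.recordAccess("a")  // a: effective 1.0
s.tick()             // everything halves: a is now effectively 0.5
s.recordAccess("b")  // b: effective 1.0 -- outranks the decayed "a"
```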

API

AcornCASWorker (Protocol)

public protocol AcornCASWorker: Actor {
    var timeout: Duration? { get }
    var near: (any AcornCASWorker)? { get set }
    var far: (any AcornCASWorker)? { get set }

    func has(cid: ContentIdentifier) async -> Bool
    func getLocal(cid: ContentIdentifier) async -> Data?
    func get(cid: ContentIdentifier) async -> Data?
    func storeLocal(cid: ContentIdentifier, data: Data) async
    func store(cid: ContentIdentifier, data: Data) async
}

You implement: has, getLocal, storeLocal

Protocol provides: get (near-first traversal, local fallback, pushes to near on local hit), store (local + propagate to near), link(near:far:), default timeout of nil

CompositeCASWorker

public actor CompositeCASWorker: AcornCASWorker {
    public init(workers: [String: any AcornCASWorker], order: [String], timeout: Duration? = nil) async
    public subscript(name: String) -> (any AcornCASWorker)?
    public func store(data: Data) async -> ContentIdentifier
    public func get(cid: ContentIdentifier, timeout: Duration?) async -> Data?
}

LFUDecayCache

Method                        Complexity  Description
recordAccess(_:)              O(1)        Bump frequency score, apply global decay
evictionCandidate()           O(k)        Lowest-scored among sampleSize random samples
needsEviction(for:)           O(1)        At capacity for unknown CID?
remove(_:)                    O(1)        Remove from tracking
effectiveScore(for:)          O(1)        Current score with decay applied
claimRenormalization()        O(1)        Claim pending renorm work
applyRenormFactor(_:factor:)  O(1)        Apply factor to single entry
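Inside a bounded worker, these methods compose into a simple store path. A sketch only; `cache` and `storage` are assumed members of a hypothetical worker, not Acorn API:

```swift
// Evict until there is room, then store and record the access.
public func storeLocal(cid: ContentIdentifier, data: Data) {
    while cache.needsEviction(for: cid) {
        guard let victim = cache.evictionCandidate() else { break }
        storage[victim] = nil   // drop the lowest-scored sampled entry
        cache.remove(victim)    // stop tracking it
    }
    storage[cid] = data
    cache.recordAccess(cid)
}
```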

Testing

swift test  # 30 tests across ContentIdentifier, CompositeCASWorker, LFUDecayCache

License

[Include your license here]
