A Swift library for building versioned, tamper-evident key-value stores where data can live anywhere and load on demand.
import cashew
var dict = MerkleDictionaryImpl<String>()
dict = try dict.inserting(key: "alice", value: "engineer")
dict = try dict.inserting(key: "bob", value: "designer")
let header = HeaderImpl(node: dict)
print(header.rawCID) // "baguqeera..." — unique fingerprint of this exact data

Every version of the data gets a unique CID (Content Identifier) — a SHA2-256 hash of its deterministic JSON serialization. Same content always produces the same CID. Change anything and the CID changes.
Most key-value stores treat data as a mutable blob. Cashew treats data as an immutable, content-addressed Merkle radix trie — every "write" produces a new root with a new CID, while unchanged branches share structure with previous versions. This gives you three things traditional stores don't: verifiable integrity (the CID proves the data hasn't been tampered with), lazy loading (load only the branches you need from any content-addressable backend), and efficient proofs (prove a key exists or doesn't without revealing the entire dataset).
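Structural sharing shows up directly in the CIDs. A minimal sketch, assuming only the API demonstrated in this README (MerkleDictionaryImpl, HeaderImpl):

```swift
import cashew

// Two versions of the same dictionary: v2 adds one key.
var v1 = MerkleDictionaryImpl<String>()
v1 = try v1.inserting(key: "alice", value: "engineer")
v1 = try v1.inserting(key: "zoe", value: "designer")
let v2 = try v1.inserting(key: "carol", value: "manager")

// Different content produces different root CIDs...
assert(HeaderImpl(node: v1).rawCID != HeaderImpl(node: v2).rawCID)
// ...while the same content always reproduces the same CID.
assert(HeaderImpl(node: v1).rawCID == HeaderImpl(node: v1).rawCID)
// The untouched "zoe" branch keeps its CID in v2, so its subtree is shared, not copied.
```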
- Content-addressable storage backends (IPFS, CAS databases)
- Versioned state where every mutation must be auditable
- Distributed systems that need tamper-evident data exchange with data locality hints
- Selective encryption — some fields public, some private, same data structure
- Sparse proofs — prove properties about specific keys without the full dataset
- High-throughput mutable key-value stores (Cashew is immutable; every write allocates)
- Data that doesn't need content addressing or integrity guarantees
- Simple in-memory caches where a Dictionary suffices
| Concept | What it means |
|---|---|
| CID | Content Identifier — a hash of the data. Same data → same CID. |
| Header | A smart pointer holding a CID and optionally the data it refers to. node == nil means "unresolved" — you know the hash but haven't loaded the data yet. |
| Resolve | Fetch data from a store using its CID. Three strategies: .targeted (one node), .recursive (full subtree), .list (structure only). |
| Transform | Mutate the structure (insert/update/delete). Returns a new tree with new CIDs for changed nodes. |
| Proof | Generate a minimal subtree proving a key exists, doesn't exist, or can be modified. |
| MerkleDictionary | The top-level key-value map. Dispatches by first character to a compressed radix trie. |
| MerkleArray | Append-only ordered collection backed by a MerkleDictionary with UInt256 binary keys. Supports efficient range queries. |
| MerkleSet | Membership-only set backed by a MerkleDictionary with empty string sentinels. Supports union, intersection, difference. |
| Volume | A Header subtype marking a point in the DAG where peers may store child blocks contiguously. Notifies the fetcher before resolving so it can locate the right peer. |
Add to your Package.swift:
dependencies: [
.package(url: "https://github.com/treehauslabs/cashew.git", from: "1.0.0")
]

Requires macOS 12.0+ and Swift 6.0+.
import cashew
var dict = MerkleDictionaryImpl<String>()
dict = try dict.inserting(key: "alice", value: "engineer")
dict = try dict.inserting(key: "bob", value: "designer")
dict = try dict.inserting(key: "alicia", value: "manager")
let value = try dict.get(key: "alice") // Optional("engineer")
dict = try dict.mutating(key: "bob", value: "lead designer")
dict = try dict.deleting(key: "alicia")
let keys: Set<String> = try dict.allKeys()
let pairs: [String: String] = try dict.allKeysAndValues()

Every structure has a deterministic CID derived from its content:
let node = RadixNodeImpl<String>(prefix: "hello", value: "world", children: [:])
let header = RadixHeaderImpl(node: node)
print(header.rawCID) // CIDv1 string, e.g. "baguqeera..."
let header2 = RadixHeaderImpl(node: node)
assert(header.rawCID == header2.rawCID) // same content → same CID

Headers can exist as CID-only references. Resolution fetches the actual data using a pluggable Fetcher:
struct IPFSFetcher: Fetcher {
func fetch(rawCid: String) async throws -> Data {
// fetch from IPFS, a database, or any content-addressable store
}
}
let fetcher = IPFSFetcher()
var paths = ArrayTrie<ResolutionStrategy>()
paths.set(["users", "a"], value: .targeted) // fetch just this node
paths.set(["config"], value: .recursive) // fetch entire subtree
paths.set(["posts"], value: .list) // fetch structure only, values stay CID-only
let resolved = try await dictionary.resolve(paths: paths, fetcher: fetcher)

Persist resolved data using a pluggable Storer:
struct MyStore: Storer {
func store(rawCid: String, data: Data) throws {
// write to disk, database, IPFS, etc.
}
}
try header.storeRecursively(storer: MyStore())

Apply multiple insert/update/delete operations in one pass:
var transforms = ArrayTrie<Transform>()
transforms.set(["alice"], value: .update("senior engineer"))
transforms.set(["charlie"], value: .insert("intern"))
transforms.set(["bob"], value: .delete)
let newDict = try dict.transform(transforms: transforms)
// newDict has the mutations applied; CIDs are recomputed

Generate minimal subtrees that prove specific properties about keys:
var proofPaths = ArrayTrie<SparseMerkleProof>()
proofPaths.set(["alice"], value: .existence) // prove key exists
proofPaths.set(["dave"], value: .insertion) // prove key doesn't exist (safe to insert)
proofPaths.set(["bob"], value: .mutation) // prove key exists (can be updated)
proofPaths.set(["charlie"], value: .deletion) // prove key exists with neighbors (can be deleted)
let proof = try await dictionary.proof(paths: proofPaths, fetcher: fetcher)
// proof contains only the nodes needed to verify these properties

Values can themselves be Header types wrapping MerkleDictionary, enabling hierarchical structures:
typealias InnerDict = MerkleDictionaryImpl<String>
typealias OuterDict = MerkleDictionaryImpl<HeaderImpl<InnerDict>>
var outer = OuterDict()
let inner = try InnerDict()
.inserting(key: "name", value: "Alice")
.inserting(key: "role", value: "Engineer")
let innerHeader = HeaderImpl(node: inner)
outer = try outer.inserting(key: "user1", value: innerHeader)

Transforms on nested dictionaries propagate through both levels:
var transforms = ArrayTrie<Transform>()
transforms.set(["user1", "name"], value: .update("Alicia"))
let updated = try outer.transform(transforms: transforms)
// The nested dictionary gets a new CID, which changes the parent's CID

A Merkle DAG is location-agnostic by design. Every block is addressed by its content hash, and a Fetcher retrieves blocks one CID at a time. This is a strength — the same data can live on IPFS, a database, a filesystem, or a peer's memory — but it creates a blind spot: the fetcher has no idea which blocks are related.
Consider a tree with 10,000 nodes. A naive resolver issues 10,000 independent fetch(cid) calls. Each one is a separate network request with its own peer lookup, connection setup, and round-trip. The fetcher has no way to know that all 10,000 blocks live under the same root and are probably stored together on the same peer. It can't batch, it can't pipeline, it can't route intelligently.
This is the gap between content addressing (which identifies data) and data locality (which describes where data physically lives). Content addressing deliberately erases location. But when you need to actually retrieve data from a distributed network, location matters.
A Volume is a Header subtype that marks a point in the Merkle DAG where a peer is likely to store the child blocks contiguously. It is a data locality boundary — a declaration that the subtree beneath this point forms a logical unit that peers advertise, replicate, and serve as a whole.
When resolution enters a Volume, it calls provide(rootCID:paths:) on the fetcher before resolving any child blocks. This gives the fetcher two pieces of information:
- The root CID — the identity of the volume. A peer that advertises this CID likely has all the child blocks beneath it.
- The resolution paths — what the resolver is about to traverse. The fetcher can use this to prefetch, open a streaming session, or scope a query.
The fetcher is free to act on this information however it wants — connect to a specific peer, warm a cache, open a multiplexed stream — or ignore it entirely. If the fetcher doesn't conform to VolumeAwareFetcher, nothing changes; the Volume resolves like any other Header.
The CID is a property of the Header, not the Node. A Node doesn't know its own hash — it's computed from the Node's serialization and stored in the Header that wraps it. Since provide needs the CID, the interception point must be the Header.
Making Volume a child protocol of Header (rather than Node) also means any Node type can be wrapped in a Volume without modification. A MerkleDictionary, a MerkleArray, a custom Node — they all work as the inner type. The data locality concern is separated from the data structure concern.
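For example, a MerkleArray wraps as readily as a MerkleDictionary. A sketch reusing the VolumeImpl and append APIs shown elsewhere in this README:

```swift
import cashew

// Volume wraps the Header, not the Node, so any Node type fits inside.
typealias LogVolume = VolumeImpl<MerkleArrayImpl<String>>

var log = MerkleArrayImpl<String>()
log = try log.append("event_1")
log = try log.append("event_2")

// The array itself is unchanged; only the wrapper declares the locality boundary.
let vol = LogVolume(node: log)
```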
P2P retrieval (IPFS, Filecoin, BitTorrent). A dataset is pinned by a storage provider. Without a Volume boundary, the resolver does independent DHT lookups for every block. With a Volume, the fetcher discovers the provider once and streams blocks from them directly. The difference is one DHT lookup vs. thousands.
Sharded databases. A multi-tenant system stores each tenant's data under a separate root CID. When a request comes in for tenant X, the provide call tells the fetcher which shard to route to. Every subsequent fetch in that resolution goes to the right replica.
Edge caching and CDNs. A CDN replicates entire volumes to edge nodes — a product catalog, a config store, a user directory. The provide call lets the fetcher check the local edge cache before falling back to origin. Without it, each block fetch independently checks the cache, missing the optimization of knowing they all belong to the same cached dataset.
Collaborative editing. Each document is a volume. When a client opens a document, provide tells the fetcher to connect to the peer currently editing it (who has the latest blocks in memory), rather than pulling stale blocks from persistent storage.
Offline-first sync. A mobile client syncs volumes, not individual blocks. The volume boundary tells the sync engine what to replicate as a unit. When the client comes back online, provide identifies which peer has the latest version of each volume.
Volumes can nest. Each boundary triggers its own provide call with its own CID, allowing the fetcher to locate different peers for different parts of the tree.
This mirrors real data topology. A database (outer volume) contains tables (inner volumes). The database might live on a cluster, but each table is sharded to a specific node within that cluster. When resolution crosses the database boundary, the fetcher connects to the cluster. When it crosses a table boundary, the fetcher routes to the specific shard.
VolumeImpl<Database> ← provide(databaseCID, ...) → connect to cluster
└─ Database
├─ "users" → VolumeImpl<UserTable> ← provide(usersCID, ...) → route to shard A
│ └─ UserTable (MerkleDictionary)
│ ├─ "alice" → ...
│ └─ "bob" → ...
└─ "logs" → VolumeImpl<LogTable> ← provide(logsCID, ...) → route to shard B
└─ LogTable (MerkleArray)
├─ event_0
└─ event_1
Each provide call gives the fetcher progressively more specific routing information. The fetcher can use the outer provide to establish a connection to the cluster and the inner provide to select a shard — or it can ignore nesting and treat each provide independently.
Mark a dataset boundary with VolumeImpl:
typealias UserVolume = VolumeImpl<MerkleDictionaryImpl<String>>
var users = MerkleDictionaryImpl<String>()
users = try users.inserting(key: "alice", value: "engineer")
users = try users.inserting(key: "bob", value: "designer")
let vol = UserVolume(node: users)
try vol.storeRecursively(storer: myStore)

Resolve from a CID-only reference — the fetcher receives the volume boundary:
let resolved = try await UserVolume(rawCID: vol.rawCID)
    .resolve(paths: paths, fetcher: myFetcher)

Implement VolumeAwareFetcher to act on the hint:
struct P2PFetcher: VolumeAwareFetcher {
func provide(rootCID: String, paths: ArrayTrie<ResolutionStrategy>) async throws {
// Look up which peer advertises this CID.
// Open a connection or streaming session to that peer.
// Subsequent fetch() calls can use this connection.
}
func fetch(rawCid: String) async throws -> Data {
// Fetch from the connected peer, or fall back to DHT.
}
}

If the fetcher doesn't conform to VolumeAwareFetcher, the Volume resolves normally — provide is simply not called. This makes Volumes a zero-cost abstraction for fetchers that don't need locality hints.
Nested volumes:
typealias InnerVolume = VolumeImpl<MerkleDictionaryImpl<String>>
typealias OuterDict = MerkleDictionaryImpl<InnerVolume>
typealias OuterVolume = VolumeImpl<OuterDict>
// Resolving the outer volume calls provide(outerCID, ...)
// When resolution descends into a child InnerVolume, it calls provide(innerCID, ...)
// The fetcher gets two provide calls — one per volume boundary.

An ordered, append-only collection backed by a MerkleDictionary with 256-bit binary string keys. Indices are encoded as 256-character strings of 0 and 1, so lexicographic order matches numeric order and the radix trie compresses sequential indices efficiently.
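The fixed-width padding is what makes lexicographic and numeric order agree. An illustrative stand-alone sketch (not the library's implementation; the width parameter is hypothetical):

```swift
// Pad an index to a fixed-width binary string so that lexicographic
// comparison agrees with numeric comparison.
func binaryKey(_ index: UInt64, width: Int = 256) -> String {
    let bits = String(index, radix: 2)
    return String(repeating: "0", count: width - bits.count) + bits
}

// Unpadded decimal strings would sort "10" before "9"; padded binary keys do not.
assert(binaryKey(9) < binaryKey(10))
assert(binaryKey(0) < binaryKey(1))
```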
import cashew
var log = MerkleArrayImpl<String>()
log = try log.append("event_1")
log = try log.append("event_2")
log = try log.append("event_3")
let first = try log.first() // "event_1"
let last = try log.last() // "event_3"
let val = try log.get(at: 1) // "event_2"
log.count // 3

Mutate and delete by index:
log = try log.mutating(at: 1, value: "event_2_updated")
log = try log.deleting(at: 0) // swaps with last element to maintain contiguous indices

Concatenate two arrays:
var batch = MerkleArrayImpl<String>()
batch = try batch.append("new_1").append("new_2")
log = try log.append(contentsOf: batch)

Range queries use .range as a ResolutionStrategy, resolved via resolve(paths:). Fetches O(k) nodes, not O(n):
let header = HeaderImpl(node: log)
try header.storeRecursively(storer: myStore)
let resolved = try await HeaderImpl<MerkleArrayImpl<String>>(rawCID: header.rawCID)
.resolve(fetcher: myFetcher)
// Load only indices 10..<20 from a 1000-element array
let page = try await resolved.node!.resolve(
paths: MerkleArrayImpl<String>.rangePaths(10..<20), fetcher: myFetcher
)

Build custom resolution paths for full control:
var paths = ArrayTrie<ResolutionStrategy>()
paths.set([], value: .range(10..<20)) // expands to .targeted for each index
paths.set([MerkleArrayImpl<String>.binaryKey(15)], value: .recursive) // override: recursively resolve index 15
let page = try await resolved.node!.resolve(paths: paths, fetcher: myFetcher)

Values can be Header types wrapping inner MerkleArrays. Range queries chain through both levels — .range at an outer key flows through to the inner array:
typealias InnerArray = MerkleArrayImpl<String>
typealias OuterArray = MerkleArrayImpl<HeaderImpl<InnerArray>>
// Resolve outer[2..<5] and for each, resolve inner[0..<10]
let page = try await outer.resolve(
paths: OuterArray.rangePaths(2..<5, innerStrategy: .range(0..<10)),
fetcher: myFetcher
)

Apply transforms across a range of nested arrays:
let innerTransforms: [[String]: Transform] = [
[InnerArray.binaryKey(3)]: .update("new_value")
]
// Updates index 3 in every inner array at outer indices 0..<4
let result = try outer.transformNested(
outerRange: 0..<4, innerTransforms: innerTransforms
)

Or build the full path manually for selective transforms:
var transforms = ArrayTrie<Transform>()
transforms.set([OuterArray.binaryKey(0), InnerArray.binaryKey(2)], value: .update("changed"))
transforms.set([OuterArray.binaryKey(1), InnerArray.binaryKey(5)], value: .delete)
let result = try outer.transform(transforms: transforms)

A membership-only collection backed by MerkleDictionary with empty string sentinels as values. Supports standard set operations.
import cashew
var set = MerkleSetImpl()
set = try set.insert("alice")
set = try set.insert("bob")
set = try set.insert("charlie")
try set.contains("alice") // true
try set.contains("dave") // false
set.count // 3
let members: Set<String> = try set.members() // {"alice", "bob", "charlie"}
set = try set.remove("bob")
try set.contains("bob") // false

Set operations:
let a = try MerkleSetImpl().insert("alice").insert("bob")
let b = try MerkleSetImpl().insert("bob").insert("charlie")
let union = try a.union(b) // {"alice", "bob", "charlie"}
let intersection = try a.intersection(b) // {"bob"}
let difference = try a.subtracting(b) // {"alice"}
let symDiff = try a.symmetricDifference(b) // {"alice", "charlie"}

MerkleSet inherits all MerkleDictionary capabilities — content addressability, resolution, encryption, proofs, and transforms all work unchanged.
MerkleDictionary supports sorted key iteration with cursor-based pagination by traversing the radix trie in lexicographic order.
var dict = MerkleDictionaryImpl<String>()
dict = try dict.inserting(key: "cherry", value: "3")
dict = try dict.inserting(key: "apple", value: "1")
dict = try dict.inserting(key: "banana", value: "2")
let sorted = try dict.sortedKeys() // ["apple", "banana", "cherry"]
let pairs = try dict.sortedKeysAndValues() // [(key: "apple", value: "1"), ...]

Paginate with limit and after:
let page1 = try dict.sortedKeys(limit: 2) // ["apple", "banana"]
let page2 = try dict.sortedKeys(limit: 2, after: "banana") // ["cherry"]
let valuePage = try dict.sortedKeysAndValues(limit: 10, after: lastKey)

Cashew supports AES-GCM encryption for content-addressable headers. The CID becomes a hash of the ciphertext, so content-addressing guarantees still hold.
import Crypto
let key = SymmetricKey(size: .bits256)
let scalar = MyScalar(value: 42)
let encHeader = try HeaderImpl(node: scalar, key: key)
encHeader.encryptionInfo?.keyHash // base64(SHA256(key))
encHeader.encryptionInfo?.iv // base64(random nonce)

Fetchers must conform to KeyProvider to look up keys by hash:
class MyStoreFetcher: Storer, Fetcher, KeyProvider {
private var storage: [String: Data] = [:]
private var keys: [String: SymmetricKey] = [:]
func registerKey(_ key: SymmetricKey) {
let keyData = key.withUnsafeBytes { Data($0) }
let hash = Data(SHA256.hash(data: keyData)).base64EncodedString()
keys[hash] = key
}
func key(for keyHash: String) -> SymmetricKey? { keys[keyHash] }
func store(rawCid: String, data: Data) { storage[rawCid] = data }
func fetch(rawCid: String) async throws -> Data {
guard let data = storage[rawCid] else { throw MyError.notFound }
return data
}
}
let fetcher = MyStoreFetcher()
fetcher.registerKey(key)
let encHeader = try HeaderImpl(node: scalar, key: key)
try encHeader.storeRecursively(storer: fetcher)
let cidOnly = HeaderImpl<MyScalar>(
rawCID: encHeader.rawCID,
node: nil,
encryptionInfo: encHeader.encryptionInfo
)
let resolved = try await cidOnly.resolve(fetcher: fetcher)
resolved.node! // decrypted scalar

Encryption strategies are specified via ArrayTrie<EncryptionStrategy>:
- .targeted(key) — Encrypts only the value at a specific path.
- .list(key) — Encrypts the trie structure (RadixHeaders). Keys aren't enumerable without the key.
- .recursive(key) — Encrypts everything — structure and values — with the same key.
var dict = MerkleDictionaryImpl<HeaderImpl<MyScalar>>()
dict = try dict.inserting(key: "public-field", value: HeaderImpl(node: MyScalar(value: 1)))
dict = try dict.inserting(key: "secret-field", value: HeaderImpl(node: MyScalar(value: 2)))
let header = HeaderImpl(node: dict)
var encryption = ArrayTrie<EncryptionStrategy>()
encryption.set(["secret-field"], value: .targeted(key))
let encrypted = try header.encrypt(encryption: encryption)
let publicVal = try encrypted.node!.get(key: "public-field")!
publicVal.encryptionInfo // nil — plaintext
let secretVal = try encrypted.node!.get(key: "secret-field")!
secretVal.encryptionInfo // non-nil — encrypted

Different keys for different branches:
let teamKey = SymmetricKey(size: .bits256)
let aliceKey = SymmetricKey(size: .bits256)
var encryption = ArrayTrie<EncryptionStrategy>()
encryption.set([""], value: .recursive(teamKey))
encryption.set(["alice"], value: .recursive(aliceKey))
let encrypted = try header.encrypt(encryption: encryption)
// "bob" uses teamKey; "alice" and descendants use aliceKey

Supply a KeyProvider to preserve encryption when transforming:
var transforms = ArrayTrie<Transform>()
transforms.set(["alice"], value: .delete)
let result = try encrypted.transform(transforms: transforms, keyProvider: fetcher)

| Transform | Encryption behavior |
|---|---|
| Delete (value removed) | Header is gone — nothing to preserve |
| Delete (trie restructured) | Surviving headers keep their encryption |
| Update | Header re-encrypted with same key, new IV |
| Insert | New header is not auto-encrypted |
| No keyProvider | Encryption is stripped (backward compatible) |
To encrypt newly inserted content, use the combined transform+encrypt overload:
let result = try header.transform(
transforms: transforms,
encryption: encryption,
keyProvider: fetcher
)

Encrypted headers serialize to a string description that round-trips losslessly:

let enc = try HeaderImpl(node: scalar, key: key)
enc.description
// "enc:abc123...==:def456...==:baguqeer..."
// keyHash iv rawCID
let restored = HeaderImpl<MyScalar>(enc.description)!
restored.rawCID == enc.rawCID // true

See Encryption README for the full reference.
Cashew includes a simple query language for reading and transforming data structures via strings. Queries return a (Self, CashewResult) tuple — the possibly-modified structure and the result of the last expression.
var dict = MerkleDictionaryImpl<String>()
// Build up data and query in one pipeline
let (updated, result) = try dict.query("""
insert "alice" = "engineer" | insert "bob" = "designer" | keys sorted
""")
// result == .list(["alice", "bob"])
// updated has both keys inserted

| Command | Example | Result |
|---|---|---|
| get "key" | get "alice" | .value("engineer") or .node(child) if the value is a Header |
| keys | keys | .list(["alice", "bob"]) |
| keys sorted | keys sorted limit 5 after "a" | .list([...]) |
| values | values | .entries([...]) |
| values sorted | values sorted limit 10 | .entries([...]) |
| count | count | .count(2) |
| contains "key" | contains "alice" | .bool(true) |
| insert "key" = "val" | insert "alice" = "engineer" | .ok |
| update "key" = "val" | update "alice" = "lead" | .ok |
| set "key" = "val" | set "alice" = "cto" | .ok (insert or update) |
| delete "key" | delete "alice" | .ok |
Array-specific (on MerkleArray):

| Command | Example | Result |
|---|---|---|
| get at <n> | get at 0 | .value("first") |
| first | first | .value("first") |
| last | last | .value("last") |
| append "val" | append "new" | .ok |
Aliases: members = keys, size = count, has = contains, add = insert, remove = delete, put = set.
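A quick sketch of the aliases in a pipeline, assuming they desugar exactly to their canonical commands per the table above:

```swift
import cashew

// put = set (insert or update), size = count
let (_, result) = try MerkleDictionaryImpl<String>()
    .query(#"put "alice" = "engineer" | size"#)
// result == .count(1)
```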
Chain expressions with | to build up data and query it:
let (_, result) = try MerkleDictionaryImpl<String>().query("""
insert "cherry" = "3" | insert "apple" = "1" | insert "banana" = "2" | keys sorted limit 2
""")
// result == .list(["apple", "banana"])

let (arr, result) = try MerkleArrayImpl<String>().query("""
append "hello" | append "world" | first
""")
// result == .value("hello"), arr.count == 2

When get returns a .node (the value is a Header), subsequent pipeline steps execute on the child:
typealias InnerDict = MerkleDictionaryImpl<String>
typealias OuterDict = MerkleDictionaryImpl<HeaderImpl<InnerDict>>
let inner = try InnerDict().inserting(key: "role", value: "admin").inserting(key: "dept", value: "eng")
let outer = try OuterDict().inserting(key: "profile", value: HeaderImpl(node: inner))
let (_, count) = try outer.query(#"get "profile" | count"#)
// count == .count(2) — navigated into the inner dict
let (_, role) = try outer.query(#"get "profile" | get "role""#)
// role == .value("admin") — chained two levels deep

The query language works with any Node, not just MerkleDictionary or MerkleArray. Every Node automatically supports get, count, keys, keys sorted, contains, and set through the default evaluate(_:) implementation, which uses the three core Node methods (properties(), get(property:), set(properties:)). Transforms (insert, update, delete) are compiled into ArrayTrie<Transform> steps and applied via transform(transforms:).
Pipeline chaining works automatically on any custom Node. The executor uses get(property:) directly for mid-pipeline navigation, so get "key" | ... chains into children without requiring any overrides:
struct GameState: Node {
var players: [String: HeaderImpl<PlayerStats>] = [:]
func get(property: PathSegment) -> (any Header)? { players[property] }
func properties() -> Set<PathSegment> { Set(players.keys) }
func set(properties: [PathSegment: any Header]) -> Self {
var copy = self
copy.players = properties.compactMapValues { $0 as? HeaderImpl<PlayerStats> }
return copy
}
}
let state = GameState(/* ... */)
let (_, count) = try state.query("count") // .count(N)
let (_, hasAlice) = try state.query(#"contains "alice""#) // .bool(true)
let (_, aliceCount) = try state.query(#"get "alice" | count"#) // .count(N) on alice's PlayerStats
let (_, aliceKeys) = try state.query(#"get "alice" | keys"#) // .list([...]) of PlayerStats keys

To customize terminal expression behavior, override evaluate(_:) and call defaultEvaluate(_:) for cases you don't handle:
extension GameState {
func evaluate(_ expression: CashewExpression) throws -> (Self, CashewResult) {
switch expression {
case .count:
return (self, .value("players: \(players.count)"))
default:
return try defaultEvaluate(expression)
}
}
}

defaultEvaluate(_:) provides the full default implementation (get, count, keys, keys sorted, contains, set). Pipeline navigation works regardless of whether you override evaluate — the executor handles it at the plan execution level.
Headers also delegate queries to their wrapped node, so HeaderImpl<GameState> automatically supports the same queries:
let header = HeaderImpl(node: state)
let (_, result) = try header.query("count") // delegates to state.query("count")
// With a fetcher, the header resolves itself before executing
let (_, remoteResult) = try await unresolvedHeader.query("count", fetcher: myFetcher)

Three levels of custom nodes — Fleet → Server → service config — queried with the same pipeline syntax:
struct Server: Node {
var services: [String: HeaderImpl<MerkleDictionaryImpl<String>>] = [:]
func get(property: PathSegment) -> (any Header)? { services[property] }
func properties() -> Set<PathSegment> { Set(services.keys) }
func set(properties: [PathSegment: any Header]) -> Self { /* ... */ }
}
struct Fleet: Node {
var hosts: [String: HeaderImpl<Server>] = [:]
func get(property: PathSegment) -> (any Header)? { hosts[property] }
func properties() -> Set<PathSegment> { Set(hosts.keys) }
func set(properties: [PathSegment: any Header]) -> Self { /* ... */ }
}

Once built, the entire hierarchy is explorable with queries:
let fleet = Fleet(hosts: [
"web-1": HeaderImpl(node: Server(services: [
"nginx": HeaderImpl(node: try config(("port", "443"), ("status", "healthy"))),
"app": HeaderImpl(node: try config(("port", "8080"), ("version", "3.1.0"))),
])),
"db-1": HeaderImpl(node: Server(services: [
"postgres": HeaderImpl(node: try config(("port", "5432"), ("role", "primary"))),
])),
// ...
])
try fleet.query("keys sorted") // .list(["db-1", "web-1"])
try fleet.query("count") // .count(2)
try fleet.query(#"get "web-1" | keys sorted"#) // .list(["app", "nginx"])
try fleet.query(#"get "web-1" | get "nginx" | get "status""#) // .value("healthy")
try fleet.query(#"get "db-1" | get "postgres" | get "role""#) // .value("primary")
try fleet.query(#"get "db-1" | contains "postgres""#) // .bool(true)
try fleet.query(#"get "web-1" | get "nginx" | values sorted"#) // .entries([("port","443"), ...])
try fleet.query("keys sorted limit 1") // .list(["db-1"])
try fleet.query(#"keys sorted after "db-1""#) // .list(["web-1"])

No overrides needed — Fleet and Server only implement the three Node requirements (get(property:), properties(), set(properties:)), and pipelines, pagination, containment checks, and deep traversal all work automatically.
Queries compile into a CashewPlan that batches consecutive transforms into a single ArrayTrie<Transform>, applied in one pass via the existing transform(transforms:) machinery. Reads flush pending transforms first.
// These three inserts compile into ONE ArrayTrie<Transform> step, not three separate tree rebuilds
let (dict, _) = try MerkleDictionaryImpl<String>().query("""
insert "a" = "1" | insert "b" = "2" | insert "c" = "3"
""")

For lazy-loaded data, the plan computes resolution paths automatically. Use the async variant with a fetcher to resolve before executing:
// Computes resolution paths: ["alice"] → .targeted
// Resolves needed nodes from store, then executes the query
let (_, result) = try await unresolvedDict.query(#"get "alice""#, fetcher: myFetcher)

You can also inspect or use the plan directly:
let expressions = try CashewParser.parse(#"insert "a" = "1" | get "a""#)
let plan = CashewPlan.compile(expressions)
plan.steps // [.transform(trie), .evaluate(.get("a"))]
plan.resolutionPaths() // ArrayTrie with ["a"] → .targeted

The test suite includes scenario-based tests that demonstrate practical usage patterns. See Tests/cashewTests/RealWorldTests.swift for the full implementations.
Model user profiles as a MerkleDictionaryImpl<String>, exercise the full lifecycle (insert, query, update, delete), and verify structural sharing — unchanged user records keep the same CIDs across versions.
Store configuration as versioned snapshots. Each version gets its own root CID. Roll back to any historical version by resolving its CID.
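A condensed sketch of that rollback pattern, assuming the myStore/myFetcher objects from earlier examples and only the APIs shown in this README:

```swift
import cashew

var config = MerkleDictionaryImpl<String>()
config = try config.inserting(key: "timeout", value: "30")
let v1 = HeaderImpl(node: config)   // snapshot: v1's root CID
config = try config.mutating(key: "timeout", value: "60")
let v2 = HeaderImpl(node: config)   // new root CID after the change

try v1.storeRecursively(storer: myStore)
try v2.storeRecursively(storer: myStore)

// Roll back by resolving v1's CID from the store.
let restored = try await HeaderImpl<MerkleDictionaryImpl<String>>(rawCID: v1.rawCID)
    .resolve(fetcher: myFetcher)
try restored.node!.get(key: "timeout") // Optional("30")
```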
Use targeted encryption on MerkleDictionaryImpl<HeaderImpl<TestScalar>> to encrypt private fields per-record while keeping public fields readable by anyone. Authorized resolvers (with the key) read private data; unauthorized resolvers get DataErrors.keyNotFound.
Build a 3-level hierarchy (catalog → categories → products). Demonstrate resolution strategies: .list loads category structure with values still CID-only, .targeted resolves a single category, .recursive loads an entire subtree.
Use MerkleArrayImpl<String> as an append-only event log. Each append produces a new root CID. Range queries enable pagination — load only the events you need. Structural sharing means appending one event only changes the trie branch for the new index.
Model multi-sensor time-series as MerkleArrayImpl<HeaderImpl<MerkleArrayImpl<String>>> — an outer array of sensors, each containing an inner array of readings. Chained range queries load readings 5..10 from sensors 1..3 without touching the rest. Nested transforms apply calibration updates across all sensors in one pass.
Combine MerkleDictionary (keyed by channel name) with MerkleArray (messages per channel). Pagination loads only the last N messages. Every message append produces a new CID for auditable history.
Build a transaction ledger and generate existence/insertion/deletion proofs. Proofs from CID-only headers are minimal — only the path to the target key is materialized; unrelated branches stay as CID stubs.
| Protocol | Conforms To | Purpose |
|---|---|---|
| Node | Codable, LosslessStringConvertible, Sendable | Base for all Merkle structures. Defines get, set, resolve, transform, proof, storeRecursively. |
| Header | Codable, Sendable, LosslessStringConvertible | Wraps a Node with its CID and optional EncryptionInfo. Supports resolve, proof, transform, store, encrypt. Can be resolved or unresolved. |
| RadixNode | Node | Compressed trie node with prefix, optional value, and children. |
| RadixHeader | Header | Header constrained to RadixNode. |
| MerkleDictionary | Node | Top-level key-value map. Dispatches by first character to RadixHeader children. Supports cursor-based sorted iteration. |
| MerkleArray | Node | Ordered append-only collection backed by MerkleDictionary with UInt256 binary keys. |
| MerkleSet | MerkleDictionary | Membership-only set backed by MerkleDictionary with empty string sentinels. Set operations: union, intersection, subtracting, symmetricDifference. |
| Scalar | Node | Leaf node with no children. Returns empty for properties(). |
| Fetcher | Sendable | Async data retrieval by CID. One method: fetch(rawCid:) async throws -> Data. |
| Storer | -- | Data persistence by CID. One method: store(rawCid:data:) throws. |
| KeyProvider | -- | Key lookup by hash. One method: key(for:) -> SymmetricKey?. |
| KeyProvidingFetcher | Fetcher, KeyProvider | Combined fetcher + key provider for resolving encrypted data. |
| Volume | Header | Marks a point in the DAG where peers may store child blocks contiguously. Overrides resolve to call provide on the fetcher. |
| VolumeAwareFetcher | Fetcher | Receives provide(rootCID:paths:) calls when resolution enters a Volume. |
| Enum | Cases | Purpose |
|---|---|---|
| `ResolutionStrategy` | `.targeted`, `.recursive`, `.list`, `.range(after:limit:)` | Controls how deep resolution goes |
| `Transform` | `.insert(String)`, `.update(String)`, `.delete` | Mutation operations for transforms |
| `SparseMerkleProof` | `.insertion`, `.mutation`, `.deletion`, `.existence` | Proof types for sparse Merkle proofs |
| `EncryptionStrategy` | `.targeted(key)`, `.list(key)`, `.recursive(key)` | Controls what gets encrypted at each path |
Compare two versions of any data structure to see what changed. Identical subtrees are skipped by CID comparison, so cost is proportional to the number of changes, not the size of the tree.
```swift
let v1 = try MerkleDictionaryImpl<String>()
    .inserting(key: "alice", value: "engineer")
    .inserting(key: "bob", value: "designer")

let v2 = try v1
    .deleting(key: "bob")
    .deleting(key: "alice").inserting(key: "alice", value: "lead")
    .inserting(key: "carol", value: "intern")

let diff = try v2.diff(from: v1)
print(diff)
// + carol: intern
// - bob: designer
// ~ alice: engineer → lead

diff.inserted    // ["carol": "intern"]
diff.deleted     // ["bob": "designer"]
diff.modified    // ["alice": ModifiedEntry(old: "engineer", new: "lead")]
diff.changeCount // 3
```

For `MerkleDictionary`, diff operates at the logical key level — you see actual key names and values, not internal trie structure. For nested structures, modified entries include recursive child diffs:
```swift
let oldInner = try MerkleDictionaryImpl<String>().inserting(key: "role", value: "ic")
let newInner = try MerkleDictionaryImpl<String>().inserting(key: "role", value: "lead")

let old = try MerkleDictionaryImpl<HeaderImpl<MerkleDictionaryImpl<String>>>()
    .inserting(key: "alice", value: HeaderImpl(node: oldInner))
let new = try MerkleDictionaryImpl<HeaderImpl<MerkleDictionaryImpl<String>>>()
    .inserting(key: "alice", value: HeaderImpl(node: newInner))

let diff = try new.diff(from: old)
print(diff)
// ~ alice:
//   ~ role: ic → lead
```

Custom nodes, Headers, and the async variant all support diffing:
```swift
// Node-level diff (structural, uses properties()/get(property:))
let nodeDiff = newFleet.diff(from: oldFleet)

// Header diff (short-circuits on matching CIDs)
let headerDiff = try newHeader.diff(from: oldHeader)

// Async: resolves unresolved headers before diffing
let asyncDiff = try await newHeader.diff(from: oldHeader, fetcher: myFetcher)
```

| Error | Cases |
|---|---|
| `DataErrors` | `.nodeNotAvailable`, `.serializationFailed`, `.cidCreationFailed`, `.encryptionFailed`, `.keyNotFound`, `.invalidIV` |
| `TransformErrors` | `.transformFailed(String)`, `.invalidKey(String)`, `.missingData(String)` |
| `ProofErrors` | `.invalidProofType(String)`, `.proofFailed(String)` |
| `ResolutionErrors` | `.typeError(String)` |
| `CashewDecodingError` | `.decodeFromDataError` |
| Type | Purpose |
|---|---|
| `MerkleDictionaryImpl<V>` | Concrete `MerkleDictionary`. `V` must be `Codable` + `Sendable` + `LosslessStringConvertible`. |
| `MerkleArrayImpl<V>` | Concrete `MerkleArray`. Same constraints as `MerkleDictionaryImpl`. |
| `MerkleSetImpl` | Concrete `MerkleSet`. Uses `String` values with an empty-string sentinel. |
| `RadixNodeImpl<V>` | Concrete `RadixNode` with JSON coding for `Character`-keyed children. |
| `HeaderImpl<N>` | Generic `Header` wrapping any `Node` type. |
| `RadixHeaderImpl<V>` | `RadixHeader` for `RadixNodeImpl<V>`. |
| `VolumeImpl<N>` | Concrete `Volume` wrapping any `Node` type. |
| `EncryptionInfo` | Metadata on an encrypted header (`keyHash` + `iv`). |
```
┌─────────────┐
│   Fetcher   │  (pluggable: IPFS, DB, filesystem)
│ fetch(cid)  │
└──────┬──────┘
       │ Data
       ▼
┌──────────┐  resolve   ┌──────────┐  transform   ┌──────────┐
│Unresolved├───────────>│ Resolved ├─────────────>│   New    │
│  Header  │            │  Header  │              │  Header  │
│(CID only)│            │(CID+Node)│              │(new CID) │
└──────────┘            └────┬─────┘              └────┬─────┘
                             │                         │
                       proof │ encrypt                 │ storeRecursively
              ┌──────────────┤                         │
              ▼              │                         ▼
        ┌──────────┐         │                 ┌──────────────┐
        │  Proof   │         └────────────────>│    Storer    │
        │ (minimal │                           │ + KeyProvider│
        │ subtree) │                           │  store(cid,  │
        └──────────┘                           │    data)     │
                                               └──────────────┘
```
```
Node (base: Codable + LosslessStringConvertible + Sendable)
|-- Scalar (leaf node, no children)
|-- RadixNode (compressed trie node)
|-- MerkleDictionary (top-level key-value map)
|-- MerkleArray (ordered collection, backed by MerkleDictionary)
|-- MerkleSet (membership set, backed by MerkleDictionary)

Header (Codable + Sendable, wraps a Node with its CID + optional EncryptionInfo)
|-- RadixHeader (Header constrained to RadixNode)
|-- Volume (marks a data locality boundary; notifies fetcher before resolving)

Fetcher (Sendable, retrieves blocks by CID)
|-- VolumeAwareFetcher (receives provide(rootCID:paths:) when entering a Volume)
```
Concrete implementations: MerkleDictionaryImpl<V>, MerkleArrayImpl<V>, MerkleSetImpl, RadixNodeImpl<V>, HeaderImpl<N>, RadixHeaderImpl<V>, VolumeImpl<N>.
A MerkleDictionary maps string keys to values by dispatching on the first character to a radix trie branch:
```
MerkleDictionary { count: 3 }
  'a' -> RadixHeader -> RadixNode(prefix: "alice", value: "engineer", children: {})
  'b' -> RadixHeader -> RadixNode(prefix: "b", value: nil, children: {
           'o' -> RadixHeader -> RadixNode(prefix: "ob", value: "designer", children: {})
           'a' -> RadixHeader -> RadixNode(prefix: "az", value: "manager", children: {})
         })
```
Path compression stores "alice" as a single node rather than 5 chained nodes — O(k) lookup where k is key length. When serialized, only CIDs are written; resolve headers by fetching from a content-addressable store.
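The structure above can be reproduced with the `inserting(key:value:)` API from the quick-start example; the third key, "baz", is added here purely for illustration.

```swift
import cashew

// Build the three-key dictionary illustrated above.
var dict = MerkleDictionaryImpl<String>()
dict = try dict.inserting(key: "alice", value: "engineer")
dict = try dict.inserting(key: "bob", value: "designer")
dict = try dict.inserting(key: "baz", value: "manager")

// "bob" and "baz" share the compressed "b" branch, while "alice"
// lives in a single path-compressed node. Any change to the data
// produces a new root CID.
print(HeaderImpl(node: dict).rawCID)
```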
```
Sources/cashew/
  Core/                 -- Node, Header protocols + CID creation + encryption helpers
  MerkleDataStructures/ -- MerkleDictionary, RadixNode, RadixHeader protocols + concrete impls
  Fetcher/              -- Fetcher, Storer, KeyProvider protocols + store extensions
  Resolver/             -- Resolution strategies + resolve extensions per type
  Transform/            -- Transform operations + insert/delete/mutate/get per type
  Encryption/           -- Encryption strategies + encrypt extensions per type
  Proofs/               -- Sparse Merkle proof types + proof extensions per type
```
| Package | Purpose |
|---|---|
| ArrayTrie | Trie data structure for path-based traversal of resolution/transform/proof specs |
| swift-crypto | SHA2-256 hashing for CID generation |
| swift-cid | IPFS CIDv1 content identifiers |
| swift-multicodec | Codec identifiers (dag-json, dag-cbor, etc.) |
| swift-multihash | Self-describing hash format |
| swift-collections | Additional collection types beyond the standard library |
```bash
swift test
```

Tests cover resolution, transforms, proofs, headers, key enumeration, encryption, arrays, range-query performance, volumes, and real-world scenarios.
MIT