A modern replacement for the ZIM file format. Pure Go library and CLI tools for reading, writing, and serving OZA archives.
王座 (oza) -- "throne." OZA takes the throne as the successor to ZIM, with extensible section tables, Zstd compression, SHA-256 integrity, trigram search, and content-addressed deduplication.
ZIM has served the offline content community since 2007, but its design has aged:
- Frozen header -- no extensibility without format hacks
- Namespace overloading -- entry types smuggled into MIME index sentinels
- Single MD5 -- one hash for an entire 90 GB file, no corruption localization
- Xapian search -- 150K lines of C++ with no binary spec, impossible to implement without libxapian
- No content sizes -- Content-Length requires decompressing entire clusters
- Four compression formats -- readers must carry zlib, bzip2, XZ, and Zstd
- Chrome entanglement -- HTML assumes a specific application shell at runtime
OZA addresses all of these with a clean-break redesign. See docs/FORMAT.md for the full specification.
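
To make the corruption-localization point concrete, here is a minimal sketch of per-chunk hashing. It is not OZA's on-disk layout (docs/FORMAT.md is the authority for that); the function names `chunkHashes` and `corruptChunks` and the 4-byte chunk size are assumptions chosen for illustration. A single whole-file hash can only say that something changed, while one digest per chunk identifies which chunk changed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkHashes splits data into fixed-size chunks and returns one SHA-256
// digest per chunk. The chunk size is an illustrative parameter, not a value
// taken from the OZA specification. chunkSize must be positive.
func chunkHashes(data []byte, chunkSize int) [][32]byte {
	var sums [][32]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sums = append(sums, sha256.Sum256(data[start:end]))
	}
	return sums
}

// corruptChunks returns the indexes of chunks whose current digest differs
// from the recorded one.
func corruptChunks(data []byte, chunkSize int, recorded [][32]byte) []int {
	var bad []int
	for i, sum := range chunkHashes(data, chunkSize) {
		if i >= len(recorded) || sum != recorded[i] {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	data := []byte("aaaabbbbccccdddd")
	recorded := chunkHashes(data, 4)

	data[9] ^= 0xFF // flip bits inside the third chunk (index 2)
	fmt.Println("corrupt chunks:", corruptChunks(data, 4, recorded))
}
```

With a layout along these lines, a reader can report or re-fetch only the damaged region instead of rejecting the whole file.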
| Feature | ZIM | OZA |
|---|---|---|
| Header | Fixed 80 bytes, no extensibility | 128 bytes + section table |
| Entry records | Variable-length, 3 pointer indirections | Variable-length (~15 bytes avg), O(1) by ID |
| Content size | Must decompress cluster | blob_size in every entry |
| Compression | XZ/Zstd/zlib/bzip2 | Zstd only + dictionaries |
| Integrity | Single MD5 | SHA-256 at file/section/chunk |
| Search | Opaque Xapian C++ database | Trigram index (fully specified) |
| Deduplication | None | Content-addressed via SHA-256 |
| Signatures | None | Optional Ed25519 |
| Chrome/UI | Mixed with content | Separate optional section |
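
The deduplication row can be pictured with a small sketch of content addressing: identical payloads hash to the same SHA-256 digest, so the writer stores the bytes once and hands out the same blob ID again. The `blobStore` type and its `add` method are hypothetical names for this illustration and are not part of the `ozawrite` API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blobStore keeps each distinct payload once, keyed by its SHA-256 digest.
// It is a hypothetical in-memory structure for illustration only.
type blobStore struct {
	index map[[32]byte]int // digest -> position in blobs
	blobs [][]byte
}

func newBlobStore() *blobStore {
	return &blobStore{index: make(map[[32]byte]int)}
}

// add returns the blob ID for content, storing the bytes only if no
// identical payload has been seen before.
func (s *blobStore) add(content []byte) int {
	sum := sha256.Sum256(content)
	if id, ok := s.index[sum]; ok {
		return id // duplicate: reuse the existing blob
	}
	id := len(s.blobs)
	s.blobs = append(s.blobs, append([]byte(nil), content...)) // store a copy
	s.index[sum] = id
	return id
}

func main() {
	s := newBlobStore()
	a := s.add([]byte("<h1>Hello</h1>"))
	b := s.add([]byte("<h1>World</h1>"))
	c := s.add([]byte("<h1>Hello</h1>")) // duplicate of the first payload
	fmt.Println(a, b, c, len(s.blobs))   // prints "0 1 0 2"
}
```

Three entries end up referencing two stored blobs, which is the saving the table row describes.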
go get github.com/stazelabs/oza

package main

import (
	"fmt"
	"log"

	"github.com/stazelabs/oza/oza"
)

func main() {
	a, err := oza.Open("archive.oza")
	if err != nil {
		log.Fatal(err)
	}
	defer a.Close()

	// Read metadata
	title, _ := a.Metadata("title")
	fmt.Println("Archive:", title)
	fmt.Println("Entries:", a.EntryCount())

	// Look up an entry by path
	entry, err := a.EntryByPath("Main_Page")
	if err != nil {
		log.Fatal(err)
	}

	// Read content (resolves redirects automatically)
	data, err := entry.ReadContent()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Content-Type: %s\n", entry.MIMEType())
	fmt.Printf("Size: %d bytes\n", len(data))
	fmt.Printf("Blob size: %d bytes\n", entry.Size()) // no decompression needed

	// Iterate all front articles
	for e := range a.FrontArticles() {
		fmt.Println(e.Path())
	}
}

package main
import (
	"log"
	"os"

	"github.com/stazelabs/oza/ozawrite"
)

func main() {
	f, err := os.Create("output.oza")
	if err != nil {
		log.Fatal(err)
	}

	w := ozawrite.NewWriter(f, ozawrite.WriterOptions{
		ZstdLevel:       6, // 1=fastest, 6=default, 19=best
		BuildSearch:     true,
		CompressWorkers: 0, // 0 = min(NumCPU, 4)
	})

	w.SetMetadata("title", "My Archive")
	w.SetMetadata("language", "en")
	w.SetMetadata("creator", "Example")
	w.SetMetadata("date", "2026-03-07")
	w.SetMetadata("source", "https://example.com")

	id, _ := w.AddEntry("Main_Page", "Main Page", "text/html",
		[]byte("<h1>Hello, World</h1>"), true)
	w.AddRedirect("Home", "Home", id)

	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}

Dump metadata and section table of an OZA file:
go run ./cmd/ozainfo archive.oza

Extract content from an OZA file:
# Extract an article to stdout
go run ./cmd/ozacat archive.oza Main_Page
# List all entries
go run ./cmd/ozacat -l archive.oza
# Show metadata
go run ./cmd/ozacat -m archive.oza

Full-text trigram search:
go run ./cmd/ozasearch archive.oza "quantum mechanics"

Three-tier integrity verification:
# File-level SHA-256 check
go run ./cmd/ozaverify archive.oza
# Full verification (file + section + chunk)
go run ./cmd/ozaverify --all archive.oza

Serve OZA files over HTTP:
go run ./cmd/ozaserve -a :8080 archive.oza

Standalone MCP server for LLM agents (see docs/OZAMCP.md):
go run ./cmd/ozamcp archive.oza

Generate Ed25519 signing key pairs for archive signatures:
go run ./cmd/ozakeygen -o mykey
# Creates mykey.pub and mykey.key

Compare a ZIM file and its OZA conversion side-by-side:
go run ./cmd/ozacmp source.zim converted.oza
# Markdown table output
go run ./cmd/ozacmp --format md source.zim converted.oza
# Deep per-entry comparison
go run ./cmd/ozacmp --deep source.zim converted.oza

Convert ZIM files to OZA format:
go run ./cmd/zim2oza wikipedia.zim wikipedia.oza
# With verbose statistics
go run ./cmd/zim2oza --verbose wikipedia.zim wikipedia.oza
# Dry run (analyze without writing)
go run ./cmd/zim2oza --dry-run wikipedia.zim
# Control parallel compression (default: number of CPUs)
go run ./cmd/zim2oza --compress-workers 4 wikipedia.zim wikipedia.oza

Convert EPUB books to OZA format:
# Single book
go run ./cmd/epub2oza book.epub book.oza
# Collection: bundle all EPUBs in a directory into one searchable archive
go run ./cmd/epub2oza --collection --title "My Library" ./epubs/ library.oza
# With verbose statistics and minification
go run ./cmd/epub2oza --verbose --minify book.epub book.oza

oza.Open(path) (*Archive, error)
oza.OpenWithOptions(path, ...Option) (*Archive, error)
archive.EntryByPath("Main_Page") (Entry, error)
archive.EntryByTitle("Main Page") (Entry, error)
archive.EntryByID(0) (Entry, error)
archive.MainEntry() (Entry, error)
archive.Metadata("title") (string, error)
archive.Entries() iter.Seq[Entry]
archive.EntriesByTitle() iter.Seq[Entry]
archive.FrontArticles() iter.Seq[Entry]
archive.Search("query", SearchOptions{}) ([]SearchResult, error)
archive.Verify() error
archive.VerifyAll() ([]VerifyResult, error)

entry.Path() string
entry.Title() string
entry.Size() uint32 // content size without decompression
entry.IsRedirect() bool
entry.IsFrontArticle() bool
entry.MIMEType() string
entry.ReadContent() ([]byte, error) // resolves redirects
entry.Resolve() (Entry, error) // follow redirect chain

oza.WithMmap(false) // disable memory mapping
oza.WithCacheSize(32) // chunk cache size (default: 8)
oza.WithVerifyOnOpen() // verify section checksums on open

Run all benchmarks:
make bench

Run a specific benchmark or subset:
go test -bench=BenchmarkOpen -benchmem ./oza/
go test -bench=BenchmarkWrite -benchmem ./ozawrite/

Compare performance across changes with benchstat:
go test -bench=. -benchmem -count=6 ./oza/ ./ozawrite/ > old.txt
# ... make changes ...
go test -bench=. -benchmem -count=6 ./oza/ ./ozawrite/ > new.txt
benchstat old.txt new.txt

Reader benchmarks:

| Benchmark | What it measures |
|---|---|
| BenchmarkOpen | Header parsing, section loading, index construction |
| BenchmarkEntryByPath | Binary search on path index |
| BenchmarkEntryByID | O(1) entry lookup by numeric ID |
| BenchmarkReadContent | Chunk decompression (cached and uncached sub-benchmarks) |
| BenchmarkVerify | File-level SHA-256 verification |
| BenchmarkVerifyAll | Three-tier integrity check (file + section + chunk) |
| BenchmarkSearch | Trigram full-text search |
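
The difference between the two lookup benchmarks can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the `archive` and `entry` types and their methods below are hypothetical stand-ins, and a plain slice takes the place of whatever ID-indexed structure the real format uses. Lookup by ID is a direct index, while lookup by path is a binary search over IDs kept sorted by path.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a simplified stand-in for an archive entry.
type entry struct {
	path  string
	title string
}

// archive holds entries indexed by ID plus a path index: entry IDs sorted
// by path. Both fields are hypothetical simplifications.
type archive struct {
	entries []entry
	byPath  []int
}

func newArchive(entries []entry) *archive {
	byPath := make([]int, len(entries))
	for i := range byPath {
		byPath[i] = i
	}
	sort.Slice(byPath, func(i, j int) bool {
		return entries[byPath[i]].path < entries[byPath[j]].path
	})
	return &archive{entries: entries, byPath: byPath}
}

// entryByID is a constant-time slice index.
func (a *archive) entryByID(id int) (entry, bool) {
	if id < 0 || id >= len(a.entries) {
		return entry{}, false
	}
	return a.entries[id], true
}

// entryByPath binary-searches the sorted path index, O(log n).
func (a *archive) entryByPath(path string) (entry, bool) {
	i := sort.Search(len(a.byPath), func(i int) bool {
		return a.entries[a.byPath[i]].path >= path
	})
	if i < len(a.byPath) && a.entries[a.byPath[i]].path == path {
		return a.entries[a.byPath[i]], true
	}
	return entry{}, false
}

func main() {
	a := newArchive([]entry{
		{"Main_Page", "Main Page"},
		{"Apple", "Apple"},
		{"Zebra", "Zebra"},
	})
	e, ok := a.entryByPath("Apple")
	fmt.Println(e.title, ok) // prints "Apple true"
	e, ok = a.entryByID(0)
	fmt.Println(e.title, ok) // prints "Main Page true"
}
```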
Writer benchmarks:

| Benchmark | What it measures |
|---|---|
| BenchmarkWriteSmall | End-to-end archive creation (100 entries) |
| BenchmarkWriteMedium | End-to-end archive creation (10K entries) |
| BenchmarkWriteWithDict | Archive creation with dictionary training (500 entries) |
| BenchmarkCompressChunk | Zstd compression throughput (64 KB chunk) |
| BenchmarkTrainDictionary | Zstd dictionary training from HTML samples |
| BenchmarkBuildTrigramIndex | Trigram index construction (1K entries, in-memory) |
| BenchmarkBuildTrigramIndexLarge | Trigram index construction (5K entries, disk spilling) |
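
For readers unfamiliar with what the trigram benchmarks build, here is a minimal in-memory sketch of the general technique. It is not OZA's index format (see docs/FORMAT.md for that): the `trigrams`, `build`, and `search` names are hypothetical, trigrams are taken over bytes of the lowercased text, and there is no disk spilling. Each document is indexed under every 3-byte substring it contains; a query matches the documents that contain all of the query's trigrams, followed by a substring check to discard false positives.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigrams returns the distinct 3-byte substrings of the lowercased text.
func trigrams(text string) []string {
	text = strings.ToLower(text)
	seen := make(map[string]bool)
	var out []string
	for i := 0; i+3 <= len(text); i++ {
		t := text[i : i+3]
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

// index maps each trigram to the IDs of the documents that contain it.
type index map[string][]int

func build(docs []string) index {
	idx := make(index)
	for id, doc := range docs {
		for _, t := range trigrams(doc) {
			idx[t] = append(idx[t], id)
		}
	}
	return idx
}

// search returns the sorted IDs of documents containing query as a
// case-insensitive substring. Queries shorter than three bytes yield no
// trigrams and therefore no results in this sketch.
func (idx index) search(docs []string, query string) []int {
	q := strings.ToLower(query)
	tris := trigrams(q)
	if len(tris) == 0 {
		return nil
	}
	counts := make(map[int]int) // doc ID -> number of query trigrams present
	for _, t := range tris {
		for _, id := range idx[t] {
			counts[id]++
		}
	}
	var hits []int
	for id, n := range counts {
		// Sharing every trigram only makes a document a candidate;
		// confirm with a real substring check.
		if n == len(tris) && strings.Contains(strings.ToLower(docs[id]), q) {
			hits = append(hits, id)
		}
	}
	sort.Ints(hits)
	return hits
}

func main() {
	docs := []string{"Quantum mechanics", "Classical mechanics", "Quantum field theory"}
	idx := build(docs)
	fmt.Println(idx.search(docs, "mechanics"))         // prints "[0 1]"
	fmt.Println(idx.search(docs, "quantum mechanics")) // prints "[0]"
}
```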
make bench-convert # convert small.zim (downloads test data)
make bench-convert-large ZIM=/path/to.zim # convert a large ZIM file

make test # run tests
make test-race # run with race detector
make bench # run benchmarks
make testdata # download test files
make build # build all CLI tools

Apache 2.0 -- see LICENSE for details.