Skip to content

ericbsantana/kraken-book

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

kraken-book

A live L2 order-book reconstructor for Kraken's WebSocket v2 feed, written in async Rust.

It connects to wss://ws.kraken.com/v2, ingests snapshot + delta updates for a configurable set of symbols, maintains per-symbol books that self-validate against Kraken's CRC32 checksum, and exposes the state via an HTTP API with both polled snapshots and live SSE streams. It reconnects on network failure, detects half-open connections via read timeouts, and shuts down cleanly on Ctrl-C.

Built as a hands-on tour of production-shaped async Rust. The project is deliberately small (a few hundred lines of real code) but covers the full surface area of tokio: spawning, channels, locks, fanout, cancellation, structured shutdown, HTTP, observability.

kraken-book TUI demo


Demo

cargo run --bin connect

In a second terminal:

# List subscribed symbols
curl http://localhost:3000/symbols

# Current top-of-book snapshot
curl http://localhost:3000/depth/BTC%2FUSD | jq

# Live SSE stream of updates
curl -N http://localhost:3000/stream/BTC%2FUSD

Or run the terminal UI (live ladder per symbol, q / Esc / Ctrl-C to quit):

cargo run --bin tui

Sample SSE event:

data: {"symbol":"BTC/USD","kind":"update",
       "best_bid":{"price":"81409.1","qty":"6.35417741"},
       "best_ask":{"price":"81409.2","qty":"0.00083891"},
       "spread":"0.1"}

Try Ctrl-C while messages flowclean structured shutdown. Try toggling wifi off for ~12sread-timeout fires, exponential backoff kicks in, reconnect succeeds, in-flight HTTP clients keep working.


Architecture

                       Kraken WS v2 feed
                              β”‚
                              β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ run_session() β”‚  reconnect loop, exp backoff,
                      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  10s read timeout
                              β”‚ parse + dispatch
                              β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β–Ό               β–Ό               β–Ό
        mpsc::channel   mpsc::channel   mpsc::channel    (one per symbol,
              β”‚               β”‚               β”‚           bounded N=64)
              β–Ό               β–Ό               β–Ό
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      β”‚ consume(BTC) β”‚ β”‚ consume(ETH) β”‚ β”‚ consume(SOL) β”‚  consumer task
      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  (sole writer of
              β”‚                β”‚                β”‚          its book +
       β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”         β”‚                β”‚          sole publisher
       β”‚             β”‚         β”‚                β”‚          of its bus)
       β–Ό             β–Ό         β–Ό                β–Ό
   RwLock<Book>  broadcast  RwLock<Book> ...  RwLock<Book>
   (state, for   ::Sender   (state)           (state)
    /depth)      <Event>
                 (events,
                  for /stream)
                     β”‚
                     β”‚  N receivers per symbol, one per subscribed
                     β–Ό  SSE client; slow clients get Lagged(n) and
              GET /stream/  drop messages, never block the producer
                     β–²
                     β”‚
              axum HTTP server (graceful shutdown via CancellationToken)
                     β”‚
                     β”‚  /symbols      /depth/:symbol     /stream/:symbol
                     β–Ό
                  curl / browser EventSource

Three communication patterns coexist, each chosen for a different reason:

Primitive Used for Why
mpsc::channel(N) WS β†’ consumer Bounded backpressure: if a consumer falls behind, the sender blocks. We want zero data loss in the ingest path.
Arc<RwLock<OrderBook>> Book state shared with HTTP Many concurrent readers (/depth), exactly one writer (the consumer task). Per-symbol locks so an ETH update doesn't block BTC reads.
broadcast::channel(N) consumer β†’ SSE clients Fanout. Slow client β†’ Lagged(n) and drops messagesnever blocks the producer or other clients. The right semantics for live market data.

The choice between mpsc and broadcast is the most consequential design decision in any pub/sub system: what happens when consumers can't keep up? mpsc says "block the producer"; broadcast says "drop oldest for stragglers." This service uses both, in different layers, deliberately.

How to run it

cargo run --bin connect

# Run the test suite:
cargo test

Tests cover JSON parsing, book apply semantics, the Kraken-specific CRC32 format (verified against an example string from Kraken's docs), and decimal precision preservation through serde.


The learning journey

This project was built in nine focused sessions. Each session introduced a small set of async concepts paired with a concrete, working slice of the system.

# Topic Key concepts
1 Project setup, WS connection tokio-tungstenite, Stream/Sink, TLS in the Rust ecosystem
2 Subscribe, parse messages serde untagged + internally-tagged enums, Decimal precision
3 Move the book into its own task Actor pattern, mpsc, ownership across .await, SendError and recovering ownership
4 L2 book semantics BTreeMap, snapshot vs. delta, zero-qty as remove sentinel, top-N invariant
5 CRC32 self-validation The Kraken algorithm, exact decimal formatting, debugging a checksum mismatch (the bug was missing client-side trim)
6 Reconnect + graceful shutdown tokio::select!, cancel safety, CancellationToken, exponential backoff, structured shutdown via ownership, tokio::time::timeout for half-open sockets
7 HTTP API axum, Arc<RwLock<>>, the tokio::sync::* vs. std::sync::* distinction, the cardinal sin of holding locks across .await
8 Live SSE push tokio::sync::broadcast, Lagged(n) semantics, BroadcastStream adapter, axum SSE with KeepAlive
9 Observability tracing, #[instrument], structured fields, RUST_LOG-tunable filtering, TraceLayer

Each source file is heavily annotated with the why behind the design. The comments are the textbook. Read src/bin/connect.rs, src/api.rs, src/book.rs top-to-bottom and you'll have a tour of every concept above with running code attached.


How AI was used

I built this project paired with Claude (Anthropic) acting as a tutor. The format was deliberate and the same every session:

  1. I'd ask for a small concept primerfive-ish concepts framed in terms of what they solve, not how they work mechanically.
  2. The AI scaffolded the boring partsfile structure, imports, function signatures, doc comments explaining the why of each design decision. Every load-bearing logic point was left as a todo!() for me with a sketch in comments.
  3. I wrote the load-bearing logic myselfthe select! body, the reconnect loop, the RwLock acquisition, the SSE handler, the broadcast publish. The AI wrote zero of the actual concurrency primitives.
  4. The AI explained compiler errors as mental modelswhen I hit "cannot move out of dereference of RwLockReadGuard," the answer wasn't "use .clone()" but a paragraph on smart pointers, Copy, and why this rule has nothing to do with tokio. Those mental-model explanations are what stuck.
  5. I drove all decisions and verification. Backoff strategy, what to log, when to commit, what to put in this READMEthose were mine. Every session ended with cargo run + manual verification (Ctrl-C, wifi toggling, multiple curl clients) before moving on.

Why I'm explicit about this: working effectively with AI is a real engineering skill in 2026, and pretending I built this alone would misrepresent how I actually work. What I learned, I learned for keepsthe proof is that I can explain why every primitive is here, defend every design decision, and walk through the code without notes. The AI accelerated the teaching, not the understanding.

The extensive inline comments in this repo are partly AI-authored (explanations) and partly mine (the comments on the lines I wrote). They're left in place because they make the project useful as a reference for someone else learning the same material.

What's intentionally not here

This is a learning project, not a production exchange feed handler. Things a real one would add and this one doesn't:

  • Sequence-number tracking and gap detection (Kraken v2 doesn't expose seq numbers per book event; production systems use it where available)
  • Persistence / replay
  • Auth / TLS termination on the HTTP side (binds to 127.0.0.1:3000; production would terminate TLS at a reverse proxy)
  • Symbol allowlist on subscribe, rate limiting on the HTTP endpoints
  • Metrics export (Prometheus)the tracing instrumentation is in place; a metrics crate hookup is the natural next step
  • Benchmarks (criterion) for the order-book hot path under contention

The code is structured so each of these is a localized addition rather than a rewrite.


How to expand this

More symbols, more venues

Symbols are currently a const per binary; moving them to env config (e.g., KRAKEN_SYMBOLS=BTC/USD,ETH/USD,...) is a one-line change. The rest of the architecture is symbol-agnostic. For multiple venues (Coinbase, Binance, etc.), abstract run_session behind a Venue traitsame consumer pipeline, many feeds.

Multiple instances behind a load balancer

HTTP /depth and /symbols are stateless readsfine behind any load balancer with no concerns. Each instance maintains its own copy of the books from its own WebSocket connection (a few KB per symbol per instance, cheap). Instances can disagree by a few milliseconds due to network jitter; for L2 displays this is acceptable, for trading you'd need stricter consistency (sequence numbering + replay).

/stream/:symbol (SSE) needs sticky sessions at the LB layer (ip_hash in nginx, source-IP affinity in HAProxy / AWS ALB). Each broadcast Receiver lives on the instance the client first connected to; bouncing across instances on reconnect would give the client a different starting position in the feed.

Cross-instance fanout

For very high SSE client counts or multi-region deployments, replace the per-instance broadcast with a published feed (NATS, Redis Streams, Kafka). Each instance becomes a thin SSE proxy over the shared bus rather than maintaining its own books. The current architecture already separates state from fanout, so this is a localized swap rather than a rewrite.

Production hardening

  • Persistence: periodic snapshots of each book to disk for fast restart without re-syncing from a snapshot frame.
  • Auth + rate limits: tower::limit and an auth middleware layer on the axum router.
  • Containerization: multi-stage Dockerfile with a distroless final image; a Helm chart or Compose file for ops.
  • Metrics: tracing instrumentation is already in place; tracing-opentelemetry or the metrics crate hooks it into a Prometheus scrape endpoint.

License

MIT.

About

πŸ¦‘ Live L2 order-book reconstructor for Kraken's WebSocket v2 - async Rust crash course with tokio, axum, SSE, and a ratatui TUI

Topics

Resources

License

Stars

Watchers

Forks

Contributors

Languages