gss10282023/ByteTide

ByteTide

ByteTide is a LAN/cluster large-file distribution system written in C. It turns single-source pulls into multi-peer, chunked distribution with end-to-end integrity verification.

ByteTide has three components: pkgmake generates .bpkg metadata, btided serves chunks as a seed/peer daemon, and btidectl downloads and verifies end-to-end (chunk SHA256 + Merkle Root).
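To make the "chunk SHA256 + Merkle Root" step concrete, here is a minimal sketch of folding per-chunk digests into a single root. It is not taken from the ByteTide sources: the names `merkle_root`, `hash_fn`, and `toy_hash` are hypothetical, the pairing rule (parent = H(left || right), unpaired last node promoted unchanged) is an assumption, and `toy_hash` is a non-cryptographic stand-in used only so the example has no dependencies. A real build would pass a SHA-256 wrapper instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DIGEST_LEN = 32 }; /* size of a SHA-256 digest in bytes */

/* Any function that hashes len bytes into a 32-byte digest. */
typedef void (*hash_fn)(const uint8_t *data, size_t len,
                        uint8_t out[DIGEST_LEN]);

/* NOT cryptographic: an FNV-1a based stand-in so the sketch runs without a
 * crypto library. Replace with real SHA-256 in practice. */
static void toy_hash(const uint8_t *data, size_t len, uint8_t out[DIGEST_LEN]) {
    uint64_t x = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) { x ^= data[i]; x *= 1099511628211ULL; }
    for (size_t i = 0; i < DIGEST_LEN; i++) {
        out[i] = (uint8_t)(x >> 56);
        x *= 1099511628211ULL;
        x ^= i;
    }
}

/* Fold n leaf digests (n >= 1) into a root. The nodes array is overwritten.
 * Assumed convention: parent = H(left || right); an unpaired last node is
 * promoted to the next level unchanged. */
static void merkle_root(uint8_t (*nodes)[DIGEST_LEN], size_t n, hash_fn h,
                        uint8_t root[DIGEST_LEN]) {
    assert(n >= 1);
    while (n > 1) {
        size_t out = 0;
        for (size_t i = 0; i + 1 < n; i += 2) {
            uint8_t pair[2 * DIGEST_LEN];
            /* Copy both children first, so writing the parent in place is safe. */
            memcpy(pair, nodes[i], DIGEST_LEN);
            memcpy(pair + DIGEST_LEN, nodes[i + 1], DIGEST_LEN);
            h(pair, sizeof pair, nodes[out++]);
        }
        if (n % 2) memcpy(nodes[out++], nodes[n - 1], DIGEST_LEN);
        n = out;
    }
    memcpy(root, nodes[0], DIGEST_LEN);
}
```

A downloader would hash each received chunk, compare it with the digest listed in the metadata, and only after all chunks pass compare the folded root with the expected one. The actual tree shape ByteTide uses may differ from this sketch.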

Features

  • Event-loop + non-blocking I/O (btided / btidectl)
  • Versioned binary framing protocol (framing + codec) that handles partial/coalesced packets
  • Multi-peer pipeline concurrency (configurable)
  • Rarest-first and endgame scheduling
  • Chunk SHA256 verification + final Merkle Root verification
  • Resumable downloads via persisted progress bitset (state.bin)
  • Unit tests (≥ 20 codec edge cases) and Docker integration tests
  • Benchmark scripts that emit structured results (JSON)
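
As an illustration of the rarest-first idea listed above, the sketch below picks the missing chunk that the fewest connected peers hold. It is an assumption-laden example, not ByteTide's scheduler: `bit_get` and `pick_rarest` are hypothetical names, the bitset packing (bit 0 is the least significant bit of byte 0) is assumed, and endgame mode is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test bit i of a packed bitset (assumed packing: bit 0 = LSB of byte 0). */
static int bit_get(const uint8_t *bits, size_t i) {
    assert(bits != NULL);
    return (bits[i / 8] >> (i % 8)) & 1;
}

/* have     : our own progress bitset
 * peers[p] : the "have" bitset advertised by peer p
 * Returns the index of the chunk we lack that the fewest peers hold
 * (at least one), or -1 if no peer can supply a chunk we still need.
 * Ties go to the lowest index; real schedulers often randomize ties. */
static long pick_rarest(const uint8_t *have, const uint8_t *const *peers,
                        size_t n_peers, size_t n_chunks) {
    long best = -1;
    size_t best_count = 0;
    for (size_t c = 0; c < n_chunks; c++) {
        if (bit_get(have, c)) continue;          /* already downloaded */
        size_t count = 0;
        for (size_t p = 0; p < n_peers; p++)
            count += (size_t)bit_get(peers[p], c);
        if (count == 0) continue;                /* nobody can serve it */
        if (best < 0 || count < best_count) {
            best = (long)c;
            best_count = count;
        }
    }
    return best;
}
```

Requesting the rarest chunk first spreads scarce chunks across the swarm early, so the seed is less likely to remain the only source for any chunk.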

Quickstart

Requirements: Docker (recommended; works the same on macOS/Windows/Linux)

./scripts/demo.sh

Demo

You should see VERIFY OK (or an equivalent success marker) at the end.

$ ./scripts/demo.sh
[demo] preparing demo data...
[demo] running docker compose demo...
VERIFY OK

  • Terminal recording (example): assets/demo.cast (play locally with asciinema play assets/demo.cast)
  • Multi-peer rarest-first: ./scripts/demo_rarest.sh

Architecture

flowchart LR
  pkgmake[pkgmake\ngenerate .bpkg] --> meta[.bpkg metadata]
  meta --> btided[btided\ndistribution daemon]
  meta --> btidectl[btidectl\ndownload client]
  btided <-- P2P Frames --> btidectl
  btidectl --> payload[payload.data]
  btidectl --> state[state.bin]

For a deeper module breakdown and threading model, see docs/architecture.md.
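
The diagram lists state.bin as an output of btidectl. The sketch below shows one way a progress bitset could be persisted and reloaded so that a restart resumes instead of starting over. The on-disk layout here (a raw packed bitset with no header) and the names `state_save` and `state_load` are assumptions for illustration; the real file format may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the bitset to "<path>.tmp", then rename it over path, so a crash
 * mid-write leaves the previous state file intact. Assumes POSIX rename()
 * semantics (replacing an existing file); a production version would also
 * flush the file to stable storage before renaming. Returns 0 on success. */
static int state_save(const char *path, const uint8_t *bits, size_t n_bytes) {
    char tmp[512];
    assert(path != NULL && bits != NULL);
    int k = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (k < 0 || (size_t)k >= sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    int ok = fwrite(bits, 1, n_bytes, f) == n_bytes;
    ok = (fclose(f) == 0) && ok;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Load the bitset. Returns 1 if state was restored, 0 if no state file
 * exists (fresh download), -1 if the file has the wrong size. In the 0 and
 * -1 cases bits is zeroed, which means "no chunk is known to be present". */
static int state_load(const char *path, uint8_t *bits, size_t n_bytes) {
    assert(path != NULL && bits != NULL);
    memset(bits, 0, n_bytes);
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t got = fread(bits, 1, n_bytes, f);
    int extra = fgetc(f) != EOF;   /* trailing bytes mean a size mismatch */
    fclose(f);
    if (got != n_bytes || extra) {
        memset(bits, 0, n_bytes);
        return -1;
    }
    return 1;
}
```

A bit should be set only after the corresponding chunk has been written and its hash verified, so that a stale state file can at worst cause a chunk to be fetched again.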

Protocol

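ByteTide uses a length-prefixed binary frame protocol (header fields magic/version/msg/len/error, followed by the payload). Because TCP is a byte stream, a single read may return part of a frame or several frames at once; the decoder reassembles frames correctly in both cases.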

  • Frame format, message table, and handshake state machine: docs/protocol.md
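
To show how a length-prefixed decoder can cope with split and coalesced reads, here is a minimal sketch. The field names come from the description above, but the field widths, their order on the wire, the big-endian byte order, the magic value, and the names `frame_hdr` and `frame_try_decode` are all assumptions made for this example; docs/protocol.md is the authoritative reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte header: magic(4) version(1) msg(1) len(4) error(2),
 * multi-byte fields big-endian. The real layout is in docs/protocol.md. */
enum { HDR_LEN = 12, MAX_PAYLOAD = 1 << 20 };
#define FRAME_MAGIC 0x42544944u /* assumed value: ASCII "BTID" */

typedef struct {
    uint32_t magic;
    uint8_t  version;
    uint8_t  msg;
    uint32_t len;    /* payload length in bytes */
    uint16_t error;
} frame_hdr;

static uint32_t rd_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Try to decode one frame from buf[0..n).
 * Returns > 0: bytes consumed (header + payload); *hdr and *payload are set.
 * Returns   0: the frame is incomplete; keep the bytes and read more.
 * Returns  -1: malformed stream (bad magic or oversized payload). */
static long frame_try_decode(const uint8_t *buf, size_t n,
                             frame_hdr *hdr, const uint8_t **payload) {
    assert(buf != NULL && hdr != NULL && payload != NULL);
    if (n < HDR_LEN) return 0;
    hdr->magic   = rd_be32(buf);
    hdr->version = buf[4];
    hdr->msg     = buf[5];
    hdr->len     = rd_be32(buf + 6);
    hdr->error   = (uint16_t)((buf[10] << 8) | buf[11]);
    if (hdr->magic != FRAME_MAGIC || hdr->len > MAX_PAYLOAD) return -1;
    if (n - HDR_LEN < hdr->len) return 0;
    *payload = buf + HDR_LEN;
    return (long)(HDR_LEN + hdr->len);
}
```

The caller appends every read to a receive buffer and calls the decoder in a loop: a positive return value is the number of bytes to drop from the front of the buffer before trying again, which handles coalesced frames, and 0 means wait for the next readable event, which handles split frames. Bounding the payload length keeps a corrupt header from forcing a huge allocation.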

Testing

make test-unit
make test-asan
make test-ubsan
make test
make demo-rarest

Benchmark

./bench/run_bench.sh

Field definitions and suggested comparisons: docs/performance.md

Example (from bench/results/examples/; run the script to generate real data):

| case | concurrency | verify_mode | throughput_MBps | p95_ms | p99_ms | cpu_pct | event_loop_cpu_pct | workers_cpu_pct | in_flight_requests_peak |
|---|---|---|---|---|---|---|---|---|---|
| c1_pool | 1 | pool | 12.345 | 3.210 | 4.567 | 85.00 | 20.00 | 65.00 | 1 |
| c16_pool | 16 | pool | 48.901 | 1.234 | 2.345 | 92.00 | 15.00 | 77.00 | 16 |
| c16_single | 16 | single | 42.000 | 1.800 | 3.000 | 95.00 | 95.00 | 0.00 | 16 |

Why ByteTide

When distributing multi-GB/TB artifacts inside a cluster/LAN, common approaches (HTTP/scp/NFS) often hit three problems: a central bandwidth bottleneck, expensive retries, and no strong end-to-end proof of correctness. ByteTide uses chunking and multi-peer parallel fetching to make distribution scalable, resumable, and verifiable:

  • Lower central bandwidth: the seed does not need to send the full payload to every machine; peers can complement each other with chunks
  • Interruptible and resumable: download progress is persisted and can continue after disconnects/restarts
  • Provable correctness: chunk SHA256 + final Merkle Root verification validates what is written to disk
  • Reproducible and demoable: scripts and Docker Compose let newcomers run a consistent demo quickly

Use cases

  • Distribute large files to many machines in a cluster/lab network: model weights, datasets, offline installers, release artifacts
  • Many nodes fetch the same version of an artifact and must end up identical (auditable/verifiable)
  • Unreliable or cost-sensitive networks where failures must be recoverable to reduce retransmission
  • A systems/network programming project that needs reproducible demos and measurable comparisons

Non-goals / not a fit

  • Public Internet P2P ecosystems (DHT/Tracker/magnet links) and governance of anonymous nodes at scale
  • Strong security requirements (TLS/auth/access control/encryption) — security is not the current focus
  • Frequent small-change directory sync (rsync/git are a better fit)
  • Small-file distribution on a single machine / a few machines (HTTP/scp is usually enough; ByteTide may be overkill)

Comparison: ByteTide vs HTTP/rsync/BitTorrent

| Solution | Best for | Multi-source parallelism | Integrity verification | Typical trade-offs |
|---|---|---|---|---|
| ByteTide | LAN/cluster large-file distribution, reproducible demos | ✅ (peers complement chunks) | ✅ (chunk + Merkle Root) | Focused scope; not a “public BitTorrent replacement” |
| HTTP / object storage | Simple distribution, strong ecosystem | ❌ (usually single source) | Requires external checksums | Seed bandwidth bottlenecks; retries can re-transfer data |
| rsync | Directory sync, incremental updates | ❌ (usually single source) | ✅ (rolling/file checksums) | Great for small diffs; less optimal for full large-file distribution |
| BitTorrent | Public large-scale distribution, mature ecosystem | ✅ | ✅ (piece hashes) | More moving parts (tracker/DHT/strategy/compat); overkill for LAN demos |

Releases

Prebuilt Linux binaries are published in GitHub Releases (tags v*).

Roadmap

Milestones and acceptance criteria: DEVELOPMENT_PLAN.md

License

MIT; see LICENSE.
