+
+ {/* Page Header */}
+
+
+ DRMQ v1.0
+
+
Documentation
+
+ Complete technical reference for the Distributed Reliable Message Queue — a Raft-based,
+ append-only event streaming system with linearizable guarantees and a custom storage engine.
+
+
+
+ {/* ── Section 1: Introduction ─────────────────────────────────── */}
+
+
+ DRMQ (Distributed Reliable Message Queue) is a from-scratch implementation of a distributed
+ event streaming broker built on the Raft consensus algorithm. It is designed for workloads
+ that demand strict, globally ordered message delivery with durability guarantees.
+
+
+ Unlike partition-based models, DRMQ replicates the entire topic log across all nodes
+ in the Raft cluster. This sacrifices horizontal write scalability in exchange for
+ much simpler operational semantics — every message is linearizable, every consumer sees the
+ same global order, and there is no concept of partition reassignment or consumer rebalancing.
+
+
+
+
+ {[
+ ['Strictly ordered event logs', 'Audit trails, state-machine replication.'],
+ ['Small-to-medium throughput', 'Single-partition write path means writes serialize through the Raft leader.'],
+ ['Simple operational model', 'No partition maps, no ISR, no consumer group rebalancing protocol.'],
+ ['High availability with 3+ nodes', 'Survives minority node failures with automatic leader failover.'],
+ ].map(([title, desc]) => (
+
+ {title}
+ {desc}
+
+ ))}
+
+
+
+ A message is only acknowledged to the producer after a quorum (majority) of
+ Raft nodes have durably written it to their local log. A single node failure cannot cause
+ data loss for any acknowledged message.
+
+
+ {/* ── Section 2: Architecture ─────────────────────────────────── */}
+
+
+ DRMQ is structured in three independent layers that interact through well-defined internal
+ interfaces. Understanding these boundaries is critical for operational and debugging work.
+
+
+
+
+ {[
+ {
+ name: 'Protocol Layer',
+ color: 'cyan',
+ desc: `All client and peer communication uses a single Netty-based TCP server. Messages are
+ framed with a 4-byte length prefix and encoded as Protobuf binary. Client requests
+ (ProduceRequest, ConsumeRequest, CommitOffsetRequest) and Raft RPCs
+ (AppendEntries, RequestVote, InstallSnapshot) share the same transport. The Netty
+ LengthFieldBasedFrameDecoder is configured with a 256MB maximum frame size to
+ accommodate large snapshots.`,
+ },
+ {
+ name: 'Consensus Layer',
+ color: 'emerald',
+ desc: `RaftNode implements the full Raft protocol: leader election with Pre-Vote, log
+ replication with batched AppendEntries (capped at MAX_ENTRIES_PER_RPC = 500 entries),
+ InstallSnapshot for severely lagging followers, and durable persistence of currentTerm,
+ votedFor, commitIndex, and lastApplied across restarts. The consensus layer is the
+ gatekeeper — no write reaches the storage layer without first being committed by a quorum.`,
+ },
+ {
+ name: 'Storage Layer',
+ color: 'purple',
+ desc: `MessageStore manages the on-disk topic data using a segmented, append-only log.
+ Each topic is a directory of .log files (100MB each) with corresponding .idx sparse
+ index files. The RaftLog itself uses a separate binary-encoded file to persist Raft
+ log entries. Consumer group offsets are also persisted inside the Raft log as
+ CommitOffsetCommand entries, giving them the same durability guarantee as messages.`,
+ },
+ ].map(({ name, color, desc }) => (
+
+ ))}
+
+
+
+
+ {[
+ { step: '1', actor: 'Client', bg: 'bg-slate-800', border: 'border-slate-700', desc: 'Sends a ProduceRequest (topic + payload bytes) over TCP to any broker node.' },
+ { step: '2', actor: 'Broker — not leader', bg: 'bg-zinc-900', border: 'border-zinc-700', desc: 'If the receiving broker is a follower, it immediately responds NOT_LEADER:
. The client SDK transparently redirects to the leader.' },
+ { step: '3', actor: 'Leader — RaftNode', bg: 'bg-cyan-950', border: 'border-cyan-800', desc: 'Appends the entry to its local RaftLog, then fires parallel AppendEntries RPCs to all followers. The request is held open.' },
+ { step: '4', actor: 'Followers — RaftNode', bg: 'bg-zinc-900', border: 'border-zinc-700', desc: 'Each follower writes the entry to its local log and replies AppendEntriesResponse(success=true).' },
+ { step: '5', actor: 'Leader — Quorum reached', bg: 'bg-cyan-950', border: 'border-cyan-800', desc: 'Once a majority of nodes have acknowledged, the Leader advances its commitIndex, applies the entry to the MessageStore, and assigns a monotonically increasing global offset.' },
+ { step: '6', actor: 'Client', bg: 'bg-emerald-950', border: 'border-emerald-800', desc: 'Receives ProduceResponse(success=true, offset=N). The message is now durable and visible to consumers.' },
+ ].map(({ step, actor, bg, border, desc }) => (
+
+ ))}
+
+
+ {/* ── Section 3: Getting Started ──────────────────────────────── */}
+
+
+
+
+ {[
+ ['Java', '17+', 'The broker and client are written in Java 17.'],
+ ['Maven', '3.8+', 'Used to build all modules.'],
+ ['Node.js', '18+', 'Required only for the dashboard (optional).'],
+ ].map(([dep, ver, note]) => (
+
+ {dep}
+ {ver}
+ {note}
+
+ ))}
+
+
+
+ Clone the repository and build all Maven modules from the project root:
+
+{`git clone https://github.com/your-org/drmq.git
+cd drmq
+
+# Build all modules (broker + client + protocol)
+mvn clean install -DskipTests`}
+
+
+
+
+ The fastest way to get running. The broker starts on port 9092 and stores data in{' '}
+ ./data by default.
+
+
+{`cd drmq-broker
+
+# Default: node-id=1, port=9092, data-dir=./data, no peers (standalone)
+mvn exec:java`}
+
+
+
+
+ Open three separate terminals. Each node must know the addresses of its peers.
+ The cluster will elect a leader once a quorum (2 of 3) establishes connectivity.
+
+
+{`# Terminal 1 — Node 1
+cd drmq-broker
+mvn exec:java -Dexec.args="1 9092 ./data/node1 localhost:9093,localhost:9094"
+
+# Terminal 2 — Node 2
+cd drmq-broker
+mvn exec:java -Dexec.args="2 9093 ./data/node2 localhost:9092,localhost:9094"
+
+# Terminal 3 — Node 3
+cd drmq-broker
+mvn exec:java -Dexec.args="3 9094 ./data/node3 localhost:9092,localhost:9093"`}
+
+
+
+ Peers must be specified as a comma-separated list of host:port pairs
+ excluding the current node's own address. Providing the node's own address in the peer
+ list is harmless but generates a connection warning on startup.
+
+
+
+
+ Watch the logs. Within 2-3 seconds you should see one node win the election and print:
+
+
+{`[raft-timer] INFO RaftNode - [1] Became LEADER for term 1
+[raft-timer] INFO RaftNode - [1] Sending heartbeats to 2 peers`}
+
+
+
+
+{`cd drmq-dashboard
+npm install
+
+# Connect to all three broker nodes
+VITE_USE_WEBSOCKET=true npm run dev`}
+
+
+ Open http://localhost:5173 in your browser.
+ The dashboard connects to all broker nodes simultaneously via WebSocket and merges their
+ telemetry into a single unified view.
+
+
+ {/* ── Section 4: Configuration ────────────────────────────────── */}
+
+
+ The broker is configured entirely through command-line arguments. There is no configuration
+ file; all settings are passed as explicit flags at startup.
+
+
+
+
+
+
+
+ | Argument |
+ Type |
+ Default |
+ Description |
+
+
+
+ {[
+ ['node-id', 'String', 'Required', 'Unique identifier for this broker within the Raft cluster. Must be stable across restarts — changing it will cause the node to be treated as a new, unknown peer.'],
+ ['port', 'Integer', '9092', 'TCP port on which the Netty server listens for both client connections and inbound Raft RPC traffic from peers.'],
+ ['data-dir', 'String', './data', 'Root directory for all persistent state. DRMQ creates subdirectories here for raft/ (log + metadata) and store/ (topic segments + indexes). Point this at an NVMe-backed path in production.'],
+ ['peers', 'String', 'none', 'Comma-separated list of peer addresses in host:port format, e.g. localhost:9093,localhost:9094. Omit the node\'s own address. Empty means standalone (single-node) mode with no replication.'],
+ ].map(([arg, type, def, desc]) => (
+
+ | {arg} |
+ {type} |
+ {def} |
+ {desc} |
+
+ ))}
+
+
+
+
+
+
+ The following values are compiled into the broker. They are not runtime-configurable in
+ the current version but are documented here for operational awareness.
+
+
+
+
+
+ | Constant |
+ Value |
+ Description |
+
+
+
+ {[
+ ['MAX_FRAME_SIZE', '256 MB', 'Maximum Netty frame size. Governs the largest single RPC payload — relevant during snapshot transfer.'],
+ ['MAX_ENTRIES_PER_RPC', '500', 'Maximum Raft log entries sent in a single AppendEntries call. Prevents OOM during follower catch-up after a long partition.'],
+ ['MAX_SEGMENT_SIZE', '100 MB', 'Size at which an active .log segment is sealed and a new one is rolled. Older segments become candidates for compaction.'],
+ ['ELECTION_TIMEOUT', '150–300 ms', 'Randomised election timeout range (per node). A follower that receives no heartbeat within this window initiates a Pre-Vote round. The randomisation (150ms floor, 300ms ceiling) prevents split-votes by ensuring nodes rarely time out simultaneously.'],
+ ['HEARTBEAT_INTERVAL', '75 ms', 'How often the Leader sends AppendEntries heartbeats to reset follower timers. Must be well below the election timeout minimum (150ms) to prevent spurious elections under normal operation.'],
+ ['RECONNECT_DELAY_MS', '500 ms', 'Client SDK: pause between broker failover attempts to allow leader election to stabilise.'],
+ ['MAX_RETRIES', '5', 'Client SDK: maximum retries per operation across the full set of bootstrap servers before throwing IOException.'],
+ ['TELEMETRY_WS_PORT', '9093', 'WebSocket port on each broker node that streams telemetry JSON frames to the dashboard.'],
+ ].map(([name, val, desc]) => (
+
+ | {name} |
+ {val} |
+ {desc} |
+
+ ))}
+
+
+
+
+
+
+{`# Node 1 — 10.0.1.10
+java -server \\
+ -Xms4g -Xmx4g \\
+ -XX:+UseG1GC \\
+ -XX:MaxGCPauseMillis=20 \\
+ -jar drmq-broker.jar \\
+ 1 9092 /mnt/nvme/drmq/node1 10.0.1.11:9092,10.0.1.12:9092
+
+# Node 2 — 10.0.1.11
+java -server \\
+ -Xms4g -Xmx4g \\
+ -XX:+UseG1GC \\
+ -XX:MaxGCPauseMillis=20 \\
+ -jar drmq-broker.jar \\
+ 2 9092 /mnt/nvme/drmq/node2 10.0.1.10:9092,10.0.1.12:9092
+
+# Node 3 — 10.0.1.12
+java -server \\
+ -Xms4g -Xmx4g \\
+ -XX:+UseG1GC \\
+ -XX:MaxGCPauseMillis=20 \\
+ -jar drmq-broker.jar \\
+ 3 9092 /mnt/nvme/drmq/node3 10.0.1.10:9092,10.0.1.11:9092`}
+
+
+ {/* ── Section 5: Raft Consensus ───────────────────────────────── */}
+
+
+ DRMQ implements the Raft consensus algorithm as described in the original{' '}
+ In Search of an Understandable Consensus Algorithm paper (Ongaro & Ousterhout, 2014),
+ extended with the Pre-Vote mechanism from the follow-up dissertation. The implementation
+ lives entirely in RaftNode.java.
+
+
+
+
+ {[
+ { state: 'FOLLOWER', color: 'zinc', desc: 'Default state on startup. Passively replicates entries from the leader. Runs an election timer; if no heartbeat is received before timeout, transitions to Pre-Candidate.' },
+ { state: 'CANDIDATE', color: 'amber', desc: 'Actively seeking votes. Sends RequestVote RPCs to all peers. Transitions to LEADER on quorum, or back to FOLLOWER if a higher term is observed.' },
+ { state: 'LEADER', color: 'cyan', desc: 'Accepts all write requests. Broadcasts AppendEntries RPCs on every heartbeat interval and immediately after a new log entry is appended. There is exactly one leader per term.' },
+ ].map(({ state, color, desc }) => (
+
+ ))}
+
+
+
+
+ A standard Raft follower increments its term and requests votes the moment its election
+ timer fires. This is safe but can cause unnecessary term inflation when a partitioned node
+ reconnects — it may have a term far ahead of the cluster, forcing a brief but disruptive
+ leader re-election even though it has stale log data.
+
+
+ DRMQ uses the Pre-Vote extension to prevent this.
+ When a follower's timer expires it sends a{' '}
+ PreVoteRequest without incrementing its
+ term. A peer grants a pre-vote only if the requester's log is at least as up-to-date
+ as the peer's own log. Only after receiving pre-votes from a majority does the node
+ increment its term and send real{' '}
+ RequestVote RPCs. A lagging node that
+ has missed thousands of entries will be denied pre-votes and remain a follower, silently
+ catching up without disrupting the cluster.
+
+
+
+
+ Every ProduceRequest is converted into a
+ Raft log entry and appended to the leader's local RaftLog.
+ The leader then replicates it via AppendEntries RPCs.
+ Key implementation details:
+
+
+ {[
+ ['Batching during catch-up', 'When a follower is lagging, the leader batches up to MAX_ENTRIES_PER_RPC (500) entries per RPC to bound per-message memory overhead and prevent frame-size exceptions on the Netty transport.'],
+ ['prevLogIndex / prevLogTerm check', 'Every AppendEntries carries the index and term of the entry immediately before the batch. The follower rejects the RPC if its local log does not match — this enforces the Log Matching Property.'],
+ ['Monotonic commit advancement', 'The leader advances commitIndex only after a majority of matchIndex values meet or exceed the new entry\'s index. commitIndex never decreases.'],
+ ['Apply loop', 'A dedicated thread watches commitIndex. Whenever commitIndex > lastApplied, entries are applied to the MessageStore in strict order and lastApplied is incremented. This is the point at which messages become readable by consumers.'],
+ ].map(([title, desc]) => (
+
+ {title}
+ {desc}
+
+ ))}
+
+
+
+
+ The Raft log grows continuously as messages are appended. To prevent unbounded disk usage,
+ DRMQ compacts the RaftLog by truncating
+ entries that have already been applied to the MessageStore.
+ This is done by rewriting the raft.log file to remove
+ older entries and updating the startIndex.
+
+
+ State Transfer: When a follower falls behind and its required entries have been compacted from the leader's RaftLog, the leader initiates an InstallSnapshotRequest stream. The leader dynamically zips its persistent MessageStore and OffsetManager data into a single archive, transmitting it in 2MB chunks over the network. Upon receiving the final chunk, the follower safely and atomically hot-swaps its current storage directories with the snapshot contents, completely recovering its topic and offset states without needing a JVM restart.
+
+
+
+
+ The following state is written to disk before any RPC response is sent, ensuring correctness
+ after a crash:
+
+
+
+
+
+ | Field |
+ Where stored |
+ Why it must survive a crash |
+
+
+
+ {[
+ ['currentTerm', 'raft/state.properties', 'Prevents a restarted node from accepting RPCs from a leader in an older term.'],
+ ['votedFor', 'raft/state.properties', 'Ensures a node never grants two votes in the same term (Safety Property).'],
+ ['log entries', 'raft/raft.log', 'The source of truth for uncommitted writes. Entries committed but not yet snapshotted must survive to be re-applied.'],
+ ['lastApplied', 'raft/state.properties', 'Restored on restart and used to reconstruct commitIndex via Math.min(lastLogIndex, max(commitIndex, lastApplied)). Also used to set RaftLog.startIndex if the live log is empty after a snapshot.'],
+ ['commitIndex', '(derived)', 'Not directly persisted. On startup it is reconstructed from lastApplied and the last index in the live Raft log, preventing re-application of already-stored messages.'],
+ ].map(([field, loc, why]) => (
+
+ | {field} |
+ {loc} |
+ {why} |
+
+ ))}
+
+
+
+
+ {/* ── Section 6: Producer API ─────────────────────────────────── */}
+
+
+ The DRMQProducer is a thread-safe client for sending messages to the cluster.
+ It uses synchronous sends with an underlying blocking TCP socket, meaning send()
+ blocks until the message is either acknowledged by a Raft quorum or an unrecoverable error occurs.
+
+
+
+
+ Producers should be initialised with a list of bootstrap servers. The producer connects to a random
+ server from the list. If it connects to a Follower, the Follower immediately responds with a
+ NOT_LEADER error containing the current Leader's address.
+ The producer transparently redirects to the Leader. If the Leader crashes, the producer will
+ cycle through the bootstrap servers, waiting for a new Leader to be elected.
+
+
+{`import com.drmq.client.DRMQProducer;
+
+// Initialize with a comma-separated list of bootstrap servers
+DRMQProducer producer = new DRMQProducer("10.0.1.10:9092,10.0.1.11:9092,10.0.1.12:9092");
+
+// Connect to the cluster
+producer.connect();`}
+
+
+
+
+ Messages can be sent as raw byte arrays or as UTF-8 strings. You can optionally attach a key
+ to the message (useful for downstream grouping/partitioning logic, though DRMQ enforces global
+ ordering regardless of the key).
+
+
+{`// Send a simple string payload
+DRMQProducer.SendResult result = producer.send("orders", "Order #1234");
+
+if (result.isSuccess()) {
+ System.out.println("Message committed at offset: " + result.getOffset());
+} else {
+ System.err.println("Failed to send: " + result.getErrorMessage());
+}
+
+// Send raw bytes with an optional key
+byte[] payload = serialize(myObject);
+DRMQProducer.SendResult result = producer.send("metrics", payload, "sensor-01");`}
+
+
+ The send() method automatically retries up to{' '}
+ MAX_RETRIES = 5 times across the bootstrap servers
+ if connection errors or leader elections occur during the send. It only throws an{' '}
+ IOException if all retries are exhausted.
+
+
+ {/* ── Section 7: Consumer API ─────────────────────────────────── */}
+
+
+ The DRMQConsumer is used to read messages from topics.
+ DRMQ supports two distinct consumption modes: Group Mode (default)
+ and Single Mode.
+
+
+
+
+ {[
+ { mode: 'Group Mode (Default)', desc: 'The broker tracks the consumer\'s offsets persistently via Raft. Multiple consumers in the same group coordinate via short-lived leases, ensuring a message is delivered to exactly one consumer in the group.' },
+ { mode: 'Single Mode', desc: 'The consumer tracks its own offsets locally. It sends raw offset-based requests to the broker. Useful for replay tools, ad-hoc scripts, or systems that persist offsets in their own external database.' },
+ ].map(({ mode, desc }) => (
+
+ ))}
+
+
+
+
+ In group mode, the consumer asks the broker where it left off upon subscription. Auto-commit
+ is disabled by default, meaning you must manually commit offsets after processing, or explicitly
+ enable auto-commit.
+
+
+{`import com.drmq.client.DRMQConsumer;
+import com.drmq.client.DRMQConsumer.ConsumedMessage;
+
+// Initialize with bootstrap servers and consumer group ID
+DRMQConsumer consumer = new DRMQConsumer("10.0.1.10:9092,10.0.1.11:9092,10.0.1.12:9092", "analytics-group");
+consumer.connect();
+
+// Let the broker manage the offset
+consumer.subscribe("orders");
+
+// Optional: Enable auto-commit after every poll
+consumer.setAutoCommit(true);
+
+while (true) {
+ // Poll max 100 messages. Wait up to 2000ms if queue is empty (Long Polling)
+ List messages = consumer.poll(100, 2000);
+
+ for (ConsumedMessage msg : messages) {
+ System.out.printf("Offset: %d, Key: %s, Data: %s%n",
+ msg.offset(), msg.key(), msg.payloadAsString());
+ }
+}`}
+
+
+
+
+ To achieve at-least-once processing semantics, you must disable auto-commit, process the
+ messages fully, and only then commit the offset. You can also explicitly seek to a specific offset.
+
+
+{`consumer.setAutoCommit(false);
+
+// Override the broker's offset and explicitly resume from offset 500
+consumer.subscribe("orders", 500L);
+
+List messages = consumer.poll(50, 1000);
+for (ConsumedMessage msg : messages) {
+ processInDatabase(msg);
+}
+
+// Manually commit the offset to the broker
+if (!messages.isEmpty()) {
+ long lastOffset = messages.get(messages.size() - 1).offset();
+ consumer.commit("orders", lastOffset);
+}`}
+
+
+
+
+ If you want a consumer to simply read from a topic independently without the broker tracking
+ leases or preventing other consumers from reading the same messages, disable group mode.
+
+
+{`DRMQConsumer singleConsumer = new DRMQConsumer("10.0.1.10:9092,10.0.1.11:9092");
+
+// Disable group coordination
+singleConsumer.setGroupMode(false);
+singleConsumer.connect();
+
+// Provide an explicit starting offset, as the broker won't track it for you
+singleConsumer.subscribe("system-logs", 0L);
+
+while (true) {
+ List logs = singleConsumer.poll(500);
+ for (ConsumedMessage log : logs) {
+ System.out.println(log.payloadAsString());
+ }
+}`}
+
+
+
+
+ The poll(maxMessages, timeoutMs) method
+ determines the polling behaviour based on the timeoutMs:
+
+
+ -
+ timeoutMs = 0 (Short Poll): The broker checks the log and returns immediately,
+ even if there are no new messages. Useful for non-blocking UI threads or background checks.
+
+ -
+ timeoutMs > 0 (Long Poll): The broker holds the TCP request open for up to
+
timeoutMs. If a new message is appended to the
+ Raft log by a Producer, the broker instantly wakes up the connection and pushes the message.
+ This reduces CPU overhead drastically compared to busy-waiting short polls.
+
+
+
+ {/* ── Section 8: Python Client (SDK) ──────────────────────────── */}
+
+
+ Because DRMQ relies entirely on raw TCP framing and Google Protocol Buffers,
+ building clients in other languages is incredibly easy. A native Python SDK is
+ available in the drmq-python-client directory.
+ The SDK natively handles automatic leader failovers, transparent retries, and offset auto-committing.
+
+
+
+{`from drmq_client import DRMQProducer
+
+producer = DRMQProducer("localhost:9092,localhost:9093")
+producer.connect()
+
+# The payload must be bytes
+res = producer.send("python-topic", b"Hello from Python!")
+if res.success:
+ print(f"Message sent successfully at offset {res.offset}")
+else:
+ print(f"Failed: {res.error_message}")`}
+
+
+
+
+{`from drmq_client import DRMQConsumer
+
+consumer = DRMQConsumer("localhost:9092,localhost:9093", group_id="python-workers")
+consumer.auto_commit = True
+consumer.connect()
+consumer.subscribe("python-topic")
+
+# Long-poll the broker for new messages
+messages = consumer.poll(max_messages=10, timeout_ms=5000)
+for msg in messages:
+ print(f"Received (offset {msg.offset}): {msg.payload.decode('utf-8')}")`}
+
+
+ {/* ── Section 9: TypeScript Client (SDK) ──────────────────────── */}
+
+
+ A native Node.js/TypeScript SDK is also available in the
+ drmq-ts-client directory. It uses
+ the net module to interact natively
+ with the broker without requiring heavy HTTP libraries, and features exactly the
+ same automatic leader redirection and failover capabilities as the Java client.
+
+
+
+{`import { DRMQProducer } from './client';
+
+const producer = new DRMQProducer("localhost:9092,localhost:9093");
+await producer.connect();
+
+const payload = Buffer.from("Hello from TypeScript!");
+const res = await producer.send("ts-topic", payload);
+console.log(\`Sent at offset \${res.offset}\`);`}
+
+
+
+
+{`import { DRMQConsumer } from './client';
+
+const consumer = new DRMQConsumer("localhost:9092,localhost:9093", "ts-workers");
+consumer.autoCommit = true;
+await consumer.connect();
+await consumer.subscribe("ts-topic");
+
+// Long-poll the broker for new messages
+const messages = await consumer.poll(10, 5000);
+for (const msg of messages) {
+ console.log(\`Received: \${Buffer.from(msg.payload).toString('utf-8')}\`);
+}`}
+
+ {/* ── Section 10: Storage Engine ───────────────────────────────── */}
+
+
+ Once a message is committed by the Raft consensus layer, it is handed off to the{' '}
+ MessageStore. The storage layer uses a
+ segmented, append-only log design optimized for high-throughput sequential writes and
+ efficient sequential reads.
+
+
+
+
+ Messages for a topic are stored in a dedicated directory (e.g., ./data/store/topics/orders/).
+ Instead of writing to a single infinitely growing file, the log is split into segments of
+ up to 100 MB.
+
+ Each segment is stored as a single data file:
+
+ -
+ Data File (
00000000000000000000.log): Contains the raw serialized
+ message payloads along with a binary header (length prefix).
+
+
+
+ To achieve O(1) random access for consumers, DRMQ builds a
+ sparse index in memory (ConcurrentSkipListMap)
+ during broker startup. There are no on-disk .idx files.
+ When a consumer requests a specific offset, the broker binary-searches the in-memory index to find the
+ nearest byte boundary before performing a short linear scan on disk.
+
+
+ {/* ── Section 11: Consumer Groups ─────────────────────────────── */}
+
+
+ DRMQ uses a server-coordinated consumer group model to provide exact load-balancing without the
+ need for complex client-side partition rebalancing protocols (like ZooKeeper or Kafka's GroupCoordinator).
+
+
+
+
+ Because DRMQ has no partitions (every topic is a single linear log), multiple consumers in a group
+ cannot simply lock different partitions. Instead, DRMQ uses a message-level lease system:
+
+
+ {[
+ { step: '1', title: 'Poll Request', desc: 'A consumer in the group requests a batch of messages.' },
+ { step: '2', title: 'Lease Grant', desc: 'The broker identifies the next uncommitted, unleased offset. It grants a 30-second lease to the consumer for a batch of messages.' },
+ { step: '3', title: 'Processing', desc: 'While the lease is active, no other consumer in the group will be handed those specific offsets.' },
+ { step: '4', title: 'Commit or Expire', desc: 'If the consumer commits the offsets within the timeout, the broker marks them permanently processed. If the lease expires or the client disconnects, the broker invalidates the lease and makes the messages available to the next polling consumer.' },
+ ].map(({ step, title, desc }) => (
+
+ ))}
+
+
+
+
+ Consumer group offsets must be as resilient as the messages themselves. When a consumer commits
+ an offset, the broker generates an internal CommitOffsetCommand.
+ This command is appended to the Raft log and replicated across the quorum just
+ like a standard producer message. If the Leader crashes, the new Leader replays the Raft log
+ to reconstruct the exact state of all consumer groups.
+
+
+ {/* ── Section 12: Telemetry & Dashboard ───────────────────────── */}
+
+
+ Every DRMQ broker runs an embedded WebSocket server (port 9093 by default)
+ that streams real-time JSON telemetry frames.
+
+
+ {[
+ { title: 'Cluster Metrics', desc: 'Current term, leader identity, commit index, and last applied index.' },
+ { title: 'Throughput', desc: 'Messages written/sec, messages read/sec, and active socket connections.' },
+ { title: 'Storage Health', desc: 'Active segments, disk usage bytes, and compaction occurrences.' },
+ { title: 'Raft Timers', desc: 'Election duration, heartbeat latency, and replication lag per follower.' },
+ ].map(({ title, desc }) => (
+
+ ))}
+
+
+ The React-based Dashboard connects directly to all nodes in the cluster simultaneously.
+ It aggregates the WebSocket streams on the client side, allowing you to instantly visualize
+ network partitions, leader failovers, and follower lag without relying on external metric scrapers like Prometheus.
+
+
+ {/* ── Section 13: Production Deployment ───────────────────────── */}
+
+
+ While DRMQ runs easily on a laptop for development, running a distributed consensus-based
+ system in production requires specific hardware and OS-level considerations to guarantee
+ durability and performance.
+
+
+
+
+ {[
+ { component: 'Storage (NVMe SSD)', desc: 'Crucial. DRMQ is an append-only system that requires fast fsyncs. Spinning HDDs or slow cloud EBS volumes will cause high Raft commit latency, slowing down producers.' },
+ { component: 'Memory (RAM)', desc: 'DRMQ relies heavily on the Linux Page Cache. Allocate 4-8GB to the JVM, but leave the majority of the system RAM to the OS for caching .log files.' },
+ { component: 'CPU', desc: 'At least 4 cores. Raft RPC handling, message decoding, and telemetry serialization run on separate thread pools to avoid blocking the consensus heartbeat.' },
+ { component: 'Network', desc: '1Gbps+ isolated subnet. Because Raft replicates the full stream to all nodes, write throughput is bounded by the network bandwidth between the Leader and Followers.' },
+ ].map(({ component, desc }) => (
+
+ ))}
+
+
+
+
+ -
+ File Descriptors: Increase
ulimit -n to at least 100,000.
+ The broker holds open file descriptors for every topic segment and every active client connection.
+
+ -
+ Swappiness: Set
vm.swappiness = 1. You do not want the
+ OS swapping the JVM heap to disk, as this will cause unpredictable GC pauses that trigger false Raft elections.
+
+ -
+ Garbage Collection: Use
-XX:+UseG1GC with a low pause
+ target (e.g., -XX:MaxGCPauseMillis=20) so GC pauses don't exceed the
+ 75ms Raft heartbeat interval.
+
+
+
+ {/* ── Section 14: Fault Tolerance ───────────────────────────── */}
+
+
+ DRMQ is built to survive failures gracefully. The following scenarios describe how the cluster
+ behaves under duress.
+
+
+ {[
+ ['Follower Crash', 'The cluster continues operating normally. The Leader logs warnings that the follower is unreachable. When the follower restarts, the Leader automatically sends it the missing entries.'],
+ ['Leader Crash', 'Write operations temporarily block. Within 150-300ms, the remaining nodes notice the lack of heartbeats. One initiates an election, wins a quorum, and becomes the new Leader. Clients transparently reconnect.'],
+ ['Minority Network Partition', 'If 1 node in a 3-node cluster loses network access, the other 2 nodes form a quorum and continue processing. The isolated node cannot elect itself (it cannot reach a quorum) and refuses writes. When the partition heals, the Pre-Vote mechanism prevents the isolated node from disrupting the active Leader.'],
+ ['Majority Network Partition', 'If 2 out of 3 nodes crash, the remaining node steps down to FOLLOWER. All produce requests will fail (or block, depending on client configuration) because a quorum cannot be reached to commit writes. This guarantees consistency over availability (CP system).'],
+ ].map(([scenario, result]) => (
+
+
{scenario}
+
{result}
+
+ ))}
+
+
+ {/* ── Section 15: CLI Reference ─────────────────────────────── */}
+
+
+ The drmq-client module includes interactive REPL
+ (Read-Eval-Print Loop) applications for testing clusters without writing Java code.
+
+
+
+ Starts an interactive prompt to send messages to any topic.
+
+{`# Usage: mvn exec:java -Dexec.mainClass="...ProducerApp" -Dexec.args="[bootstrap_servers]"
+cd drmq-client
+mvn exec:java -Dexec.mainClass="com.drmq.client.commandLineExample.ProducerApp" \\
+ -Dexec.args="10.0.1.10:9092,10.0.1.11:9092,10.0.1.12:9092"`}
+
+ Inside the REPL:
+
+{`producer> send orders Book Order #101
+✓ Message sent: topic=orders, offset=42
+
+producer> send alerts System is ONLINE
+✓ Message sent: topic=alerts, offset=43`}
+
+
+
+ Starts an interactive prompt to subscribe, poll, and manage auto-commit.
+
+{`# Usage: mvn exec:java -Dexec.mainClass="...ConsumerApp" -Dexec.args="[bootstrap_servers] [group_id]"
+cd drmq-client
+mvn exec:java -Dexec.mainClass="com.drmq.client.commandLineExample.ConsumerApp" \\
+ -Dexec.args="10.0.1.10:9092,10.0.1.11:9092,10.0.1.12:9092 analytics-group"`}
+
+ Inside the REPL:
+
+{`consumer[analytics-group]> subscribe orders
+✓ Subscribed to [orders] (resuming from broker offset 42)
+
+consumer[analytics-group]> poll
+[offset=42, key=null, time=14:30:22] Book Order #101
+
+consumer[analytics-group]> commit orders 42
+✓ Committed offset 42 for topic 'orders'`}
+
+
+