Single process horizontal scaling limitations #79
Just to be clear, our 300k messages/sec is only under heavy load, not a permanent baseline. Users send marketing broadcasts to thousands of recipients, and each broadcast needs multiple broker messages: get contact information, run enrichment, ask the AI to create a message fit for the user, send it, save it to our DB, receive the webhook and mark it as received, etc. Horizontal scaling allows us to easily spin up new broker processes under heavy load and then scale back down when those heavy broadcasts are finished.
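For readers skimming the thread, here is a hypothetical sketch of how one broadcast fans out into many broker messages. The queue names and payload shapes are invented for illustration, not Konvo's actual pipeline:

```typescript
// Hypothetical illustration of the fan-out described above: one marketing
// broadcast to N contacts becomes several broker messages per contact.
// Queue names and payload shapes are invented for this sketch.
type Job = { queue: string; payload: Record<string, unknown> };

function jobsForBroadcast(broadcastId: string, contactIds: string[]): Job[] {
  const jobs: Job[] = [];
  for (const contactId of contactIds) {
    jobs.push(
      { queue: "fetch-contact",   payload: { broadcastId, contactId } },
      { queue: "enrich-contact",  payload: { broadcastId, contactId } },
      { queue: "ai-personalize",  payload: { broadcastId, contactId } },
      { queue: "send-message",    payload: { broadcastId, contactId } },
      { queue: "persist-message", payload: { broadcastId, contactId } },
      // A further message arrives later via the delivery webhook to mark receipt.
    );
  }
  return jobs;
}

// 50k contacts x ~5 jobs each is already ~250k broker messages for a single
// broadcast, which is how bursts in the 300k msg/sec range arise.
console.log(jobsForBroadcast("bc_1", ["c_1", "c_2"]).length); // 10
```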
Hi @gerardatkonvo — thanks for the detailed framing. 300k msg/sec on BullMQ Pro is a serious workload, and the question deserves a concrete technical answer rather than a hand-wave. Let me address both points directly.

On the architectural constraint and the LiteFS / rqlite / libSQL angle

The single-process broker is not a permanent constraint, but I want to be upfront that none of LiteFS, rqlite, or Turso/libSQL would actually solve the scaling problem you're describing. They address durability and failover, not horizontal write throughput, and the reason matters for the broader roadmap discussion.

bunqueue's hot path is dominated by in-memory data structures (sharded priority queues, the jobIndex map, the lock table, the scheduler heap) running inside a single Bun event loop. SQLite is a downstream sink: writes are batched every 10ms by the WriteBuffer and flushed in bulk. The benchmark numbers (~287k ops/sec embedded, ~149k TCP for bulk push) are CPU-bound on the broker process, not I/O-bound on SQLite. Replacing SQLite with a replicated backend would not move the ceiling; it would only add latency. So those backends are a viable option for HA (zero-downtime upgrades, primary failover with low RPO) but not for horizontal scaling.

The actual roadmap for distributed brokers

True multi-broker support is on the roadmap, and it is planned as a bunqueue Pro offering. The architecture I'm targeting is queue-name sharding via consistent hashing across a cluster of independent broker processes, similar in spirit to Redis Cluster or partitioned BullMQ deployments. Each broker remains autonomous, with its own local SQLite, no consensus protocol on the hot path, and no coordination overhead per operation. A client-side router (or a thin proxy layer) maps queue names to brokers via consistent hashing with virtual nodes, and the worker layer connects directly to the broker that owns the queue it's consuming.
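To make the routing idea concrete, here is a minimal sketch of a client-side consistent-hash router with virtual nodes. The class and method names are illustrative, not bunqueue's actual API:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the client-side router described above: queue names
// map to brokers via consistent hashing with virtual nodes. Every producer
// and worker computes the same owner for a given queue name, with no
// coordination on the hot path.
class ConsistentHashRouter {
  private ring: { point: number; broker: string }[] = [];

  constructor(brokers: string[], vnodes = 64) {
    for (const broker of brokers) {
      for (let i = 0; i < vnodes; i++) {
        // Each broker contributes `vnodes` points to smooth the distribution.
        this.ring.push({ point: this.hash(`${broker}#${i}`), broker });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  private hash(key: string): number {
    // First 4 bytes of SHA-1 as an unsigned 32-bit ring position.
    return createHash("sha1").update(key).digest().readUInt32BE(0);
  }

  // The first virtual node clockwise from the queue's hash owns the queue.
  brokerFor(queueName: string): string {
    const h = this.hash(queueName);
    const node = this.ring.find((n) => n.point >= h) ?? this.ring[0];
    return node.broker;
  }
}

const router = new ConsistentHashRouter(["broker-a:6379", "broker-b:6379", "broker-c:6379"]);
console.log(router.brokerFor("email-broadcast"));
```

The virtual nodes are what keep the rebalancing cost low: adding or removing one broker only moves the keys on its ring segments, not the whole keyspace.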
This preserves the throughput characteristics of a single broker per node while letting aggregate capacity scale linearly with cluster size: four brokers reach ~600k ops/sec, eight reach ~1.2M, and so on. HA is layered on top via per-shard primary/replica pairs (LiteFS-style replication is appropriate here, since within a shard you're back to the single-writer model where it works well). The Pro tier would also include the operational tooling that becomes mandatory in a clustered setup: rebalancing, drain protocols, cross-broker stats aggregation, and a coordinator for cron schedulers to prevent duplicate fires.

The honest trade-off you should know about: queue-name sharding does not split a single hot queue across nodes. If your 300k msg/sec lives on one or two logical queues, partitioning won't help; you would need to introduce sub-queue keying at the producer.

I cannot commit to a public timeline for the Pro release yet, but I can confirm it is the direction, and I'd be happy to keep you updated as the design firms up.

On the practical ceiling of the single broker

Concrete numbers from the published benchmarks, so you can do the math against your workload:
These are sustained at 50k jobs in the harness; the broker scales smoothly up to that range without degradation. The realistic ceiling for production traffic is the end-to-end column: ~75k jobs/sec embedded, ~33k jobs/sec TCP.

Other limits to be aware of:
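To make the capacity math concrete, here is a back-of-envelope sketch using the end-to-end figures quoted above (benchmark harness numbers, not guarantees):

```typescript
// Back-of-envelope capacity math using the benchmark figures quoted in this
// thread. Treat the ceilings as harness results, not SLAs.
const peakLoad = 300_000;        // Konvo's peak messages/sec
const embeddedCeiling = 75_000;  // end-to-end jobs/sec, embedded mode
const tcpCeiling = 33_000;       // end-to-end jobs/sec, TCP mode

const brokersNeeded = (ceiling: number) => Math.ceil(peakLoad / ceiling);

console.log(brokersNeeded(embeddedCeiling)); // 4 brokers at the embedded ceiling
console.log(brokersNeeded(tcpCeiling));      // 10 brokers at the TCP ceiling
```

In other words, even the planned sharded cluster only covers this workload if the load spreads across enough distinct queue names to fill four to ten shards.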
For your specific case

Being direct: at 300k msg/sec, single-broker bunqueue is not currently a fit, and I would not want you to migrate into a wall. The realistic options I see for Konvo are:
Happy to go deeper on any of these or to schedule a short call if it helps. And if Konvo's workload profile would shape the Pro design decisions in a useful direction, I'd genuinely value the input. — Egeo
Thanks @gerardatkonvo — that's a perfectly reasonable bar, and honestly the right call for the workload profile you described. I'd give Konvo the same advice in your position. The kind words on architecture and API design genuinely mean a lot, especially coming from a team running at that scale. I'll absolutely ping you here (or on LinkedIn — happy to connect) the moment the Pro cluster work is in a state worth re-evaluating. And if at any point Konvo would be open to sharing more about your queue topology and job-mix profile, that kind of real-world input from a workload at your size would meaningfully shape the design priorities — no commitment expected, just an open door if it's useful on your end. Wishing you a smooth path forward with the broadcasts, and thanks again for taking the time to make the case so clearly. — Egeo
rqlite creator here. That is correct: rqlite is not about horizontal write scaling. Good to see accurate descriptions of what it does; many folks have an unclear idea of what Raft-based systems actually do.
Yes, I would be happy to connect. If you'd like to talk 1:1, maybe book an rqlite Office Hours slot? https://calendly.com/philipomailbox-calendly/rqlite-office-hours
Hi. Cool project! We're currently using BullMQ Pro at Konvo and exploring alternatives with a clear migration path.
I noticed on the comparison page that horizontal scaling is listed as single-process. I understand bunqueue supports multiple worker clients over TCP, so the worker layer scales — but the broker itself is a single process bound to one SQLite instance.
Our current BullMQ deployment peaks at around 300k messages/sec, which is roughly 10x the benchmarked throughput of a single bunqueue broker. So for us, the distributed scaling question isn't theoretical — it's a hard requirement.
Two questions:
Would love to understand the long-term scaling story before committing to a migration.