Summary
microceph cluster join fails with Error: Failed to join cluster: Ready dqlite: context deadline exceeded on clusters whose microcluster dqlite database has grown large. The joining node cannot download + replay the dqlite state and reach "ready" within the framework's hardcoded ready timeout, so the join times out and rolls back.
Request: expose a way to increase the join/ready timeout (CLI flag, env var, or config) for clusters with a large dqlite DB and/or slow disks — and, ideally, also let operators bound the DB size via dqlite snapshot params (see "Related" below).
Environment
- MicroCeph 19.2.3 (snap rev 1701, squid/stable), Ceph 19.2.3 Squid
- Ubuntu 24.04, kernel 6.8
- Existing 3-node cluster (mon+osd), adding a 4th storage node
- 1 GbE management network; OSDs on SATA SSDs
What happens
On the joining node:
$ sudo microceph cluster join <token> --microceph-ip <addr>
Error: Failed to join cluster: Ready dqlite: context deadline exceeded
Joining-node daemon log (snap.microceph.daemon): PreInit → ~31s of silence → PreRemove (force=true) (rollback).
Leader-side daemon log during the attempt:
level=error msg="Received error sending heartbeat to cluster member"
error="Database is still starting" target="<joiner>:7443"
level=warning msg="Failed to get status of cluster member ... /core/1.0/ready ... connect: connection refused"
The joiner's dqlite never finishes starting within the window.
Root cause
The microcluster dqlite DB is ~126 MB on every member (raft log segments):
$ sudo du -sh /var/snap/microceph/common/state/database/
126M .../database/
# ~25 raft segment files of 4–8 MB each (open-* and <index>-<index>)
On join, the new member must receive + apply this state and reach "ready" within a hardcoded deadline:
microcluster/internal/db/db.go wraps the ready-wait in a fixed context.WithTimeout(...) (currently 120*time.Second on main; the version vendored in MicroCeph 19.2.3 behaves as ~30s in our testing — PreInit→rollback in ~31s). There is no flag/env/config to change it.
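To make the request concrete, here is a minimal sketch of how the ready-wait deadline could take an operator override. It is not the microcluster implementation: the variable name MICROCLUSTER_READY_TIMEOUT, the helper names and the ready channel are all hypothetical, and the 120s default only mirrors the value observed on main.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

// defaultReadyTimeout mirrors the 120s value reported on main in this issue.
const defaultReadyTimeout = 120 * time.Second

// readyTimeoutEnv is a hypothetical variable name used for illustration only;
// neither MicroCeph nor microcluster reads it today.
const readyTimeoutEnv = "MICROCLUSTER_READY_TIMEOUT"

// readyTimeout parses an override such as "10m". It falls back to the default
// when raw is empty, malformed, or not a positive duration.
func readyTimeout(raw string) time.Duration {
	if raw == "" {
		return defaultReadyTimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultReadyTimeout
	}
	return d
}

// waitReady blocks until ready is closed or the timeout elapses.
func waitReady(parent context.Context, ready <-chan struct{}, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	select {
	case <-ready:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("Ready dqlite: %w", ctx.Err())
	}
}

func main() {
	timeout := readyTimeout(os.Getenv(readyTimeoutEnv))

	// Stand-in for dqlite signalling that the database is ready.
	ready := make(chan struct{})
	close(ready)

	fmt.Println("timeout:", timeout, "err:", waitReady(context.Background(), ready, timeout))
}
```

Falling back to the default on a bad value keeps a typo in the override from making joins fail faster than they do today.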
The DB is large because microcluster never sets dqlite snapshot params:
microcluster/internal/db/dqlite.go: both dqlite.New(...) calls omit WithSnapshotParams, so go-dqlite defaults apply (threshold=1024, trailing=8192). Canonical's own k8s-dqlite docs call these defaults "too large for small clusters". MicroCeph writes large config/OSD entries to raft, so 8192 trailing entries ≈ 126 MB.
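A rough back-of-the-envelope check of that figure, as a sketch only. It assumes the 126M reported by du is MiB, that the directory is dominated by trailing log entries, and that entries are roughly uniform in size; snapshot files are ignored. The helper names are illustrative.

```go
package main

import "fmt"

const mib = 1 << 20

// avgEntryBytes estimates the mean raft entry size, assuming the whole
// database directory consists of the trailing log entries.
func avgEntryBytes(dbBytes, trailing uint64) uint64 {
	return dbBytes / trailing
}

// projectedBytes estimates the log size for a different trailing window,
// assuming the same mean entry size.
func projectedBytes(entryBytes, trailing uint64) uint64 {
	return entryBytes * trailing
}

func main() {
	db := uint64(126 * mib) // size reported by du -sh
	entry := avgEntryBytes(db, 8192)

	fmt.Printf("avg entry: %d bytes\n", entry)
	fmt.Printf("trailing=1024: %.2f MiB\n", float64(projectedBytes(entry, 1024))/mib)
}
```

Under those assumptions the average entry is 16128 bytes (about 15.75 KiB), and a trailing window of 1024 would hold the log to roughly 15.75 MiB.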
Why there's no workaround today
- The join/ready timeout is a compiled-in constant: not exposed on microceph cluster join (--help has only --microceph-ip, --debug, --verbose, --state-dir), not in ceph config, not env-driven.
- The dqlite trailing window is not exposed either (no WithSnapshotParams, no tuning.yaml).
- Result: a cluster with a legitimately large dqlite DB can never add a node, with no operator-facing remedy.
Request
- Make the join/ready timeout configurable (e.g. microceph cluster join --timeout, a daemon config key, or an env var) so large-DB / slow-disk joins can complete.
- (Complementary) expose dqlite snapshot params (threshold/trailing) so operators can bound the DB size, the same way k8s-dqlite already does via tuning.yaml:
snapshot:
  trailing: 1024
  threshold: 512
(Note k8s-dqlite's caveat: set both — setting trailing alone forces threshold=0, snapshotting every transaction.)
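If these params were exposed, the caveat above suggests validating them before they reach dqlite. The following is a sketch under stated assumptions: SnapshotTuning and Validate are hypothetical names, not part of microcluster, go-dqlite or k8s-dqlite, and pointers are used only to tell "unset" apart from zero.

```go
package main

import (
	"errors"
	"fmt"
)

// SnapshotTuning is a hypothetical struct mirroring the snapshot keys shown
// above. A nil field means the operator did not set that key.
type SnapshotTuning struct {
	Threshold *uint64
	Trailing  *uint64
}

// Validate rejects configurations where only one of the two keys is set,
// and an explicit threshold of 0, which would snapshot on every transaction.
func (s SnapshotTuning) Validate() error {
	if (s.Threshold == nil) != (s.Trailing == nil) {
		return errors.New("snapshot: set both threshold and trailing, or neither")
	}
	if s.Threshold != nil && *s.Threshold == 0 {
		return errors.New("snapshot: threshold 0 would snapshot on every transaction")
	}
	return nil
}

func main() {
	trailing, threshold := uint64(1024), uint64(512)

	onlyTrailing := SnapshotTuning{Trailing: &trailing}
	both := SnapshotTuning{Threshold: &threshold, Trailing: &trailing}

	fmt.Println("trailing only:", onlyTrailing.Validate())
	fmt.Println("both set:", both.Validate())
}
```

Leaving both keys unset passes validation, so existing deployments would keep the go-dqlite defaults unless an operator opts in.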
Related / precedent
- tuning.yaml (link above).
- Ready dqlite: context deadline exceeded symptom; PR "fix: auto-detect joiner address from join token peers" #710 fixed an address-selection variant but not the large-DB/timeout case.