
internal/cluster/sharding is unwired dead code — decide: finish, remove, or document #448

@xe-nvdk

Status

Discovered while securing cluster WAL replication for GHSA-wfgr-8x84-22q7 (X1 audit). The internal/cluster/sharding package compiles and has tests, but none of its constructors are called from production code:

  • NewShardReplicationManager — only called from internal/cluster/sharding/shard_replication_test.go
  • NewShardReceiverManager — only called from internal/cluster/sharding/shard_replication_test.go
  • NewShardRouter — only called from internal/cluster/sharding/router_test.go (route helpers RouteShardedWrite / RouteShardedQuery exist in internal/api/routing.go but are not invoked from any non-test caller either)
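The "no production callers" claim above can be re-checked mechanically instead of by grep. Below is a minimal sketch using the standard `go/parser` and `go/ast` packages. The helper name `ProductionCallers` and the in-memory `sources` map are assumptions made for illustration; a real check would walk the repository and read files from disk. Matching is by name only (no type checking), so a same-named method on an unrelated type would also be reported.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// constructors holds the three constructor names listed in this issue.
var constructors = map[string]bool{
	"NewShardReplicationManager": true,
	"NewShardReceiverManager":    true,
	"NewShardRouter":             true,
}

// calledName returns the function name of a call expression, covering both
// unqualified calls (NewShardRouter(...)) and package-qualified calls
// (sharding.NewShardRouter(...)). Other call shapes yield "".
func calledName(call *ast.CallExpr) string {
	switch fn := call.Fun.(type) {
	case *ast.Ident:
		return fn.Name
	case *ast.SelectorExpr:
		return fn.Sel.Name
	}
	return ""
}

// ProductionCallers (hypothetical helper) parses the given sources, keyed by
// file name, and returns sorted "file: constructor" entries for every
// matching call found outside _test.go files.
func ProductionCallers(sources map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	var hits []string
	for name, src := range sources {
		if strings.HasSuffix(name, "_test.go") {
			continue // test-only callers do not count as production wiring
		}
		file, err := parser.ParseFile(fset, name, src, 0)
		if err != nil {
			return nil, err
		}
		ast.Inspect(file, func(n ast.Node) bool {
			if call, ok := n.(*ast.CallExpr); ok {
				if fn := calledName(call); constructors[fn] {
					hits = append(hits, name+": "+fn)
				}
			}
			return true
		})
	}
	sort.Strings(hits)
	return hits, nil
}
```

An empty result is consistent with the finding in this issue. Any entry means the package has gained a production caller and the "dead code" status no longer holds.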

The coordinator dispatch in internal/cluster/coordinator.go has no server-side listener for the sharding MsgReplicateSync flow. Both shard_replication.go (primary side) and shard_receiver.go (replica side) define connect() functions that dial out, so neither side accepts the inbound handshake. With two dialers and no listener, the flow cannot complete.
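To make the missing accept side concrete, here is a hedged sketch of what "no dispatch for MsgReplicateSync" means. This is not the code in internal/cluster/coordinator.go: `MsgType`, `Dispatcher`, `Handler`, `ErrNoHandler`, and the string value of the constant are all hypothetical. Only the name MsgReplicateSync comes from the codebase.

```go
package main

import "errors"

// MsgType is a hypothetical stand-in for the cluster message type.
type MsgType string

// MsgReplicateSync is named in this issue; its real type and value are
// assumptions made for this sketch.
const MsgReplicateSync MsgType = "replicate_sync"

// ErrNoHandler is returned when an inbound message has no registered handler.
var ErrNoHandler = errors.New("no handler registered for message type")

// Handler processes the payload of one inbound message.
type Handler func(payload []byte) error

// Dispatcher is an illustrative accept-side router, not the real coordinator.
type Dispatcher struct {
	handlers map[MsgType]Handler
}

// NewDispatcher returns a dispatcher with no handlers registered.
func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: make(map[MsgType]Handler)}
}

// Register wires a handler for one message type.
func (d *Dispatcher) Register(t MsgType, h Handler) {
	d.handlers[t] = h
}

// Dispatch routes an inbound message to its handler, or reports that
// nothing on this side accepts that message type.
func (d *Dispatcher) Dispatch(t MsgType, payload []byte) error {
	h, ok := d.handlers[t]
	if !ok {
		return ErrNoHandler
	}
	return h(payload)
}
```

In this model, today's state corresponds to a dispatcher where `Register(MsgReplicateSync, ...)` is never called: any inbound sync message ends in `ErrNoHandler`. Finishing the feature would mean deciding which peer owns the accept side and registering the handler there.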

What Arc Enterprise actually uses today

Arc Enterprise uses the single-writer, multi-reader Raft and replication model implemented in internal/cluster/coordinator.go and internal/cluster/replication/ (the path secured by GHSA-wfgr-8x84-22q7). Customer deployments run this model; sharding is not a deployed feature.

Why this matters

  1. Confused-reviewer hazard. Security audits + Gemini reviews keep flagging internal/cluster/sharding/* as relevant attack surface. It isn't — but every reviewer has to re-derive that fact from scratch. The X1 fix had to explicitly carve it out of scope in the release notes.
  2. Maintenance surface without a customer. ~1.2k lines of replication code that compiles and passes its tests but ships dead in every binary. Every refactor in internal/cluster/replication/protocol.go or internal/cluster/security/auth.go has to either also touch the sharding mirrors or knowingly skip them.
  3. Half-finished design. When sharding IS picked up, the implementer should redesign the auth model from scratch — the current handshake direction is ambiguous (both ends dial). Retrofitting the X1 HMAC onto today's shape would be wasted work.

Options

  • (a) Finish wiring. Add the coordinator-side dispatch for MsgReplicateSync to the shard receiver, plumb the HMAC primitives, and ship sharding as an Enterprise feature. Significant work: neither the routing nor the failover paths are wired. Not on the 26.05.1 or 26.06.1 roadmap.
  • (b) Remove the package. Delete internal/cluster/sharding/. Anyone who needs sharding later starts from a clean design. Small PR, removes maintenance liability, removes confused-reviewer noise.
  • (c) Document as intentional scaffolding. Add a doc.go to the package explaining that it's scaffolding for a future feature, no production callers expected, security primitives must be redesigned at activation time. Keeps the code, mutes the reviewer noise.
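If option (c) is chosen, the doc.go could look roughly like the sketch below. The wording is only a suggestion assembled from the statements in this issue, and the file does not exist in the repository today.

```go
// Package sharding is scaffolding for a future sharding feature.
//
// Status: NOT wired into production. No production code is expected to call
// NewShardReplicationManager, NewShardReceiverManager, or NewShardRouter.
// Arc Enterprise deployments use the single-writer, multi-reader model in
// internal/cluster/coordinator.go and internal/cluster/replication/.
//
// Security: this package is outside the scope of the GHSA-wfgr-8x84-22q7
// fix. Its handshake direction is unresolved (both peers dial), so the auth
// model must be redesigned before activation rather than retrofitted.
//
// See issue #448 for background.
package sharding
```

A package comment like this shows up in `go doc` output and at the top of the package directory, which is where a reviewer looking at the code would land first.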

Recommendation

Go with (b) and remove the package, unless we have a concrete near-term plan to ship sharding. (c) documents the status but leaves the maintenance debt; (a) is unbudgeted. Removal is reversible: anyone who picks sharding up later has the git history and the linked memory note as design context.

References

  • X1 / CVE-2026-48106 release notes (RELEASE_NOTES_2026.06.1.md section "Cluster replication stream now HMAC-authenticated end-to-end") explicitly carve sharding out of scope.
  • Audit finding lives at GHSA-wfgr-8x84-22q7

Labels: enhancement
