Skip to content

Support RDMA handshake V3#53

Closed
chenBright wants to merge 1 commit into
masterfrom
rdma_handshake_v3
Closed

Support RDMA handshake V3#53
chenBright wants to merge 1 commit into
masterfrom
rdma_handshake_v3

Conversation

@chenBright

@chenBright chenBright commented Jun 1, 2026

Copy link
Copy Markdown
Owner

What problem does this PR solve?

Issue Number: resolve

Problem Summary:

The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
had correctness bugs that made the wire format effectively
unevolvable.

  1. Client never drained the "unknown tail" bytes when a peer sent
    msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
    socket recv buffer and silently corrupted the next ReadFromFd
    (the ACK).
  2. Server had the symmetric version of A.
  3. Server computed the body read length from its LOCAL
    g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
    same length as its own. A longer peer left bytes behind; a shorter
    peer made the read block. Client used the correct compile-time
    constant; the two sides were not symmetric.

Combined, 1/2/3 meant v2 could not safely append a single byte to the
hello -- even an "optional hint" appended by a newer sender would
mis-align an older receiver's next read.

What is changed and the side effects?

Changed:

v3 handshake is designed which is wire format, magic-namespace-isolated from v2:

[ "RDM3" 4B ][ pb_size 4B big-endian ][ RdmaHello protobuf bytes ]

with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
as v2 plus room to append future capabilities.

Why protobuf (and not "v2 plus length prefix")

  • Variable-length fields are coming. Future capability fields will
    include strings (rdma_device_name, netdev_name, ...) and other
    variable-length data. Supporting them on a fixed-binary protocol
    forces us to invent and maintain a TLV layer (per-field type +
    length + value framing, plus version-aware deserialization). That
    is reimplementing protobuf badly. Using protobuf from day one
    costs nothing and is the canonical answer.
  • Fixes v2 bug 1/2/3 generically: pb_size makes the wire
    self-describing, so the receiver never needs to guess the length
    or know the peer's schema version to read the body cleanly.
  • Append-only field evolution out of the box: new optional fields
    cost old receivers nothing -- they're skipped as unknown protobuf
    fields. v2 with a hand-rolled length prefix would still need
    per-field opt-in code on every side.
  • Built-in validation: ParseFromArray fails fast on malformed input;
    required-field presence is enforced at the parse layer, not by
    ad-hoc has_xxx() checks scattered through wire code.
  • bRPC already depends on protobuf -- no new build dependency.

Why a NEW MAGIC rather than a version field inside protobuf

  • Forces "breaking change" to be a deployment decision visible at
    the wire level. You cannot accidentally ship a backwards-
    incompatible patch via a field-semantics tweak.
  • Server-side dispatch routes by magic to fully independent state
    machines that can't entangle (no if (version == X) branches
    anywhere -- this is the abstraction's red line).
  • Any future breaking change bumps the magic ("RDM4", "RDM5", ...).
    v3 fields, once shipped, never change semantics.

Architecture

Handshake state machine is split out of RdmaEndpoint:

RdmaHandshake  (abstract base; only 3 virtuals + _ep pointer)
  |-- RdmaHandshakeClientV2 / ServerV2
  |-- RdmaHandshakeClientV3 / ServerV3
  |-- v2_wire::* (file-local helpers shared by v2 leaves)
  `-- v3_wire::* (file-local helpers shared by v3 leaves)

Endpoint-side ProcessHandshakeAt{Client,Server} is protocol-agnostic;
subclasses override only SendLocalHello() and ReceiveAndParseRemoteHello().
A ParsedHello struct decouples endpoint-side bring-up (window calculation,
BringUpQp) from any wire format.

Side effects:

  • Performance effects:

  • Breaking backward compatibility:

Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
magic routes to fully independent code paths. A single rolling upgrade
enables v3 fleet-wide.

Client-side picks the wire protocol via gflag with a safe default:

FLAGS_rdma_client_handshake_version  (default 2)
  2 = "RDMA" legacy (zero-regression default)
  3 = "RDM3" protobuf (opt-in once target servers support v3)

Sub-second rollback is one flag flip away. v3 client to v2-only legacy
server is NOT guaranteed to transparently fall back on the same
connection -- the supported migration is "upgrade servers first, then
opt-in clients".


Check List:

@chenBright chenBright force-pushed the rdma_handshake_v3 branch 4 times, most recently from 6e32929 to 6deded6 Compare June 2, 2026 07:12
Background
==========
The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
had correctness bugs that made the wire format effectively
unevolvable.

v2 bugs
=======
A. Client never drained the "unknown tail" bytes when a peer sent
   msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
   socket recv buffer and silently corrupted the next ReadFromFd
   (the ACK).
B. Server had the symmetric version of A.
C. Server computed the body read length from its LOCAL
   g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
   same length as its own. A longer peer left bytes behind; a shorter
   peer made the read block. Client used the correct compile-time
   constant; the two sides were not symmetric.

Combined, A/B/C meant v2 could not safely append a single byte to the
hello -- even an "optional hint" appended by a newer sender would
mis-align an older receiver's next read.

v3 design
=========
Wire format, magic-namespace-isolated from v2:

  [ "RDM3" 4B ][ pb_size 4B big-endian ][ RdmaHello protobuf bytes ]

with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
as v2 plus room to append future capabilities.

Why protobuf (and not "v2 plus length prefix")
----------------------------------------------
- Variable-length fields are coming. Future capability fields will
  include strings (rdma_device_name, netdev_name, ...) and other
  variable-length data. Supporting them on a fixed-binary protocol
  forces us to invent and maintain a TLV layer (per-field type +
  length + value framing, plus version-aware deserialization). That
  is reimplementing protobuf badly. Using protobuf from day one
  costs nothing and is the canonical answer.
- Fixes v2 bug A/B/C generically: pb_size makes the wire
  self-describing, so the receiver never needs to guess the length
  or know the peer's schema version to read the body cleanly.
- Append-only field evolution out of the box: new optional fields
  cost old receivers nothing -- they're skipped as unknown protobuf
  fields. v2 with a hand-rolled length prefix would still need
  per-field opt-in code on every side.
- Built-in validation: ParseFromArray fails fast on malformed input;
  required-field presence is enforced at the parse layer, not by
  ad-hoc has_xxx() checks scattered through wire code.
- bRPC already depends on protobuf -- no new build dependency.

Why a NEW MAGIC rather than a version field inside protobuf
-----------------------------------------------------------
- Forces "breaking change" to be a deployment decision visible at
  the wire level. You cannot accidentally ship a backwards-
  incompatible patch via a field-semantics tweak.
- Server-side dispatch routes by magic to fully independent state
  machines that can't entangle (no `if (version == X)` branches
  anywhere -- this is the abstraction's red line).
- Any future breaking change bumps the magic ("RDM4", "RDM5", ...).
  v3 fields, once shipped, never change semantics.

Rollout
=======
Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
magic routes to fully independent code paths. A single rolling upgrade
enables v3 fleet-wide.

Client-side picks the wire protocol via gflag with a safe default:

    FLAGS_rdma_client_handshake_version  (default 2)
      2 = "RDMA" legacy (zero-regression default)
      3 = "RDM3" protobuf (opt-in once target servers support v3)

Sub-second rollback is one flag flip away. v3 client to v2-only legacy
server is NOT guaranteed to transparently fall back on the same
connection -- the supported migration is "upgrade servers first, then
opt-in clients".
@chenBright chenBright force-pushed the rdma_handshake_v3 branch from 6deded6 to e911591 Compare June 2, 2026 08:44
@chenBright chenBright closed this Jun 2, 2026
@chenBright chenBright requested a review from Copilot June 2, 2026 12:16

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new RDMA handshake v3 (“RDM3” + length-prefixed protobuf RdmaHello) and refactors the RDMA handshake logic into versioned handshake classes, while updating unit tests and build/CI wiring to cover both v2 and v3.

Changes:

  • Add protobuf-based v3 handshake (RDM3 magic + 4B length + RdmaHello), alongside legacy v2 support.
  • Refactor handshake logic out of RdmaEndpoint into RdmaHandshake*V2/V3 classes and share a wire-agnostic ParsedHello.
  • Expand/parameterize RDMA unit tests to exercise both handshake versions and add byte-level baseline tests; update build (CMake/Bazel) and CI dependencies for RDMA/proto.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/bvar_percentile_unittest.cpp Disables a percentile test when Babylon counter is enabled.
test/brpc_rdma_unittest.cpp Updates RDMA tests for new transport API and adds v2/v3 handshake coverage, including parameterized RPC tests.
src/butil/thread_key.h Adds missing <algorithm> include.
src/brpc/socket.h Grants handshake classes friend access to Socket.
src/brpc/socket.cpp Minor formatting change to StartConnect call.
src/brpc/server.cpp Validates ServerOptions before copying to avoid ownership issues on failed Start().
src/brpc/rdma/rdma_handshake.proto Adds proto2 definition for v3 handshake message RdmaHello.
src/brpc/rdma/rdma_handshake.h Introduces handshake abstraction, v2/v3 wire helpers, and factories.
src/brpc/rdma/rdma_handshake.cpp Implements v2/v3 handshake wire parsing/serialization and version selection via gflag.
src/brpc/rdma/rdma_endpoint.h Adds handshake-related friends, new ReadFromFd(IOPortal*), WriteToFd(IOBuf*), and _handshake_version.
src/brpc/rdma/rdma_endpoint.cpp Integrates handshake abstraction; adds generalized read/write loops; adds handshake version to debug output.
src/brpc/rdma_transport.h Adjusts friend declarations and formatting for handshake integration.
CMakeLists.txt Adds rdma_handshake.proto to proto compilation list.
BUILD.bazel Includes RDMA protos in internal proto set.
.github/workflows/ci-linux.yml Improves formatting, installs RDMA deps for Bazel jobs, and adjusts Bazel test config/filters.
.github/actions/install-all-dependencies/action.yml Simplifies RDMA package installation to libibverbs-dev.
.bazelrc Adds build:rdma config to define BRPC_WITH_RDMA=true.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/brpc/rdma/rdma_endpoint.cpp
Comment thread src/brpc/rdma_transport.h
Comment thread src/brpc/rdma/rdma_handshake.cpp
Comment thread src/brpc/rdma/rdma_handshake.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants