Skip to content

fix(server): use ALPN-negotiated protocol for TLS connections#574

Open
cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
cluster2600:fix/535-alpn-tls-routing
Open

fix(server): use ALPN-negotiated protocol for TLS connections#574
cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
cluster2600:fix/535-alpn-tls-routing

Conversation

@cluster2600
Copy link

Summary

Resolves #535 — the gateway server ignored the ALPN-negotiated protocol on TLS connections, falling back to byte-sniffing auto-detection which could misidentify HTTP/2 connections as HTTP/1.1, rendering gRPC unusable.

  • Add MultiplexService::serve_h2() which starts the HTTP/2 state machine directly, bypassing auto-detection
  • After TLS handshake, check the ALPN-negotiated protocol and route h2 connections to serve_h2()
  • Add TLS status logging on startup (cert paths, ALPN config) and per-connection ALPN protocol logging at debug level

Root Cause

Hyper's auto-detection works by reading the first bytes of a connection and checking for the HTTP/2 connection preface (PRI * HTTP/2.0). On TLS connections, if the first read() returns a partial preface (due to buffering, MTU fragmentation, or TLS record boundaries), the auto-detector falls through to HTTP/1.1 — even though ALPN already negotiated h2 during the TLS handshake.

graph TD
    subgraph "Before (broken)"
        A[TLS handshake<br/>ALPN negotiates h2] --> B[hyper auto-detect]
        B -->|"partial preface read"| C[HTTP/1.1 assumed]
        B -->|"full preface read"| D[HTTP/2]
        C --> E["gRPC fails ❌"]
    end

    subgraph "After (fixed)"
        F[TLS handshake<br/>ALPN negotiates h2] --> G{ALPN protocol?}
        G -->|"h2"| H["serve_h2() — direct HTTP/2"]
        G -->|"other/none"| I[hyper auto-detect fallback]
        H --> J["gRPC works ✅"]
    end
Loading

Changes

File Change
crates/openshell-server/src/lib.rs Check ALPN result after TLS accept; route h2 to serve_h2(); add TLS startup logging
crates/openshell-server/src/multiplex.rs Add ALPN_H2 constant; implement serve_h2() using hyper::server::conn::http2::Builder directly
crates/openshell-server/src/tls.rs Add unit tests for TLS error paths (from_files_rejects_missing_cert, load_certs_rejects_nonexistent_file, load_key_rejects_nonexistent_file)

Test Plan

  • cargo test -p openshell-server --lib — 199 passed, 0 failed
  • New test: alpn_h2_constant_matches_standard_protocol_id — verifies the ALPN_H2 constant equals b"h2"
  • New tests: 3 TLS error-path tests in tls.rs
sequenceDiagram
    participant Client
    participant TLS as TLS Acceptor
    participant Server as MultiplexService

    Client->>TLS: ClientHello (ALPN: h2, http/1.1)
    TLS->>Client: ServerHello (ALPN: h2)
    TLS->>Server: tls_stream + ALPN = "h2"

    alt ALPN == h2
        Server->>Server: serve_h2(tls_stream)
        Note over Server: HTTP/2 state machine starts immediately
    else ALPN != h2 or absent
        Server->>Server: serve(tls_stream)
        Note over Server: byte-sniffing auto-detection
    end

    Client->>Server: gRPC request (HTTP/2 POST)
    Server->>Client: gRPC response
Loading

@cluster2600 cluster2600 requested a review from a team as a code owner March 24, 2026 20:56
@drew drew self-assigned this Mar 25, 2026
@github-actions
Copy link

github-actions bot commented Mar 25, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions
Copy link

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 25, 2026
@drew drew reopened this Mar 25, 2026
@cluster2600
Copy link
Author

I have read the DCO document and I hereby sign the DCO.

@cluster2600
Copy link
Author

Hi there — just a quick note to say we've done the needful on our end. The DCO is signed, and we've opened a vouch request as well.

Happy to address any feedback on the implementation. The fix has been compiled and tested on our DGX Spark (aarch64) — 199 server lib tests passing, no regressions.

Cheers!

@cluster2600
Copy link
Author

Remote validation on the DGX host after the rustfmt fix:

  • cargo fmt --all -- --check: passes
  • cargo test -p openshell-server --lib: passes (199 passed)
  • cargo test -p openshell-server --test multiplex_tls_integration: passes (4 passed)
  • cargo clippy --workspace --all-targets: passes with warnings only
  • cargo test --workspace: fails in crates/openshell-cli/tests/sandbox_create_lifecycle_integration.rs because port 8080 is already in use on the host

The ALPN/TLS routing change looks sound on the DGX box. The remaining workspace failure appears environmental rather than related to this change.

The gateway server used hyper's auto-detection (byte-sniffing the HTTP/2
connection preface) for all connections, including TLS connections where
ALPN already negotiated the protocol. This could misidentify h2
connections as HTTP/1.1 when the first read returned a partial preface,
making gRPC unusable.

Add MultiplexService::serve_h2() which starts the HTTP/2 state machine
directly, bypassing auto-detection. After TLS handshake, check the
ALPN-negotiated protocol and route h2 connections to serve_h2().
Plaintext connections continue using auto-detection.

Also add TLS status logging (cert paths, ALPN config) on startup and
per-connection ALPN protocol logging at debug level.

Closes NVIDIA#535
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v0.0.13 gateway server ignores TLS config, starts plaintext HTTP/1.1 only — gRPC unusable

3 participants