fix(server): use ALPN-negotiated protocol for TLS connections by cluster2600 · Pull Request #574 · NVIDIA/OpenShell

cluster2600 · 2026-03-24T20:56:47Z

Summary

Resolves #535 — the gateway server ignored the ALPN-negotiated protocol on TLS connections, falling back to byte-sniffing auto-detection which could misidentify HTTP/2 connections as HTTP/1.1, rendering gRPC unusable.

Add MultiplexService::serve_h2() which starts the HTTP/2 state machine directly, bypassing auto-detection
After TLS handshake, check the ALPN-negotiated protocol and route h2 connections to serve_h2()
Add TLS status logging on startup (cert paths, ALPN config) and per-connection ALPN protocol logging at debug level

Root Cause

Hyper's auto-detection works by reading the first bytes of a connection and checking for the HTTP/2 connection preface (PRI * HTTP/2.0). On TLS connections, if the first read() returns a partial preface (due to buffering, MTU fragmentation, or TLS record boundaries), the auto-detector falls through to HTTP/1.1 — even though ALPN already negotiated h2 during the TLS handshake.

graph TD
    subgraph "Before (broken)"
        A[TLS handshake<br/>ALPN negotiates h2] --> B[hyper auto-detect]
        B -->|"partial preface read"| C[HTTP/1.1 assumed]
        B -->|"full preface read"| D[HTTP/2]
        C --> E["gRPC fails ❌"]
    end

    subgraph "After (fixed)"
        F[TLS handshake<br/>ALPN negotiates h2] --> G{ALPN protocol?}
        G -->|"h2"| H["serve_h2() — direct HTTP/2"]
        G -->|"other/none"| I[hyper auto-detect fallback]
        H --> J["gRPC works ✅"]
    end

Changes

File	Change
`crates/openshell-server/src/lib.rs`	Check ALPN result after TLS accept; route `h2` to `serve_h2()`; add TLS startup logging
`crates/openshell-server/src/multiplex.rs`	Add `ALPN_H2` constant; implement `serve_h2()` using `hyper::server::conn::http2::Builder` directly
`crates/openshell-server/src/tls.rs`	Add unit tests for TLS error paths (`from_files_rejects_missing_cert`, `load_certs_rejects_nonexistent_file`, `load_key_rejects_nonexistent_file`)

Test Plan

cargo test -p openshell-server --lib — 199 passed, 0 failed
New test: alpn_h2_constant_matches_standard_protocol_id — verifies the ALPN_H2 constant equals b"h2"
New tests: 3 TLS error-path tests in tls.rs

sequenceDiagram
    participant Client
    participant TLS as TLS Acceptor
    participant Server as MultiplexService

    Client->>TLS: ClientHello (ALPN: h2, http/1.1)
    TLS->>Client: ServerHello (ALPN: h2)
    TLS->>Server: tls_stream + ALPN = "h2"

    alt ALPN == h2
        Server->>Server: serve_h2(tls_stream)
        Note over Server: HTTP/2 state machine starts immediately
    else ALPN != h2 or absent
        Server->>Server: serve(tls_stream)
        Note over Server: byte-sniffing auto-detection
    end

    Client->>Server: gRPC request (HTTP/2 POST)
    Server->>Client: gRPC response

github-actions · 2026-03-25T06:48:48Z

All contributors have signed the DCO ✍️ ✅
_{Posted by the DCO Assistant Lite bot.}

github-actions · 2026-03-25T06:48:48Z

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

Open a Vouch Request discussion.
Describe what you want to change and why.
Write in your own words — do not have an AI generate the request.
A maintainer will comment /vouch if approved.
Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

cluster2600 · 2026-03-25T07:40:42Z

I have read the DCO document and I hereby sign the DCO.

cluster2600 · 2026-03-25T07:42:22Z

Hi there — just a quick note to say we've done the needful on our end. The DCO is signed, and we've opened a vouch request as well.

Happy to address any feedback on the implementation. The fix has been compiled and tested on our DGX Spark (aarch64) — 199 server lib tests passing, no regressions.

Cheers!

cluster2600 · 2026-03-25T15:25:22Z

Remote validation on the DGX host after the rustfmt fix:

cargo fmt --all -- --check: passes
cargo test -p openshell-server --lib: passes (199 passed)
cargo test -p openshell-server --test multiplex_tls_integration: passes (4 passed)
cargo clippy --workspace --all-targets: passes with warnings only
cargo test --workspace: fails in crates/openshell-cli/tests/sandbox_create_lifecycle_integration.rs because port 8080 is already in use on the host

The ALPN/TLS routing change looks sound on the DGX box. The remaining workspace failure appears environmental rather than related to this change.

The gateway server used hyper's auto-detection (byte-sniffing the HTTP/2 connection preface) for all connections, including TLS connections where ALPN already negotiated the protocol. This could misidentify h2 connections as HTTP/1.1 when the first read returned a partial preface, making gRPC unusable. Add MultiplexService::serve_h2() which starts the HTTP/2 state machine directly, bypassing auto-detection. After TLS handshake, check the ALPN-negotiated protocol and route h2 connections to serve_h2(). Plaintext connections continue using auto-detection. Also add TLS status logging (cert paths, ALPN config) on startup and per-connection ALPN protocol logging at debug level. Closes NVIDIA#535

cluster2600 requested a review from a team as a code owner March 24, 2026 20:56

drew self-assigned this Mar 25, 2026

github-actions bot closed this Mar 25, 2026

drew reopened this Mar 25, 2026

cluster2600 added 2 commits March 25, 2026 16:28

style(server): apply rustfmt to TLS routing change

a97dd49

cluster2600 force-pushed the fix/535-alpn-tls-routing branch from 4f7656b to a97dd49 Compare March 25, 2026 15:29

drew assigned johntmyers Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(server): use ALPN-negotiated protocol for TLS connections#574

fix(server): use ALPN-negotiated protocol for TLS connections#574
cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
cluster2600:fix/535-alpn-tls-routing

cluster2600 commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 25, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cluster2600 commented Mar 24, 2026

Summary

Root Cause

Changes

Test Plan

Uh oh!

github-actions bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

cluster2600 commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 25, 2026 •

edited

Loading