Support dynamic client certificate resolution (ResolvesClientCert) #1340

Open
brucearctor wants to merge 4 commits into temporalio:main from brucearctor:feat/dynamic-client-certs

brucearctor commented Jun 21, 2026

Contributor

Support dynamic client certificate resolution (ResolvesClientCert)

What

Add a client_cert_resolver field to TlsOptions that accepts an Arc<dyn ResolvesClientCert> for dynamic, per-handshake client certificate selection. This enables transparent mTLS certificate rotation without requiring a process restart — useful for short-lived certificates managed by Vault agents, cert-manager sidecars, or HSM-backed signers.
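A minimal sketch of the rotation pattern this enables, written against the standard library only. `CertMaterial` and `ResolveCert` are hypothetical local stand-ins for rustls's `CertifiedKey` and `ResolvesClientCert` (the real trait method takes additional handshake parameters), and `RotatingResolver` is an illustrative name, not a type added by this PR.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for rustls's `CertifiedKey` (cert chain + key).
#[derive(Debug, PartialEq)]
pub struct CertMaterial {
    pub cert_chain_pem: Vec<u8>,
    pub key_pem: Vec<u8>,
}

/// Hypothetical stand-in for `ResolvesClientCert`: consulted during a TLS
/// handshake to decide which client certificate to present.
pub trait ResolveCert: Send + Sync {
    fn resolve(&self) -> Option<Arc<CertMaterial>>;
}

/// Holds the current certificate and lets another task replace it.
pub struct RotatingResolver {
    current: RwLock<Arc<CertMaterial>>,
}

impl RotatingResolver {
    pub fn new(initial: CertMaterial) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Called by whatever watches for renewed material (file watcher,
    /// Vault agent hook, ...). Later handshakes see the new certificate.
    pub fn rotate(&self, next: CertMaterial) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}

impl ResolveCert for RotatingResolver {
    fn resolve(&self) -> Option<Arc<CertMaterial>> {
        // Hand out a cheap snapshot; a handshake in flight keeps the
        // material it received even if `rotate` runs concurrently.
        let guard = self.current.read().unwrap();
        Some(Arc::clone(&*guard))
    }
}
```

In a real integration the same shape would presumably be implemented against rustls's trait and passed as the `Arc<dyn ResolvesClientCert>` described above.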

Closes #1338

Why

The existing client_tls_options requires static (cert, key) bytes at connection time. Users with rotating certificates (e.g., Vault-issued certs with 24h TTLs) must restart the entire Temporal client to pick up new material. The Go SDK solves this with tls.Config.GetClientCertificate; this PR brings equivalent functionality to the Rust SDK.

How

Since tonic 0.14.6 does not expose rustls's with_client_cert_resolver() builder method, the implementation bypasses tonic's TLS layer when a resolver is present:

  1. build_custom_rustls_config() — manually constructs a rustls::ClientConfig with the user's ResolvesClientCert, replicating tonic's security defaults (protocol versions, ALPN, native/custom CA roots)
  2. DynamicTlsConnector — a tower::Service<Uri> that wraps TCP with TLS using tokio_rustls::TlsConnector, including connect timeouts and tracing
  3. Endpoint::connect_with_connector() — used instead of connect() to wire the custom connector into tonic's channel

The resolver fires on each new TLS handshake (the initial connection and every reconnection), not per RPC over an established HTTP/2 connection.
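The handshake-versus-RPC distinction can be illustrated with a toy model. This is a sketch only: `HandshakeCounter` and `ToyChannel` are hypothetical names, and no real TLS or tonic types are involved.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical resolver that only counts how often it is consulted.
#[derive(Default)]
pub struct HandshakeCounter {
    calls: AtomicUsize,
}

impl HandshakeCounter {
    pub fn resolve(&self) {
        self.calls.fetch_add(1, Ordering::SeqCst);
    }

    pub fn calls(&self) -> usize {
        self.calls.load(Ordering::SeqCst)
    }
}

/// Toy model of a channel: one long-lived transport carrying many RPCs.
pub struct ToyChannel<'a> {
    resolver: &'a HandshakeCounter,
    connected: bool,
}

impl<'a> ToyChannel<'a> {
    pub fn new(resolver: &'a HandshakeCounter) -> Self {
        Self { resolver, connected: false }
    }

    /// Sends an RPC, handshaking first only if no transport exists yet.
    pub fn rpc(&mut self) {
        if !self.connected {
            // The only place the resolver is consulted.
            self.resolver.resolve();
            self.connected = true;
        }
    }

    /// Simulates the transport dropping (server restart, idle timeout, ...).
    pub fn disconnect(&mut self) {
        self.connected = false;
    }
}
```

One consequence of this model: a rotated certificate is picked up only when the connection is re-established, so an already-open connection keeps using the certificate it handshook with.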

Key design decisions

  • Mutually exclusive with static certs — setting both client_tls_options and client_cert_resolver returns InvalidConfig
  • Re-exports: ResolvesClientCert, CertifiedKey, and SignatureScheme are re-exported at the crate root so users don't need to depend on tokio-rustls directly
  • DNS load balancing — returns a clear error (not yet supported with dynamic certs due to balance_channel API constraints)
  • Proxy — returns a clear error (not yet supported; would require composing connectors)
  • C bridge — hardcodes client_cert_resolver: None (dynamic resolution from C callers needs a callback design in a follow-up)
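The validation rules above can be sketched as a single decision function. This is a simplified model under stated assumptions: `TlsSketch` and `classify` are hypothetical, the `Standard` and `CustomConnector` variant names are taken from the test descriptions in this PR, and the real `TlsConfigResult` and `InvalidConfig` types presumably carry more data.

```rust
/// Hypothetical, simplified view of the options involved.
#[derive(Default)]
pub struct TlsSketch {
    /// True when static client cert bytes (client_tls_options) are set.
    pub has_static_client_cert: bool,
    /// True when client_cert_resolver is set.
    pub has_cert_resolver: bool,
}

/// Which connection path the client should take (payloads omitted).
#[derive(Debug, PartialEq)]
pub enum TlsConfigResult {
    /// Use tonic's built-in TLS handling.
    Standard,
    /// Use the custom connector that carries the resolver.
    CustomConnector,
}

/// Stand-in for the SDK's invalid-configuration error.
#[derive(Debug, PartialEq)]
pub struct InvalidConfig(pub &'static str);

pub fn classify(
    tls: &TlsSketch,
    uses_proxy: bool,
    uses_dns_lb: bool,
) -> Result<TlsConfigResult, InvalidConfig> {
    if !tls.has_cert_resolver {
        // No resolver: nothing changes relative to the existing path.
        return Ok(TlsConfigResult::Standard);
    }
    if tls.has_static_client_cert {
        return Err(InvalidConfig(
            "client_tls_options and client_cert_resolver are mutually exclusive",
        ));
    }
    if uses_proxy {
        // Fail loudly rather than silently dropping the resolver.
        return Err(InvalidConfig(
            "client_cert_resolver is not yet supported with an HTTP CONNECT proxy",
        ));
    }
    if uses_dns_lb {
        return Err(InvalidConfig(
            "client_cert_resolver is not yet supported with DNS load balancing",
        ));
    }
    Ok(TlsConfigResult::CustomConnector)
}
```

Rejecting the unsupported combinations up front keeps the resolver from being ignored without the user noticing.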

Changes

| File | Change |
| --- | --- |
| crates/client/src/options_structs.rs | Add client_cert_resolver field, update Debug impl |
| crates/client/src/lib.rs | TlsConfigResult enum, add_tls_to_channel branching, build_custom_rustls_config, DynamicTlsConnector, proxy/DNS validation, tracing |
| crates/client/src/dns.rs | Handle TlsConfigResult, reject resolver + DNS LB |
| crates/client/src/envconfig.rs | Add client_cert_resolver: None |
| crates/client/Cargo.toml | Add rustls-native-certs dep |
| crates/sdk-core-c-bridge/src/client.rs | Add client_cert_resolver: None |
| crates/sdk-core/tests/common/mod.rs | Update test struct literals |
| CHANGELOG.md | Add entry under [Unreleased] > Added |

Tests (15 new, 134 total pass)

  • Custom connector returned when resolver is set (explicit domain + inferred domain)
  • Mutual exclusion: static + dynamic certs → error
  • Resolver + custom ServerCertVerifier combination
  • Resolver + custom CA certificate
  • build_custom_rustls_config with/without resolver/verifier
  • Debug output correctness for TlsOptions with resolver
  • DynamicTlsConnector is Clone + Debug
  • Default TlsOptions has no resolver
  • No-TLS passthrough returns Standard
  • IP host fallback for domain extraction
  • Re-exports compile (CertifiedKey, SignatureScheme)

Future work

…ClientCert)

Add a new client_cert_resolver field to TlsOptions that accepts an
Arc<dyn ResolvesClientCert> for dynamic per-handshake client certificate
resolution. This enables transparent mTLS certificate rotation without
requiring a process restart -- useful for short-lived certificates
managed by Vault agents, sidecars, or HSMs.

Key changes:
- Add client_cert_resolver: Option<Arc<dyn ResolvesClientCert>> to
  TlsOptions (mutually exclusive with static client_tls_options)
- Re-export ResolvesClientCert from the crate root for ergonomic use
- When a resolver is set, bypass tonic's static Identity path and build
  a rustls::ClientConfig manually with with_client_cert_resolver()
- Introduce DynamicTlsConnector (tower::Service<Uri>) that performs TLS
  via tokio_rustls::TlsConnector with the custom config
- Use Endpoint::connect_with_connector() to wire it into tonic's channel
- Add validation: error when both static and dynamic client certs are set
- DNS load balancing returns a clear error for now (not yet supported
  with dynamic cert resolution)
- Update C bridge and envconfig with client_cert_resolver: None

Tests (13 new):
- Mutual exclusion validation (static + dynamic = error)
- CustomConnector result with explicit and inferred domain
- Resolver + custom ServerCertVerifier combination
- Resolver + custom CA certificate
- build_custom_rustls_config with/without resolver/verifier
- Debug output for TlsOptions with resolver
- DynamicTlsConnector Clone requirement
- Default TlsOptions has no resolver
- No-TLS passthrough still returns Standard

Closes: temporalio#1338

Fixes from 4-reviewer deep code review (Temporal, Rust, Systems, Security):

Must Fix:
- Proxy + cert resolver: add validation to prevent silent discard of
  resolver when http_connect_proxy is also set (was silently ignoring
  the dynamic cert resolver)
- Empty SNI domain: fail early with clear error instead of producing
  an empty string that causes a confusing late error from ServerName
- Localhost fallback: DynamicTlsConnector no longer falls back to
  "localhost" when URI has no host — returns a clear error instead
- Connect timeout: wrap TCP connect in 30s timeout to prevent hanging
  on unreachable hosts (tonic's built-in connector handles this but
  custom connectors must do it themselves)
- CA cert parsing: check roots.is_empty() after add_parsable_certificates
  to catch cases where all provided CA certs are malformed
- Dead code: remove unused CountingCertResolver test helper
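The empty-SNI and localhost-fallback fixes amount to failing early when no usable server name exists. A minimal sketch, assuming a hypothetical helper `sni_domain` that takes an optional explicit domain override and an optional URI host; the actual function names and error types in the PR are not shown here, and treating an explicitly empty override as an error is an assumption of this sketch.

```rust
/// Picks the TLS server name, preferring an explicit override and falling
/// back to the URI host. Hypothetical helper for illustration only.
pub fn sni_domain(
    explicit: Option<&str>,
    uri_host: Option<&str>,
) -> Result<String, String> {
    // Assumption: an explicitly empty override is a misconfiguration and is
    // reported as such instead of silently falling back to the URI host.
    let candidate = explicit.or(uri_host).map(str::trim).unwrap_or("");
    if candidate.is_empty() {
        // No "localhost" fallback: a silent default here would only
        // surface later as a confusing certificate verification failure.
        return Err(
            "TLS server name is empty: set a domain override or use a URI with a host"
                .to_string(),
        );
    }
    Ok(candidate.to_string())
}
```

An IP literal passes through unchanged in this sketch; whether it is then treated as a DNS name or an IP address is left to the TLS library.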

Should Fix:
- Tracing: add debug! logging to DynamicTlsConnector for TCP connect
  and TLS handshake completion
- Re-exports: add CertifiedKey and SignatureScheme re-exports so users
  can implement ResolvesClientCert without adding tokio-rustls directly
- Debug impl: add manual Debug for DynamicTlsConnector showing domain
- Nodelay comment: document why set_nodelay(true) is set

Successfully merging this pull request may close these issues.

[Feature Request] Support dynamic/reloadable client certificates (ResolvesClientCert) in TlsOptions