WaffleBits · Jun 23, 2026
diff --git a/README.md b/README.md
@@ -16,6 +16,8 @@ replayable scheduling traces, and canary/shadow release decisions.
 - Deterministic workload replay with a machine-readable trace fingerprint.
 - Baseline/candidate release validation with `promote`, `hold`, and `rollback`
   outcomes.
+- Backend mirror normalization for vLLM/SGLang-style serving observations
+  before the release gate runs.
 - Exact output checks, model-aware numeric tolerances for backend drift,
   per-segment release summaries, error-rate deltas, p95 latency regression
   policy, tests, and CI.
@@ -40,12 +42,18 @@ cargo run --release -- gate \
 cargo run --release -- gate \
   --input fixtures/release_gate_numeric_tolerance.json \
   --output artifacts/release-gate-numeric-tolerance.json
+
+cargo run --release -- mirror-gate \
+  --input fixtures/backend_mirror_vllm_sglang.json \
+  --output artifacts/backend-mirror-report.json
 ```
 
 The safe fixture produces `promote`. The candidate with an output mismatch and
 an added error produces `rollback`.
 The numeric-tolerance fixture produces `promote` while reporting four tolerated
 numeric comparisons across a baseline-runtime to candidate-runtime segment.
+The backend-mirror fixture converts vLLM/SGLang-style request observations into
+the same gate input and produces `promote` for a vLLM-to-SGLang segment.
 
 The checked workload fixture completes four requests in 11 scheduler ticks,
 peaks at 12 of 20 KV pages, returns all pages on completion, and emits trace
@@ -69,6 +77,18 @@ Every tick records:
 The replay report includes a stable trace fingerprint, peak KV pages, total
 ticks, and completion count.
 
+## Backend Mirror Adapter
+
+`runtime-lab mirror` converts backend-specific mirrored observations into a
+gate input. `runtime-lab mirror-gate` performs the conversion and immediately
+evaluates the release policy.
+
+Each observation set names its backend, model, and optional accelerator. Each
+request carries latency, health, and output material: an explicit fingerprint,
+output token IDs, or a numeric output vector. Successful observations must
+carry output material so correctness checks remain auditable. Without an
+explicit fingerprint, token IDs (else numeric values) are hashed with FNV-1a.
+
 ## Release Policy
 
 The gate joins mirrored baseline and candidate observations by request ID.
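
Note on the join: the README says the gate joins baseline and candidate observations by request ID, but `release.rs` is not part of this diff. The following is a minimal sketch of what such a join could count, assuming a plain set intersection; `matched_requests` is a hypothetical helper, not the gate's actual function.

```rust
use std::collections::HashSet;

/// Counts request IDs that appear in both the baseline and candidate runs.
/// Hypothetical helper for illustration; duplicate IDs within one run are
/// collapsed because both sides are collected into sets first.
fn matched_requests(baseline_ids: &[&str], candidate_ids: &[&str]) -> usize {
    let baseline: HashSet<&str> = baseline_ids.iter().copied().collect();
    let candidate: HashSet<&str> = candidate_ids.iter().copied().collect();
    baseline.intersection(&candidate).count()
}
```

With the four `prompt-a` through `prompt-d` IDs from the fixture on both sides, this count is 4, which agrees with `matched_requests` in the generated report.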

diff --git a/artifacts/backend-mirror-report.json b/artifacts/backend-mirror-report.json
@@ -0,0 +1,38 @@
+{
+  "schema_version": 2,
+  "decision": "promote",
+  "matched_requests": 4,
+  "baseline_requests": 4,
+  "candidate_requests": 4,
+  "coverage_rate": 1.0,
+  "output_mismatch_rate": 0.0,
+  "numeric_pairs": 0,
+  "tolerated_numeric_outputs": 0,
+  "numeric_drift_rate": 0.0,
+  "max_numeric_abs_error": null,
+  "max_numeric_rel_error": null,
+  "baseline_error_rate": 0.0,
+  "candidate_error_rate": 0.0,
+  "error_rate_increase": 0.0,
+  "baseline_p95_latency_ms": 28.0,
+  "candidate_p95_latency_ms": 27.2,
+  "p95_latency_regression_pct": -2.857143,
+  "segments": [
+    {
+      "model": "decoder-7b",
+      "baseline_backend": "vllm",
+      "candidate_backend": "sglang",
+      "accelerator": "h100",
+      "matched_requests": 4,
+      "output_mismatch_rate": 0.0,
+      "baseline_error_rate": 0.0,
+      "candidate_error_rate": 0.0,
+      "baseline_p95_latency_ms": 28.0,
+      "candidate_p95_latency_ms": 27.2,
+      "p95_latency_regression_pct": -2.857143
+    }
+  ],
+  "reasons": [
+    "candidate stayed within correctness, reliability, and latency policy"
+  ]
+}
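
Note on the latency fields: the report's p95 values and regression percentage are consistent with a nearest-rank percentile and a relative-change formula, although the gate's actual computation is not shown in this diff. A minimal sketch under that assumption (both function names are hypothetical):

```rust
/// Nearest-rank percentile: the value at 1-based rank ceil(p * n) in sorted order.
/// Assumption: the real gate may use a different percentile definition.
fn percentile_nearest_rank(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(f64::total_cmp);
    // Clamp so that p = 0.0 still selects the first element.
    let rank = ((p * sorted.len() as f64).ceil() as usize).clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

/// Relative change of the candidate against the baseline, in percent.
/// Negative values mean the candidate was faster.
fn regression_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}
```

For the fixture latencies, rank ceil(0.95 * 4) = 4 selects the slowest request on each side (28.0 ms and 27.2 ms), and (27.2 - 28.0) / 28.0 * 100 is approximately -2.857143, the value the report records.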
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -50,3 +50,17 @@ within policy produces `promote`.
 This is a local validation component, not a deployment controller. Production
 integration would obtain observations from mirrored traffic, canary
 populations, telemetry, and an audited rollout system.
+
+## Backend Mirror Adapter
+
+The adapter sits before the release gate. It normalizes backend-specific
+mirrored observations into `GateInput` without changing the gate policy. This
+keeps ingestion concerns separate from rollout decisions.
+
+The adapter currently accepts compact vLLM/SGLang-style request summaries:
+request ID, latency, health, output token IDs, optional explicit fingerprints,
+and optional numeric vectors, grouped under a per-set backend, model, and
+optional accelerator. If an engine does not provide a fingerprint, the adapter
+computes an FNV-1a fingerprint from token IDs, or from numeric values when no
+token IDs are present. Successful observations without output material are
+rejected so a candidate cannot be promoted from latency-only evidence.
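
Note on fingerprint precedence: the order in which the adapter picks output material can be summarized in a small standalone sketch. It follows the branch order of `output_fingerprint` in `src/adapter.rs`, but `FingerprintSource` and `fingerprint_source` are hypothetical names used only for illustration, and the sketch reports which source would be used rather than computing a hash.

```rust
/// Which piece of output material a fingerprint would be derived from.
#[derive(Debug, PartialEq)]
enum FingerprintSource {
    Explicit,
    TokenIds,
    NumericValues,
    ErrorPlaceholder,
}

/// Picks the fingerprint source in the same order as the adapter:
/// explicit fingerprint, then token IDs, then numeric values, then a
/// placeholder for failed requests. A successful request with no output
/// material is rejected.
fn fingerprint_source(
    explicit: Option<&str>,
    token_ids: &[i64],
    values: Option<&[f64]>,
    ok: bool,
) -> Result<FingerprintSource, String> {
    // A blank or whitespace-only fingerprint counts as missing.
    if explicit.is_some_and(|f| !f.trim().is_empty()) {
        return Ok(FingerprintSource::Explicit);
    }
    if !token_ids.is_empty() {
        return Ok(FingerprintSource::TokenIds);
    }
    if values.is_some_and(|v| !v.is_empty()) {
        return Ok(FingerprintSource::NumericValues);
    }
    if !ok {
        return Ok(FingerprintSource::ErrorPlaceholder);
    }
    Err("successful observation has no output material".to_string())
}
```

One consequence worth noting is that a failed request that still carries token IDs keeps a token-derived fingerprint; the `error` placeholder is used only when no output material exists at all.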
diff --git a/docs/RELEASE_VALIDATION.md b/docs/RELEASE_VALIDATION.md
@@ -45,6 +45,20 @@ The report includes:
 - segment summaries by model, baseline backend, candidate backend, and
   accelerator.
 
+## Backend Mirror Adapter
+
+The `mirror` command normalizes request observations from backend-specific
+serving traces into the release gate input format. It is intended for mirrored
+baseline/candidate comparisons such as vLLM versus SGLang, or a current
+runtime versus a candidate runtime behind shadow traffic.
+
+Each observation records request ID, latency, health, and output material;
+model, backend, and accelerator are set once per observation set. Engines may
+provide their own `output_fingerprint`; otherwise the adapter derives an FNV-1a
+fingerprint from output token IDs, falling back to numeric output vectors.
+Successful observations without output material are rejected because the
+release gate cannot audit correctness from latency alone.
+
 ## Production Extension Points
 
 A real rollout system should add:

diff --git a/fixtures/backend_mirror_vllm_sglang.json b/fixtures/backend_mirror_vllm_sglang.json
@@ -0,0 +1,72 @@
+{
+  "thresholds": {
+    "min_matched_requests": 4,
+    "max_output_mismatch_rate": 0.0,
+    "max_error_rate_increase": 0.01,
+    "max_p95_latency_regression_pct": 10.0,
+    "max_numeric_drift_rate": 0.0,
+    "numeric_tolerances": []
+  },
+  "baseline": {
+    "backend": "vllm",
+    "model": "decoder-7b",
+    "accelerator": "h100",
+    "observations": [
+      {
+        "request_id": "prompt-a",
+        "latency_ms": 18.0,
+        "ok": true,
+        "output_token_ids": [101, 1402, 13]
+      },
+      {
+        "request_id": "prompt-b",
+        "latency_ms": 21.0,
+        "ok": true,
+        "output_token_ids": [205, 778, 990]
+      },
+      {
+        "request_id": "prompt-c",
+        "latency_ms": 24.0,
+        "ok": true,
+        "output_token_ids": [42, 42, 7]
+      },
+      {
+        "request_id": "prompt-d",
+        "latency_ms": 28.0,
+        "ok": true,
+        "output_token_ids": [301, 302, 303, 2]
+      }
+    ]
+  },
+  "candidate": {
+    "backend": "sglang",
+    "model": "decoder-7b",
+    "accelerator": "h100",
+    "observations": [
+      {
+        "request_id": "prompt-a",
+        "latency_ms": 17.5,
+        "ok": true,
+        "output_token_ids": [101, 1402, 13]
+      },
+      {
+        "request_id": "prompt-b",
+        "latency_ms": 20.6,
+        "ok": true,
+        "output_token_ids": [205, 778, 990]
+      },
+      {
+        "request_id": "prompt-c",
+        "latency_ms": 23.5,
+        "ok": true,
+        "output_token_ids": [42, 42, 7]
+      },
+      {
+        "request_id": "prompt-d",
+        "latency_ms": 27.2,
+        "ok": true,
+        "output_token_ids": [301, 302, 303, 2]
+      }
+    ]
+  }
+}
diff --git a/src/adapter.rs b/src/adapter.rs
@@ -0,0 +1,178 @@
+use std::error::Error;
+use std::fmt;
+
+use serde::{Deserialize, Serialize};
+
+use crate::release::{GateInput, GateThresholds, Observation};
+
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct BackendMirrorInput {
+    #[serde(default)]
+    pub thresholds: GateThresholds,
+    pub baseline: BackendObservationSet,
+    pub candidate: BackendObservationSet,
+}
+
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct BackendObservationSet {
+    pub backend: String,
+    pub model: String,
+    #[serde(default)]
+    pub accelerator: Option<String>,
+    pub observations: Vec<BackendObservation>,
+}
+
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct BackendObservation {
+    pub request_id: String,
+    pub latency_ms: f64,
+    #[serde(default)]
+    pub ok: Option<bool>,
+    #[serde(default)]
+    pub output_fingerprint: Option<String>,
+    #[serde(default)]
+    pub output_token_ids: Vec<i64>,
+    #[serde(default)]
+    pub output_values: Option<Vec<f64>>,
+    #[serde(default)]
+    pub error: Option<String>,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct AdapterError {
+    message: String,
+}
+
+impl AdapterError {
+    fn new(message: impl Into<String>) -> Self {
+        Self {
+            message: message.into(),
+        }
+    }
+}
+
+impl fmt::Display for AdapterError {
+    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
+        formatter.write_str(&self.message)
+    }
+}
+
+impl Error for AdapterError {}
+
+pub fn mirror_to_gate_input(input: &BackendMirrorInput) -> Result<GateInput, AdapterError> {
+    Ok(GateInput {
+        thresholds: input.thresholds.clone(),
+        baseline: normalize_set(&input.baseline)?,
+        candidate: normalize_set(&input.candidate)?,
+    })
+}
+
+fn normalize_set(set: &BackendObservationSet) -> Result<Vec<Observation>, AdapterError> {
+    if set.backend.trim().is_empty() {
+        return Err(AdapterError::new("backend must not be empty"));
+    }
+    if set.model.trim().is_empty() {
+        return Err(AdapterError::new("model must not be empty"));
+    }
+
+    set.observations
+        .iter()
+        .map(|observation| normalize_observation(set, observation))
+        .collect()
+}
+
+fn normalize_observation(
+    set: &BackendObservationSet,
+    observation: &BackendObservation,
+) -> Result<Observation, AdapterError> {
+    if observation.request_id.trim().is_empty() {
+        return Err(AdapterError::new("request_id must not be empty"));
+    }
+    if !observation.latency_ms.is_finite() || observation.latency_ms < 0.0 {
+        return Err(AdapterError::new(format!(
+            "request {} has invalid latency_ms",
+            observation.request_id
+        )));
+    }
+
+    let ok = observation.ok.unwrap_or_else(|| {
+        observation
+            .error
+            .as_ref()
+            .is_none_or(|error| error.trim().is_empty())
+    });
+    let output_fingerprint = output_fingerprint(observation, ok)?;
+
+    Ok(Observation {
+        request_id: observation.request_id.clone(),
+        output_fingerprint,
+        latency_ms: observation.latency_ms,
+        ok,
+        model: Some(set.model.clone()),
+        backend: Some(set.backend.clone()),
+        accelerator: set.accelerator.clone(),
+        output_values: observation.output_values.clone(),
+    })
+}
+
+fn output_fingerprint(observation: &BackendObservation, ok: bool) -> Result<String, AdapterError> {
+    if let Some(fingerprint) = observation.output_fingerprint.as_ref()
+        && !fingerprint.trim().is_empty()
+    {
+        return Ok(fingerprint.clone());
+    }
+
+    if !observation.output_token_ids.is_empty() {
+        return Ok(format!(
+            "tokens-fnv64:{:016x}",
+            hash_i64_values(&observation.output_token_ids)
+        ));
+    }
+
+    if let Some(values) = observation.output_values.as_ref()
+        && !values.is_empty()
+    {
+        return Ok(format!("values-fnv64:{:016x}", hash_f64_values(values)));
+    }
+
+    if !ok {
+        return Ok("error".into());
+    }
+
+    Err(AdapterError::new(format!(
+        "request {} is successful but has no output fingerprint, token ids, or numeric values",
+        observation.request_id
+    )))
+}
+
+fn hash_i64_values(values: &[i64]) -> u64 {
+    let mut hash = FNV_OFFSET_BASIS;
+    feed_usize(&mut hash, values.len());
+    for value in values {
+        feed_bytes(&mut hash, &value.to_le_bytes());
+    }
+    hash
+}
+
+fn hash_f64_values(values: &[f64]) -> u64 {
+    let mut hash = FNV_OFFSET_BASIS;
+    feed_usize(&mut hash, values.len());
+    for value in values {
+        feed_bytes(&mut hash, &value.to_bits().to_le_bytes());
+    }
+    hash
+}
+
+const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
+const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
+
+fn feed_usize(hash: &mut u64, value: usize) {
+    feed_bytes(hash, &(value as u64).to_le_bytes()); // fixed width across targets
+}
+
+fn feed_bytes(hash: &mut u64, bytes: &[u8]) {
+    for byte in bytes {
+        *hash ^= u64::from(*byte);
+        *hash = hash.wrapping_mul(FNV_PRIME);
+    }
+}
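
Note on the hash: the constants and the XOR-then-multiply order above match 64-bit FNV-1a. A minimal standalone sketch for checking that against the published test vectors follows. `fnv1a_64` and `token_fingerprint` are hypothetical names, and the element count is widened to `u64` here as an assumption so the result does not depend on the target's pointer width.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Plain 64-bit FNV-1a over a byte slice: XOR each byte in, then multiply
/// by the prime with wrapping arithmetic.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    bytes.iter().fold(FNV_OFFSET_BASIS, |hash, byte| {
        (hash ^ u64::from(*byte)).wrapping_mul(FNV_PRIME)
    })
}

/// Length-prefixed token hash in the shape the adapter uses: the element
/// count (as a little-endian u64, an assumption of this sketch) followed by
/// each token ID as little-endian bytes.
fn token_fingerprint(token_ids: &[i64]) -> String {
    let mut bytes = Vec::with_capacity(8 * (token_ids.len() + 1));
    bytes.extend_from_slice(&(token_ids.len() as u64).to_le_bytes());
    for id in token_ids {
        bytes.extend_from_slice(&id.to_le_bytes());
    }
    format!("tokens-fnv64:{:016x}", fnv1a_64(&bytes))
}
```

Hashing the empty input returns the offset basis unchanged, and hashing the single byte `a` gives `0xaf63dc4c8601ec8c`, which is the standard FNV-1a test vector. Prefixing the length keeps sequences of different lengths from sharing a byte stream, and the hash is order-sensitive, so `[1, 2]` and `[2, 1]` produce different fingerprints.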
diff --git a/src/lib.rs b/src/lib.rs
@@ -1,6 +1,11 @@
+pub mod adapter;
 pub mod release;
 pub mod scheduler;
 
+pub use adapter::{
+    AdapterError, BackendMirrorInput, BackendObservation, BackendObservationSet,
+    mirror_to_gate_input,
+};
 pub use release::{
     GateDecision, GateInput, GateReport, GateThresholds, NumericTolerance, Observation,
     SegmentReport, evaluate_release,