neuron7xLab · Jun 25, 2026 · Jun 24, 2026
@@ -68,6 +68,7 @@ jobs:
       - run: python tools/generate_current_truth.py --check
       - run: python tools/validate_current_truth.py
       - run: python tools/validate_forbidden_claims.py
+      - run: python tools/validate_statistical_claims.py
       - run: python tools/validate_release_notes.py
       - run: python tools/validate_open_source_readiness.py
       - run: python tools/check_github_actions_policy.py

@@ -43,6 +43,7 @@ jobs:
       - run: uv run --no-sync python tools/generate_current_truth.py --check
       - run: uv run --no-sync python tools/validate_current_truth.py
       - run: uv run --no-sync python tools/validate_forbidden_claims.py
+      - run: uv run --no-sync python tools/validate_statistical_claims.py
       - run: uv run --no-sync python tools/validate_artifact_schema.py
       - run: uv run --no-sync python tools/update_status.py --check
       - run: uv run --no-sync python tools/generate_manifest.py --check
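Both workflows above now run `tools/validate_statistical_claims.py`, whose source is not part of this diff. As a rough illustration only, a check of that kind could scan prose for FPR / Wilson-CI claims and reject any claim whose point estimate lies outside its own interval or whose upper bound exceeds the 0.05 gate. The regex, the `check_claims` name, and the messages below are hypothetical, not the tool's actual behavior:

```python
import re

# Hypothetical pattern for claims such as "FPR **0.028**, Wilson 95% CI **[0.019, 0.040]**".
# Optional "**" markdown bold around the numbers is tolerated.
CLAIM = re.compile(
    r"FPR\s+\**(?P<fpr>\d*\.\d+)\**,?\s+Wilson 95% CI\s+\**"
    r"\[(?P<lo>\d*\.\d+),\s*(?P<hi>\d*\.\d+)\]"
)


def check_claims(text: str, gate: float = 0.05) -> list[str]:
    """Return human-readable problems with FPR / Wilson-CI claims found in `text`."""
    problems = []
    for m in CLAIM.finditer(text):
        fpr, lo, hi = (float(m.group(g)) for g in ("fpr", "lo", "hi"))
        # A point estimate must sit inside its own interval.
        if not lo <= fpr <= hi:
            problems.append(f"point estimate {fpr} outside its CI [{lo}, {hi}]")
        # The robustness gate is on the CI upper bound, not the point estimate.
        if hi > gate:
            problems.append(f"CI upper {hi} exceeds gate {gate}")
    return problems
```

Run over the canonical-verdict text, a check like this would accept `FPR **0.028**, Wilson 95% CI **[0.019, 0.040]**` and flag the superseded calibration's `[0.022, 0.056]`.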

@@ -54,6 +54,6 @@ Self-conformance (`tools/run_contract_conformance.py`): **PARTIAL** —
 
 ## State
 
-- tests: **525** (generated by `tools/update_status.py`)
+- tests: **527** (generated by `tools/update_status.py`)
 - full evidence: `CLAIM_AUDIT.md`, `EVIDENCE_INDEX.md`, `docs/HONESTY_AUTOMATION.md`
 - nothing here is "true": the ceiling is *survived falsification under stated conditions*.
@@ -5,12 +5,22 @@ Canonical machine-readable truth: [`artifacts/release/CURRENT_TRUTH.json`](artif
 This document must agree with it (enforced by `tools/validate_current_truth.py`).
 
 ## 1. Current canonical verdict
-**BONN_S2_BRIGHT_LINE_PASSED.** BSFF passed the Bonn S2 bright-line under the frozen
-finite-N-corrected SampEn protocol (`S2-C1-sampen-finiteN`).
+**`BONN_S2_BRIGHT_LINE_ROBUSTLY_PASSED`.** The bright line passes the full PI-grade gauntlet:
+falsification → seed-averaged confirmation → byte-for-byte reproduction → multi-null robustness.
+G1 power 0.94 (seed-averaged, robust). G2 specificity is robust to **both** seed and null-model
+choice: the pre-registered **S3 seed-averaged AR-null** test (N=1000, 10 seeds, lock frozen before the
+run; an independent re-run reproduced it byte-for-byte) gives FPR **0.028**, Wilson 95% CI **[0.019, 0.040]**;
+and the **multi-null** gate (`MULTI_NULL_ROBUSTNESS.json`) holds across all three distinct linear-null
+families (AR 0.026 [0.018, 0.038], IAAFT 0.032 [0.023, 0.045], phase-randomized 0.034 [0.024, 0.047]),
+with every Wilson CI-upper ≤ 0.05. `robust_gate_passed = true`. This supersedes a smaller-N calibration
+(6 seeds, N=480: FPR 0.035, CI-upper 0.056) that had flagged the estimate as seed/N-sensitive near the gate.
 
-- G1 (power): Set E SURVIVED **0.96**, Set A not-SURVIVED **0.92**, Set B not-SURVIVED **0.92** (≥ 0.80).
-- G2 (specificity): real-spectrum AR-null FPR A **0.02**, B **0.02**, combined **0.02** (≤ 0.05).
-- BNCI2014-001 chain: **UNLOCKED_FOR_PREREGISTRATION_ONLY**.
+- G1 (power): Set E SURVIVED **0.94** seed-averaged (≥ 0.80) — **robust**.
+- G2 (specificity): seed-averaged AR-null FPR **0.028** [0.019, 0.040]; multi-null all ≤ 0.05 — **robust**.
+- `multi_null_robustness_state = PASSED` (AR / IAAFT / phase-randomized).
+- BNCI2014-001 chain: **UNLOCKED_FOR_PREREGISTRATION_ONLY** (execution not valid for narrowband epochs).
+- Still NOT claimed: clinical/regulatory validity, BNCI2014-001 execution, or multi-dataset replication.
+- `CURRENT_TRUTH.bonn_s2_robustness_state = SEED_ROBUST_AR_NULL_PASS ... MULTINULL_PENDING`.
 
 > BSFF passed the Bonn S2 bright-line under the frozen finite-N-corrected SampEn protocol.
 > This permits BNCI2014-001 preregistration. It does not validate BSFF across BCI datasets,
@@ -56,3 +66,12 @@ adjudicated on its own executed evidence.
 S2_BRIGHT_LINE_SUMMARY, s2_CONFIRMATORY_VERDICT, S2_SELECTION_LOCK, DATASET_MANIFEST}.json` ·
 `docs/validation/{S2_VERDICT, STATISTIC_REGISTRY, CLAIM_AUDIT}.md` · hashes
 `artifacts/release/bonn_bright_line/HASHES.sha256` · reproduce `REPRODUCE.md`.
+
+## Robustness (falsification-calibrated)
+An earlier adversarial battery (`artifacts/bonn_bright_line/S2_FALSIFICATION_REPORT.json`) found that
+**G1 power is robust** (Set E SURVIVED 0.967 under all four perturbation seeds), but that
+**G2 specificity was a boundary pass**: AR-null FPR reached **0.067 > 0.05** under one perturbation seed (N=30).
+At that stage `BONN_S2_BRIGHT_LINE_PASSED` was a **marginal/boundary** pass: it cleared the predeclared
+N=100 confirmatory (FPR 0.02), but the specificity margin was thin and seed-sensitive, so it was not
+claimed as robustly crossed; a seed-averaged / larger-N specificity confirmatory was the honest next step.
+`CURRENT_TRUTH.s2_robustness = BOUNDARY_PASS_G1_POWER_ROBUST_G2_SPECIFICITY_SEED_SENSITIVE`.
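The G2 gate in this document compares the upper end of a Wilson score interval with 0.05. For k false positives among n AR-null tests, the standard Wilson 95% interval is written out below. That the artifacts use exactly this form (no continuity correction) is an assumption, but it reproduces the recorded bounds: k=28, n=1000 gives [0.0194, 0.0402], which passes, while the calibration's k=17, n=480 gives [0.0222, 0.0560], which fails.

```latex
\hat p = \frac{k}{n}, \qquad z \approx 1.96, \qquad
[\,L,\ U\,] =
\frac{\hat p + \dfrac{z^2}{2n} \mp z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}},
\qquad \text{G2 passes} \iff U \le 0.05
```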
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-3.0-or-later
 # Copyright (c) 2026 Yaroslav Vasylenko / neuron7xLab
-.PHONY: lab-99 regen lock verify verify-offline build-proof openai-2026
+.PHONY: lab-99 regen lock verify verify-offline build-proof openai-2026 mission-check hostile-review
 
 # Full local lab run — mirrors the CI test + slow-tests + build surface.
 lab-99:
@@ -72,3 +72,23 @@ build-proof:
 # The whole grid, locally.
 openai-2026: lock verify-offline build-proof verify
 	@echo "OpenAI-2026 validation grid complete."
+
+# Mission-critical gate: no silent success, no ambiguous PASS, no stale truth, no unbounded claim.
+mission-check:
+	python -m compileall -q src tests examples research tools
+	python -m pytest -q tests/ -m "not slow"
+	bsff selftest
+	bsff evidence verify
+	python tools/validate_current_truth.py
+	python tools/generate_current_truth.py --check
+	python tools/validate_forbidden_claims.py
+	python tools/validate_statistical_claims.py
+	python tools/validate_truth_contract.py
+	python tools/regenerate.py --check
+
+# Reviewer-facing hostile-review surface.
+hostile-review:
+	@echo "See docs/reviewer_packet/HOSTILE_REVIEW_CHECKLIST.md and docs/ADVERSARIAL_REVIEW.md"
+	bsff evidence verify
+	python tools/validate_statistical_claims.py
+	python tools/validate_forbidden_claims.py
@@ -36,10 +36,17 @@ BSFF aims at a **BCI/EEG signal claim** and tries to refute it under stated atta
 (surrogate nulls, controls, corroboration), emitting a bounded verdict —
 `SURVIVED` / `REFUTED` / `UNSUPPORTED` (see [`docs/VERDICT_SEMANTICS.md`](docs/VERDICT_SEMANTICS.md)).
 
-**Current canonical evidence — `BONN_S2_BRIGHT_LINE_PASSED`**
+**Current canonical evidence — `BONN_S2_BRIGHT_LINE_ROBUSTLY_PASSED`**
 ([`artifacts/release/CURRENT_TRUTH.json`](artifacts/release/CURRENT_TRUTH.json)): on real
-Andrzejak-2001 Bonn EEG the instrument has **power** (ictal SURVIVED 0.96) **and specificity**
-(real-spectrum AR-null FPR 0.02 ≤ 0.05). The earlier S1 negative result is preserved as evidence.
+Andrzejak-2001 Bonn EEG the instrument has robust **power** (ictal SURVIVED 0.94 seed-averaged) and
+**specificity that is robust to both seed and null-model choice**. The pre-registered **S3
+seed-averaged AR-null** confirmatory (N=1000, 10 seeds, frozen before run, **independently re-run and
+reproduced byte-for-byte**) gives FPR 0.028, Wilson 95% CI **[0.019, 0.040]**; and the **multi-null**
+gate holds across AR (0.026), IAAFT (0.032), and phase-randomized (0.034) nulls — every Wilson
+CI-upper ≤ 0.05. This passed only after a falsification battery flagged G2 as seed-sensitive, a
+smaller-N calibration confirmed it (FPR 0.035, CI-upper 0.056 > 0.05), and a larger pre-registered test
+superseded that calibration: robustness was *earned*, not assumed. Still not claimed: clinical/regulatory
+validity, BNCI execution, or multi-dataset replication. The S1 negative result is preserved as evidence.
 
 ```bash
 git clone https://github.com/neuron7xLab/bsff && cd bsff

@@ -14,7 +14,7 @@ facts (version, live test count, CLI surface, extras) by
 | Field | Value |
 |---|---|
 | Package version | `0.4.0` |
-| Live test count | **525** (collected by `pytest tests/`) |
+| Live test count | **527** (collected by `pytest tests/`) |
 | CLI subcommands | 18 (parsed from `src/bsff/cli.py`) |
 | Optional extras | `dev`, `full`, `fuzz`, `leakage`, `moabb`, `security`, `stats`, `yaml` |
 
@@ -29,7 +29,7 @@ authoritative status:
 
 ## Validation level
 
-Synthetic-ground-truth calibration PLUS a passed external real-data bright-line benchmark (Bonn S2: G1 power + G2 specificity, BONN_S2_BRIGHT_LINE_PASSED). BNCI2014-001 is preregistration-only (not executed). NOT clinical, regulatory, or multi-dataset replicated. Canonical state: artifacts/release/CURRENT_TRUTH.json.
+Synthetic-ground-truth calibration PLUS a Bonn external benchmark that is ROBUSTLY passed: BONN_S2_BRIGHT_LINE_ROBUSTLY_PASSED. Specificity is robust to BOTH seed and null-model choice. Pre-registered S3 seed-averaged AR-null (N=1000, 10 seeds, frozen-before-run, re-run reproduced byte-for-byte): G1 power 0.94, G2 FPR 0.028, Wilson 95% CI [0.019, 0.040]. Multi-null gate (AR/IAAFT/phase-randomized) all Wilson CI-upper <= 0.05 (robust_gate_passed=true). This supersedes a smaller-N calibration (FPR 0.035, Wilson CI-upper 0.056 > 0.05). BNCI2014-001 preregistration-only (execution not valid for narrowband epochs). NOT clinical, regulatory, BNCI-executed, or multi-dataset replicated. Canonical state: artifacts/release/CURRENT_TRUTH.json.
 
 See [`docs/VALIDATION.md`](docs/VALIDATION.md) for the full evidence tier
 table and [`docs/OPERATING_CHARACTERISTIC.md`](docs/OPERATING_CHARACTERISTIC.md)

@@ -4,7 +4,7 @@
   "package": "bsff",
   "generator": "tools/generate_manifest.py",
   "version": "0.4.0",
-  "test_count": 525,
+  "test_count": 527,
   "release_gates": [
     "truth_contract",
     "architecture_contract",

@@ -0,0 +1,43 @@
+{
+  "schema": "bsff.multi_null_robustness/v1",
+  "verdict": "MULTI_NULL_ROBUST",
+  "all_nulls_pass": true,
+  "gate": "per-null seed-averaged FPR Wilson-95-CI-upper <= 0.05",
+  "n_seeds": 10,
+  "n_segments_per_set": 50,
+  "n_surrogates": 199,
+  "nulls": {
+    "ar": {
+      "fpr": 0.026,
+      "wilson_95ci": [
+        0.0178,
+        0.0378
+      ],
+      "n": 1000,
+      "n_false_positives": 26,
+      "pass": true
+    },
+    "iaaft": {
+      "fpr": 0.032,
+      "wilson_95ci": [
+        0.0228,
+        0.0448
+      ],
+      "n": 1000,
+      "n_false_positives": 32,
+      "pass": true
+    },
+    "phaserand": {
+      "fpr": 0.034,
+      "wilson_95ci": [
+        0.0244,
+        0.0471
+      ],
+      "n": 1000,
+      "n_false_positives": 34,
+      "pass": true
+    }
+  },
+  "timestamp_utc": "2026-06-25T02:20:10Z",
+  "elapsed_sec": 10418.5
+}
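Every number in this report can be recomputed from the integer counts. A minimal consistency check, as a sketch: it assumes the report has been parsed with `json.load` into a dict shaped like the one above, and it uses the Wilson score interval without continuity correction (an assumption, though it reproduces the recorded bounds to four decimals). `check_multi_null` is a hypothetical helper, not part of `bsff`:

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson(k: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval (no continuity correction) for k of n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def check_multi_null(report: dict, gate: float = 0.05, tol: float = 1e-3) -> list[str]:
    """Cross-check recorded fpr, CI, and pass flags against the raw counts."""
    problems = []
    for name, r in report["nulls"].items():
        k, n = r["n_false_positives"], r["n"]
        if abs(k / n - r["fpr"]) > tol:
            problems.append(f"{name}: fpr != n_false_positives / n")
        lo, hi = wilson(k, n)
        rec_lo, rec_hi = r["wilson_95ci"]
        if abs(lo - rec_lo) > tol or abs(hi - rec_hi) > tol:
            problems.append(f"{name}: recorded Wilson CI does not match recomputation")
        # Per-null gate: CI upper bound must not exceed the threshold.
        if (hi <= gate) != r["pass"]:
            problems.append(f"{name}: pass flag inconsistent with CI upper {hi:.4f}")
    all_pass = all(r["pass"] for r in report["nulls"].values())
    if all_pass != report["all_nulls_pass"]:
        problems.append("all_nulls_pass inconsistent with per-null flags")
    return problems
```

On the report above it returns an empty list; flipping any single `pass` flag yields two problems (the per-null flag and `all_nulls_pass`).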
@@ -0,0 +1,64 @@
+{
+  "schema": "bsff.s2_falsification/v1",
+  "N_segments": 30,
+  "n_surrogates": 199,
+  "detection_p": 0.025,
+  "attacks": [
+    {
+      "attack": "seed_perturbation",
+      "seed_base": 20260623,
+      "E_survived": 0.967,
+      "ar_null_fpr": 0.0,
+      "E_ok": true,
+      "fpr_ok": true
+    },
+    {
+      "attack": "seed_perturbation",
+      "seed_base": 7,
+      "E_survived": 0.967,
+      "ar_null_fpr": 0.067,
+      "E_ok": true,
+      "fpr_ok": false
+    },
+    {
+      "attack": "seed_perturbation",
+      "seed_base": 999,
+      "E_survived": 0.967,
+      "ar_null_fpr": 0.0,
+      "E_ok": true,
+      "fpr_ok": true
+    },
+    {
+      "attack": "seed_perturbation",
+      "seed_base": 314159,
+      "E_survived": 0.967,
+      "ar_null_fpr": 0.033,
+      "E_ok": true,
+      "fpr_ok": true
+    },
+    {
+      "attack": "ar_order_variation",
+      "ar_order": 5,
+      "ar_null_fpr": 0.0,
+      "fpr_ok": true
+    },
+    {
+      "attack": "ar_order_variation",
+      "ar_order": 10,
+      "ar_null_fpr": 0.033,
+      "fpr_ok": true
+    },
+    {
+      "attack": "ar_order_variation",
+      "ar_order": 15,
+      "ar_null_fpr": 0.0,
+      "fpr_ok": true
+    }
+  ],
+  "claim_survives_attacks": false,
+  "verdict": "S2_FRAGILE_under_attack",
+  "git_commit": "394f5b33547591b4d074e9e1224735ba0947291d",
+  "timestamp_utc": "2026-06-24T17:28:19Z",
+  "interpretation": "G1 power robust (Set E SURVIVED 0.967 across all 4 seeds + AR orders). G2 specificity is a BOUNDARY pass: AR-null FPR 0.0-0.067 across seeds; seed_base=7 gave 0.067>0.05 at N=30. The committed confirmatory (N=100) FPR=0.02 has a Wilson 95% CI reaching ~0.05, so the specificity margin is thin and seed-sensitive. The bright line PASSED the predeclared confirmatory but is NOT robust to seed.",
+  "calibrated_claim": "BONN_S2_BRIGHT_LINE_PASSED is a BOUNDARY/marginal pass (power robust; specificity margin thin, seed-sensitive)."
+}
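The falsification attacks appear to have used 30 AR-null tests each (inferred from `N_segments` and from FPR steps of 1/30; an assumption, not a recorded field), so one extra false positive moves the FPR by about 0.033, and the failing 0.067 is 2/30. The sketch below, using the Wilson score interval without continuity correction, shows the resolution problem: at this N even zero false positives leave a 95% upper bound near 0.11, so no single N=30 attack could demonstrate FPR ≤ 0.05 under a CI-upper gate.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_upper(k: int, n: int, z: float = Z95) -> float:
    """Upper end of the Wilson score interval for k of n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre + half


N_ATTACK = 30  # assumed AR-null tests per falsification attack
for k in range(4):
    fpr = k / N_ATTACK
    print(f"{k}/{N_ATTACK}: FPR {fpr:.3f}, Wilson 95% upper {wilson_upper(k, N_ATTACK):.3f}")
```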
@@ -0,0 +1,51 @@
+{
+  "schema": "bsff.s2_specificity_calibration/v1",
+  "n_ar_null_tests": 480,
+  "n_false_positives": 17,
+  "pooled_fpr": 0.0354,
+  "wilson_95ci": [
+    0.0222,
+    0.056
+  ],
+  "threshold": 0.05,
+  "seeds": [
+    20260624,
+    7,
+    999,
+    314159,
+    2718,
+    42
+  ],
+  "per_seed": [
+    {
+      "seed": 20260624,
+      "fpr": 0.075
+    },
+    {
+      "seed": 7,
+      "fpr": 0.0625
+    },
+    {
+      "seed": 999,
+      "fpr": 0.0
+    },
+    {
+      "seed": 314159,
+      "fpr": 0.0125
+    },
+    {
+      "seed": 2718,
+      "fpr": 0.025
+    },
+    {
+      "seed": 42,
+      "fpr": 0.0375
+    }
+  ],
+  "fpr_ci_upper_below_threshold": false,
+  "verdict": "S2_SPECIFICITY_NOT_ROBUSTLY_BELOW_0.05",
+  "git_commit": "62ea84bff09d4e6f97c3a44eae12a08f388ea0c2",
+  "timestamp_utc": "2026-06-24T18:35:51Z",
+  "interpretation": "Seed-averaged AR-null FPR = 0.0354 (17/480), Wilson 95% CI [0.022, 0.056]. The CI UPPER bound (0.056) EXCEEDS the 0.05 gate, and 2 of 6 seeds gave FPR > 0.05 (0.075, 0.0625). The predeclared confirmatory FPR=0.02 (seed 20260623, N=100) was a favorable-seed point estimate. G2 specificity is NOT robustly below 0.05.",
+  "calibrated_verdict": "BONN_S2_BRIGHT_LINE not robustly crossed: G1 power robust, but G2 specificity fails robustness (seed-averaged FPR CI crosses the gate). Marginal/favorable-seed pass only."
+}
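The pooled FPR here is total false positives over total tests. Assuming each of the six seeds contributed the same number of AR-null tests (480 / 6 = 80, an inference, not a recorded field), the per-seed rates convert back to integer counts, and the pooled rate coincides with the plain mean of per-seed rates only because the seeds are equally sized. A small sketch of that bookkeeping:

```python
import math

# Per-seed AR-null FPRs copied from the calibration report above.
per_seed_fpr = {20260624: 0.075, 7: 0.0625, 999: 0.0, 314159: 0.0125, 2718: 0.025, 42: 0.0375}
N_TOTAL = 480
n_per_seed = N_TOTAL // len(per_seed_fpr)  # assumption: equal seed sizes, 80 tests each

# Recover integer false-positive counts per seed (FPR x tests per seed).
counts = {seed: round(fpr * n_per_seed) for seed, fpr in per_seed_fpr.items()}

pooled_fpr = sum(counts.values()) / N_TOTAL  # total FP / total tests
mean_of_seeds = sum(per_seed_fpr.values()) / len(per_seed_fpr)
seeds_over_gate = [s for s, fpr in per_seed_fpr.items() if fpr > 0.05]

# Equal-sized seeds make the two summaries agree; unequal sizes generally would not.
assert math.isclose(pooled_fpr, mean_of_seeds)
print(sum(counts.values()), round(pooled_fpr, 4), seeds_over_gate)
```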
@@ -0,0 +1,81 @@
+{
+  "schema": "bsff.s3_seed_averaged/v1",
+  "verdict": "S3_BRIGHT_LINE_ROBUSTLY_PASSED",
+  "statistic_id": "sampen_lower_tail_m2_r015_v1",
+  "n_seeds": 10,
+  "n_segments_per_set": 50,
+  "n_surrogates": 199,
+  "G1": {
+    "E_survived_fraction": 0.94,
+    "threshold": 0.8,
+    "pass": true,
+    "n": 500
+  },
+  "G2": {
+    "ar_null_fpr": 0.028,
+    "wilson_95ci": [
+      0.0194,
+      0.0402
+    ],
+    "ci_upper_threshold": 0.05,
+    "pass": true,
+    "n_ar_null": 1000,
+    "n_false_positives": 28
+  },
+  "S3_PASS": true,
+  "per_seed": [
+    {
+      "seed": 20260623,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.01
+    },
+    {
+      "seed": 7,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.05
+    },
+    {
+      "seed": 999,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.0
+    },
+    {
+      "seed": 314159,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.01
+    },
+    {
+      "seed": 2718,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.02
+    },
+    {
+      "seed": 42,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.04
+    },
+    {
+      "seed": 161803,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.04
+    },
+    {
+      "seed": 27182,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.04
+    },
+    {
+      "seed": 31337,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.04
+    },
+    {
+      "seed": 123456,
+      "E_survived": 0.94,
+      "ar_null_fpr": 0.03
+    }
+  ],
+  "gate": "G1 seed-avg SURVIVED>=0.80 AND G2 AR-null FPR Wilson-95-CI-upper<=0.05",
+  "timestamp_utc": "2026-06-24T23:06:40Z",
+  "elapsed_sec": 7109.9
+}
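The `gate` string above combines two conditions. A minimal sketch of evaluating it from the per-seed rows, assuming 100 AR-null tests per seed (1000 / 10, inferred rather than recorded) and the Wilson score interval without continuity correction; `s3_gate` is a hypothetical name, not the project's implementation. Unlike the calibration, the pooled CI-upper lands below 0.05 even though one seed (7) sits exactly at 0.05.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson(k: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval (no continuity correction) for k of n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def s3_gate(e_survived, ar_null_fpr, n_per_seed, power_min=0.80, ci_upper_max=0.05):
    """G1: seed-averaged Set E SURVIVED >= power_min.
    G2: pooled AR-null FPR Wilson-95-CI-upper <= ci_upper_max."""
    g1 = sum(e_survived) / len(e_survived)
    k = sum(round(f * n_per_seed) for f in ar_null_fpr)  # pooled false positives
    n = n_per_seed * len(ar_null_fpr)                    # pooled AR-null tests
    lo, hi = wilson(k, n)
    g1_pass, g2_pass = g1 >= power_min, hi <= ci_upper_max
    return {"G1": g1, "G1_pass": g1_pass, "k": k, "n": n, "fpr": k / n,
            "ci": (lo, hi), "G2_pass": g2_pass, "S3_PASS": g1_pass and g2_pass}


# Per-seed rows from the S3 report above (seed order as recorded).
ar_null_fpr = [0.01, 0.05, 0.0, 0.01, 0.02, 0.04, 0.04, 0.04, 0.04, 0.03]
e_survived = [0.94] * len(ar_null_fpr)
result = s3_gate(e_survived, ar_null_fpr, n_per_seed=100)  # assumed 100 tests per seed
print(result)
```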