An honest non-replication of CCI and 17 alternative metrics across five independent EEG datasets
Theodor Spiro | ORCID 0009-0004-5382-9346 | tspiro@vaika.org
📄 Full report: REPORT_FINAL.md
🧮 Main analysis scripts: src/metric_search.py, src/wpli_deep.py, src/zuco_wpli_replication.py
🗂 Status: Complete — negative replication result
We tested whether EEG connectivity and spectral metrics can serve as universal biomarkers of reading and listening comprehension, starting from a promising single-dataset finding (Connectivity Contrast Index, CCI; window-level d = +0.564 on a small distance-learning dataset). Across 18 candidate metrics × 5 independent datasets × 126 subjects total, the analysis demonstrates that:
- No metric replicates across datasets. Of 18 metrics tested on DERCo and STEW, 7 reach p < 0.05 on DERCo and 0 on STEW. Cross-dataset hits: 0.
- The strongest within-subject signal reverses direction across paradigms. wPLI alpha CV on DERCo (within-subject p = 0.015, ρ = +0.234) reverses sign on ZuCo (ρ = −0.466) and shows null effects on STEW and Speech-in-Noise.
- CCI was a pseudoreplication artifact. Window-level effect (d = +0.564, p = 7.1×10⁻⁷) collapses at session level (permutation p = 0.64, N = 11) and does not replicate on DERCo (ρ ≈ 0 across three replication strategies).
- One real but paradigm-specific effect: central-cortex alpha wPLI tracks reading state within individuals on DERCo (FC/C region ρ = −0.343, p = 6×10⁻⁴; bootstrap CI excludes 0). This is a legitimate neural effect, but it does not generalize to listening comprehension or workload paradigms.
- Aperiodic slope (1/f) is a between-subject trait correlate (DERCo: ρ = −0.259, p = 0.008) but does not track within-subject state changes and does not replicate on STEW.
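The gap between the window-level and session-level CCI results is the usual signature of pseudoreplication: windows from the same session share session-wide offsets, so treating them as independent observations inflates the effective sample size. The Python sketch below illustrates the mechanism on simulated data with no true effect. It is not the analysis in `src/`; the two-condition design, the function names, and every simulation parameter are assumptions chosen for illustration.

```python
from itertools import combinations
import math

import numpy as np


def exact_permutation_p(values, labels):
    """Two-sided exact permutation p-value for a difference in group means.

    Enumerates every relabelling of the units, which is cheap for N = 11.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n, k = len(values), int(labels.sum())
    observed = abs(values[labels].mean() - values[~labels].mean())
    hits = 0
    total = 0
    for idx in combinations(range(n), k):
        mask = np.zeros(n, dtype=bool)
        mask[list(idx)] = True
        diff = abs(values[mask].mean() - values[~mask].mean())
        hits += int(diff >= observed - 1e-12)  # tolerance for float round-off
        total += 1
    return hits / total


def window_level_p(a, b):
    """Welch-style p-value that (wrongly) treats every window as independent.

    Uses a normal approximation to the t distribution, which is adequate
    for the thousands of windows simulated here.
    """
    a, b = np.ravel(a), np.ravel(b)
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t = (a.mean() - b.mean()) / se
    return math.erfc(abs(t) / math.sqrt(2))


def simulate(n_sessions=11, n_windows=200, session_sd=1.0, window_sd=1.0, seed=0):
    """Simulate windows nested in sessions with NO true condition effect."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_sessions) % 2 == 0  # hypothetical per-session condition
    offsets = rng.normal(0.0, session_sd, n_sessions)  # shared within a session
    noise = rng.normal(0.0, window_sd, (n_sessions, n_windows))
    return offsets[:, None] + noise, labels


windows, labels = simulate()
p_window = window_level_p(windows[labels], windows[~labels])
p_session = exact_permutation_p(windows.mean(axis=1), labels)
print(f"window-level p = {p_window:.2e}, session-level p = {p_session:.3f}")
```

With settings like these, the window-level test tends to report very small p-values even though no effect was built in, because its standard error ignores the session offsets. The session-level test has only 11 exchangeable units and is not inflated in the same way.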
Bottom line: None of the 18 metrics tested here supports a universal, zero-calibration EEG comprehension detector. Practical systems would require per-paradigm training, per-subject calibration, and multimodal fusion, and would target attention/engagement proxies rather than comprehension itself. Commercial claims to the contrary are not supported by systematic cross-dataset evaluation.
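The findings above separate within-subject (state) effects from between-subject (trait) effects. One common way to make that separation, sketched here in Python, is to average a rank correlation computed inside each subject and compare it with a rank correlation of per-subject means. This is an assumption about the general procedure, not a description of what the scripts in `src/` do; the function names and the toy data are hypothetical.

```python
import numpy as np


def _ranks(v):
    """1-based ranks with ties replaced by their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(_ranks(x), _ranks(y))[0, 1])


def within_subject_rho(subject, metric, score):
    """Mean of per-subject Spearman correlations (state-level association)."""
    subject = np.asarray(subject)
    metric = np.asarray(metric, dtype=float)
    score = np.asarray(score, dtype=float)
    rhos = [
        spearman(metric[subject == s], score[subject == s])
        for s in np.unique(subject)
    ]
    return float(np.mean(rhos))


def between_subject_rho(subject, metric, score):
    """Spearman correlation of per-subject means (trait-level association)."""
    subject = np.asarray(subject)
    metric = np.asarray(metric, dtype=float)
    score = np.asarray(score, dtype=float)
    ids = np.unique(subject)
    metric_means = [metric[subject == s].mean() for s in ids]
    score_means = [score[subject == s].mean() for s in ids]
    return spearman(metric_means, score_means)


# Toy data: 5 subjects x 3 conditions. Subjects with a higher metric also
# score higher overall, but inside each subject the relation is reversed.
subject = np.repeat(np.arange(5), 3)
metric = subject * 10.0 + np.tile([0.0, 1.0, 2.0], 5)
score = subject * 10.0 + np.tile([2.0, 1.0, 0.0], 5)

print("pooled :", spearman(metric, score))
print("within :", within_subject_rho(subject, metric, score))
print("between:", between_subject_rho(subject, metric, score))
```

In the toy data the pooled and between-subject correlations are strongly positive while the within-subject correlation is negative, which is why a trait correlate such as the aperiodic slope need not track state changes within a person.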
| Dataset | Source | N | Channels | Sampling | Task |
|---|---|---|---|---|---|
| D1 Distance Learning | Kaggle | 8 (11 sessions) | 14 | 8 Hz | Online lecture viewing |
| DERCo | OSF | 21 | 32 | 1000 Hz | Naturalistic reading, 5 articles |
| STEW | Kaggle | 45 | 14 | 128 Hz | Cognitive workload (SIMKAP) |
| ZuCo 2.0 | OSF | 6 (of 18) | 128 | 500 Hz | Natural sentence reading |
| Speech-in-Noise (Etard & Reichenbach) | Zenodo | 6 (of 20) | 63 | 1000 Hz | English audiobooks under noise |
Two additional datasets (ERP CORE N400, ROAMM) were inspected but excluded from the cross-dataset analysis because of insufficient N and an incomplete public release, respectively.
18 metrics across five families:
- Connectivity — CCI, mean correlation, wPLI alpha (mean / CV / regional / multi-band)
- Spectral — aperiodic slope (1/f), spectral entropy, band powers, theta/alpha & theta/beta ratios, peak alpha frequency
- Complexity — permutation entropy, sample entropy, Lempel-Ziv, Hurst exponent
- ERP — N400 slope, N400 cloze difference
- Dimensionality — participation ratio, effective rank, PC1 variance
Full table and rationale in REPORT_FINAL.md §3.
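For readers less familiar with the connectivity family: the weighted phase lag index (wPLI) is the magnitude of the mean imaginary part of the cross-spectrum divided by the mean of its magnitude, which down-weights zero-lag coupling such as volume conduction. The Python sketch below is a minimal FFT-based estimate for a single channel pair. The epoching scheme, the Hann taper, the 8 to 13 Hz default band, and the function name `wpli_band` are assumptions made for illustration; the implementation in `src/wpli_deep.py` may differ.

```python
import numpy as np


def wpli_band(x, y, sfreq, fmin=8.0, fmax=13.0):
    """wPLI between two channels, averaged over frequency bins in [fmin, fmax].

    x, y : arrays of shape (n_epochs, n_samples), one row per epoch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    taper = np.hanning(x.shape[1])
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / sfreq)
    band = (freqs >= fmin) & (freqs <= fmax)

    # Per-epoch spectra after removing each epoch's mean and tapering.
    fx = np.fft.rfft((x - x.mean(axis=1, keepdims=True)) * taper, axis=1)[:, band]
    fy = np.fft.rfft((y - y.mean(axis=1, keepdims=True)) * taper, axis=1)[:, band]

    # Imaginary part of the cross-spectrum, shape (n_epochs, n_band_bins).
    imag_csd = np.imag(fx * np.conj(fy))

    num = np.abs(imag_csd.mean(axis=0))   # |E[Im(Sxy)]| across epochs
    den = np.abs(imag_csd).mean(axis=0)   # E[|Im(Sxy)|] across epochs
    wpli = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(wpli.mean())


# Synthetic check: a 10 Hz rhythm with a consistent 45-degree lag between
# channels versus two unrelated noise channels.
rng = np.random.default_rng(0)
sfreq, n_epochs, n_samples = 250.0, 100, 250
t = np.arange(n_samples) / sfreq
phase = rng.uniform(0, 2 * np.pi, (n_epochs, 1))
x = np.sin(2 * np.pi * 10 * t + phase) + 0.1 * rng.normal(size=(n_epochs, n_samples))
y = np.sin(2 * np.pi * 10 * t + phase - np.pi / 4) + 0.1 * rng.normal(size=(n_epochs, n_samples))

lagged = wpli_band(x, y, sfreq, fmin=9.0, fmax=11.0)
unrelated = wpli_band(rng.normal(size=x.shape), rng.normal(size=x.shape), sfreq)
print(f"lagged pair: {lagged:.2f}, unrelated noise: {unrelated:.2f}")
```

A consistently lagged pair gives a value near 1, while unrelated channels give a value near 0 that shrinks as the number of epochs grows. Estimates from few or short epochs are biased upward, which matters when comparing datasets with very different recording lengths.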
├── REPORT_FINAL.md # Canonical writeup (354 lines, all results)
├── REPORT.md # Earlier writeup (kept for provenance)
├── src/ # Analysis scripts (per-dataset, per-metric)
├── notebooks/ # Exploratory notebooks
├── data/ # Raw and preprocessed EEG (gitignored — see data/README.md)
├── results/ # Figures and CSVs (output of src/ scripts)
└── requirements.txt
git clone https://github.com/mool32/eeg-connectivity-contrast.git
cd eeg-connectivity-contrast
pip install -r requirements.txt
# Datasets must be downloaded separately (see data/README.md for links and licenses).
# Each dataset has a dedicated entry-point script in src/:
python src/derco_cci_replication.py # DERCo: CCI replication
python src/metric_search.py # 18 metrics × DERCo + STEW
python src/wpli_deep.py # wPLI band/topography/bootstrap
python src/zuco_wpli_replication.py # ZuCo cross-paradigm test
python src/aperiodic_replication.py # 1/f slope across DERCo, STEW, ERP CORE

All numbers reported in REPORT_FINAL.md reproduce from the scripts above.
If you find this report useful — particularly as a reference for negative-result replication practice in EEG comprehension research — please cite:
Spiro, T. (2026). EEG-based comprehension detection: a multi-dataset non-replication of CCI and 17 alternative metrics. Technical report. https://github.com/mool32/eeg-connectivity-contrast
Theodor Spiro — tspiro@vaika.org
MIT (see LICENSE)