EEG-Based Comprehension Detection: A Multi-Dataset Investigation

An honest non-replication of CCI and 17 alternative metrics across five independent EEG datasets

Theodor Spiro | ORCID 0009-0004-5382-9346 | tspiro@vaika.org

📄 Full report: REPORT_FINAL.md
🧮 Main analysis scripts: src/metric_search.py, src/wpli_deep.py, src/zuco_wpli_replication.py
🗂 Status: Complete (negative replication result)

Brief Summary

We tested whether EEG connectivity and spectral metrics can serve as universal biomarkers of reading and listening comprehension, starting from a promising single-dataset finding (Connectivity Contrast Index, CCI; window-level d = +0.564 on a small distance-learning dataset). Across 18 candidate metrics × 5 independent datasets × 126 subjects total, the analysis demonstrates that:

No metric replicates across datasets. Of 18 metrics tested on DERCo and STEW, 7 reach p < 0.05 on DERCo and 0 on STEW. Cross-dataset hits: 0.
The strongest within-subject signal reverses direction across paradigms. wPLI alpha CV on DERCo (within-subject p = 0.015, ρ = +0.234) reverses sign on ZuCo (ρ = −0.466) and shows null effects on STEW and Speech-in-Noise.
CCI was a pseudoreplication artifact. The window-level effect (d = +0.564, p = 7.1×10⁻⁷) treats windows from the same session as independent observations; it collapses at the session level (permutation p = 0.64, N = 11) and does not replicate on DERCo (ρ ≈ 0 across three replication strategies).
One real but paradigm-specific effect: central-cortex alpha wPLI tracks reading state within individuals on DERCo (FC/C region ρ = −0.343, p = 6×10⁻⁴; bootstrap CI excludes 0). This is a legitimate neural effect, but it does not generalize to listening comprehension or workload paradigms.
Aperiodic slope (1/f) is a between-subject trait correlate (DERCo: ρ = −0.259, p = 0.008) but does not track within-subject state changes and does not replicate on STEW.
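The pseudoreplication point can be illustrated on synthetic data. The sketch below is not the code in src/ and uses no real EEG: the session offsets, the 200 windows per session, the 6/5 label split, and all function names are assumptions made for illustration. It compares a naive window-level statistic, which treats every window as independent, with a permutation test on per-session means.

```python
import numpy as np

def cohens_d(a, b):
    """Pooled-SD Cohen's d between two 1-D samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

def naive_t(a, b):
    """Two-sample t statistic that (wrongly) treats every window as independent."""
    return cohens_d(a, b) * np.sqrt(len(a) * len(b) / (len(a) + len(b)))

def session_level_permutation(values, labels, sessions, n_perm=5000, seed=0):
    """Permutation test on per-session means; labels are constant within a session."""
    rng = np.random.default_rng(seed)
    ids = np.unique(sessions)
    sess_mean = np.array([values[sessions == s].mean() for s in ids])
    sess_label = np.array([labels[sessions == s][0] for s in ids])
    observed = sess_mean[sess_label == 1].mean() - sess_mean[sess_label == 0].mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(sess_label)  # shuffle labels across sessions, not windows
        null[i] = sess_mean[perm == 1].mean() - sess_mean[perm == 0].mean()
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Synthetic data (hypothetical): 11 sessions, each with its own baseline offset.
rng = np.random.default_rng(42)
offsets = np.array([1.0, -0.5, 0.8, -0.3, 0.9, -0.1, 0.6, -0.7, 0.4, -0.6, 0.3])
session_label = np.array([1] * 6 + [0] * 5)  # first 6 sessions "high", last 5 "low"
n_win = 200
sessions = np.repeat(np.arange(offsets.size), n_win)
labels = np.repeat(session_label, n_win)
values = offsets[sessions] + rng.normal(0.0, 1.0, size=sessions.size)

window_d = cohens_d(values[labels == 1], values[labels == 0])
window_t = naive_t(values[labels == 1], values[labels == 0])
session_diff, p_session = session_level_permutation(values, labels, sessions)

print(f"window-level:  d = {window_d:.3f}, naive t = {window_t:.2f} (n = {values.size} windows)")
print(f"session-level: diff = {session_diff:.3f}, permutation p = {p_session:.3f} (N = {offsets.size})")
```

Here the group difference comes entirely from a handful of session baselines. The window-level statistic looks strong because 2200 correlated windows are counted as 2200 observations, while the session-level test has only 11 units to work with and finds no significant effect.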

Bottom line: None of the 18 metrics tested here supports a universal, zero-calibration EEG comprehension detector. Practical systems would require per-paradigm training, per-subject calibration, and multimodal fusion, and would target attention/engagement proxies rather than comprehension itself. Commercial claims to the contrary are not supported by this systematic cross-dataset evaluation.

Datasets

Dataset	Source	N (subjects)	Channels	Sampling rate	Task
D1 Distance Learning	Kaggle	8 (11 sessions)	14	8 Hz	Online lecture viewing
DERCo	OSF	21	32	1000 Hz	Naturalistic reading, 5 articles
STEW	Kaggle	45	14	128 Hz	Cognitive workload (SIMKAP)
ZuCo 2.0	OSF	6 (of 18)	128	500 Hz	Natural sentence reading
Speech-in-Noise (Etard & Reichenbach)	Zenodo	6 (of 20)	63	1000 Hz	English audiobooks under noise

Two additional datasets (ERP CORE N400, ROAMM) were inspected but excluded from the cross-dataset analysis, for insufficient N and incomplete public release, respectively.

Metrics tested

18 metrics across five families:

Connectivity — CCI, mean correlation, wPLI alpha (mean / CV / regional / multi-band)
Spectral — aperiodic slope (1/f), spectral entropy, band powers, theta/alpha & theta/beta ratios, peak alpha frequency
Complexity — permutation entropy, sample entropy, Lempel-Ziv, Hurst exponent
ERP — N400 slope, N400 cloze difference
Dimensionality — participation ratio, effective rank, PC1 variance

Full table and rationale in REPORT_FINAL.md §3.
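For readers unfamiliar with wPLI, the sketch below follows the standard definition (the absolute mean of the imaginary cross-spectrum divided by the mean of its absolute value, taken across epochs). It is a minimal NumPy illustration, not the implementation in src/wpli_deep.py: the function name, the Hann window, the 1 s epochs, the 8 to 13 Hz band edges, and the synthetic signals are all assumptions.

```python
import numpy as np

def wpli(x, y, sfreq, fmin=8.0, fmax=13.0):
    """Weighted phase lag index between two channels.

    x, y: arrays of shape (n_epochs, n_samples). Returns the per-frequency
    wPLI averaged over FFT bins in [fmin, fmax].
    """
    win = np.hanning(x.shape[1])
    fx = np.fft.rfft(x * win, axis=1)
    fy = np.fft.rfft(y * win, axis=1)
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / sfreq)
    band = (freqs >= fmin) & (freqs <= fmax)
    # Imaginary part of the cross-spectrum, shape (n_epochs, n_band_bins).
    imag = np.imag(fx[:, band] * np.conj(fy[:, band]))
    num = np.abs(imag.mean(axis=0))   # |E[Im(Sxy)]| across epochs
    den = np.abs(imag).mean(axis=0)   # E[|Im(Sxy)|] across epochs
    per_freq = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(per_freq.mean())

# Synthetic check (hypothetical signals): a 10 Hz rhythm with random phase per epoch.
rng = np.random.default_rng(0)
sfreq, n_epochs, n_samples = 128.0, 200, 128
t = np.arange(n_samples) / sfreq
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_epochs, 1))
x = np.sin(2 * np.pi * 10 * t + phases) + 0.5 * rng.standard_normal((n_epochs, n_samples))

# Case 1: y lags x by a consistent 90 degrees (a genuine lagged interaction).
y_lagged = (np.sin(2 * np.pi * 10 * t + phases - np.pi / 2)
            + 0.5 * rng.standard_normal((n_epochs, n_samples)))
# Case 2: y is a zero-lag copy of x plus noise (mimics volume conduction).
y_zero_lag = x + 0.5 * rng.standard_normal((n_epochs, n_samples))

wpli_lagged = wpli(x, y_lagged, sfreq)
wpli_zero_lag = wpli(x, y_zero_lag, sfreq)
print(f"lagged coupling: wPLI = {wpli_lagged:.2f}")
print(f"zero-lag copy:   wPLI = {wpli_zero_lag:.2f}")
```

The zero-lag case stays low because a shared instantaneous source contributes nothing systematic to the imaginary part of the cross-spectrum. That insensitivity to zero-lag coupling is the usual reason for choosing wPLI over plain correlation or coherence in scalp EEG.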

Repository structure

├── REPORT_FINAL.md           # Canonical writeup (354 lines, all results)
├── REPORT.md                 # Earlier writeup (kept for provenance)
├── src/                      # Analysis scripts (per-dataset, per-metric)
├── notebooks/                # Exploratory notebooks
├── data/                     # Raw and preprocessed EEG (gitignored — see data/README.md)
├── results/                  # Figures and CSVs (output of src/ scripts)
└── requirements.txt

Reproducing the analysis

git clone https://github.com/mool32/eeg-connectivity-contrast.git
cd eeg-connectivity-contrast
pip install -r requirements.txt

# Datasets must be downloaded separately (see data/README.md for links and licenses).
# Each dataset has a dedicated entry-point script in src/:

python src/derco_cci_replication.py        # DERCo: CCI replication
python src/metric_search.py                # 18 metrics × DERCo + STEW
python src/wpli_deep.py                    # wPLI band/topography/bootstrap
python src/zuco_wpli_replication.py        # ZuCo cross-paradigm test
python src/aperiodic_replication.py        # 1/f slope across DERCo, STEW, ERP CORE

All numbers reported in REPORT_FINAL.md reproduce from the scripts above.

Citation

If you find this report useful — particularly as a reference for negative-result replication practice in EEG comprehension research — please cite:

Spiro, T. (2026). EEG-based comprehension detection: a multi-dataset non-replication of CCI and 17 alternative metrics. Technical report. https://github.com/mool32/eeg-connectivity-contrast

Contact

Theodor Spiro — tspiro@vaika.org

License

MIT (see LICENSE)
