QSVM Haemorrhagic Fever Classifier

Quantum Support Vector Machine for rapid Ebola strain triage (Bundibugyo · Zaire · Sudan · Non-Ebola HF) using a ZZFeatureMap quantum kernel. The project now contains both the original negative result and the bandwidth-tuned rescue experiment: naive quantum kernels concentrate, but bandwidth optimisation recovers useful kernel structure for the binary Bundibugyo triage task.

Important

Key Finding: A geometric difference of g = 820 indicates that the quantum kernel spans a fundamentally different functional space from classical kernels, yet the default ZZFeatureMap degenerates to a near-constant matrix (off-diagonal σ = 0.0445) on reconstructed clinical tabular data. Bandwidth tuning moves kernel-target alignment from 0.1613 to 0.4128 and lifts binary QSVM macro recall from 0.5000 to 0.5687, competitive with the best classical binary baseline in this run (XGBoost, 0.5492; bootstrap CIs overlap).

Architecture

flowchart TD
    subgraph Sources["Data Sources"]
        A["MacNeil 2010 · CDC EID\nBundibugyo · 56 confirmed"]
        B["Roddy 2012 · PLoS ONE\nBundibugyo · 93 putative"]
        C["MMWR 2022 · CDC\nSudan · 87 confirmed"]
        D["Schieffelin 2014 · NEJM\nZaire · 106 confirmed"]
    end

    subgraph Engineering["Feature Engineering"]
        E["PDF Extraction\npdfplumber + Claude API"]
        F["IPD Reconstruction\nGaussian copula from\npublished frequencies\n(intra-cluster rho preserved)"]
        G["SMOTE · PCA 6 components\nScaled to 0-pi for quantum encoding"]
    end

    subgraph Classes["Reconstructed Dataset  n=520"]
        H["BUNDIBUGYO\nn=93 · CFR 40%"]
        I["ZAIRE\nn=200 · CFR 74%"]
        J["SUDAN\nn=87 · CFR 53%"]
        K["NON-EBOLA HF\nn=140 · CFR 15%"]
    end

    subgraph Models["Model Training"]
        L["Classical Baselines\nLinear SVM · RBF SVM\nRandom Forest · LR · XGBoost"]
        M["Default ZZFeatureMap Kernel\n6 qubits · depth 2\nK = overlap of quantum states"]
        N["Default QSVM\nSVC precomputed kernel\nC=0.1 · class weight balanced"]
        S["Bandwidth Sweep\nlambda in 0.05-2.0\nmaximise KTA"]
        T["Tuned QSVM\nlambda* = 0.05\nC=1.0 · class weight balanced"]
        U["VQC\nAngleEmbedding +\nStronglyEntanglingLayers"]
    end

    subgraph Evaluation["Evaluation"]
        O["Stratified 5-fold CV\nPrimary metric: Recall"]
        P["Geometric Difference\ng = 820"]
        Q["Kernel Diagnostics\noff-diagonal sigma = 0.0445"]
        R["McNemar Test\np less than 0.005 all comparisons"]
        V["Rescue Metrics\nKTA 0.4128\nTuned recall 0.5687"]
    end

    Sources --> E
    E --> F --> G
    G --> Classes
    Classes --> L
    Classes --> M --> N
    Classes --> S --> T
    Classes --> U
    L --> O
    N --> O
    T --> V
    U --> V
    O --> P
    O --> Q
    O --> R

Quantum Circuit: ZZFeatureMap depth 2

|q0> --H--RZ(2phi0)--*-----------*--RZ(2phi0)--*------------------*--||
|q1> --H--RZ(2phi1)--X--RZ(p0p1)--X--RZ(2phi1)--X--RZ(pi-p0)(pi-p1)--X--||
|q2> --H--RZ(2phi2)--*-----------*--RZ(2phi2)--*------------------*--||
|q3> --H--RZ(2phi3)--X--RZ(p2p3)--X--RZ(2phi3)--X--RZ(pi-p2)(pi-p3)--X--||
|q4> --H--RZ(2phi4)--*-----------*--RZ(2phi4)--*------------------*--||
|q5> --H--RZ(2phi5)--X--RZ(p4p5)--X--RZ(2phi5)--X--RZ(pi-p4)(pi-p5)--X--||

phi = PCA(clinical features) in [0, pi]^6
K(x1, x2) = Pr[|000000>]  from  U_dag(x2) U(x1) |0>^6
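
The kernel definition above can be sketched with a small NumPy statevector simulation. This is a minimal illustration, not the code in src/quantum_kernel.py: the pairing (0,1), (2,3), (4,5) is read off the diagram, while the phase convention (RZ(2φ_i) on each qubit and a ZZ phase of (π − φ_i)(π − φ_j) on each pair, identical in both repetitions) is an assumption made for this example. The function names are hypothetical.

```python
import numpy as np
from functools import reduce

N_QUBITS = 6
PAIRS = [(0, 1), (2, 3), (4, 5)]  # entangled pairs as drawn in the diagram

H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
H_ALL = reduce(np.kron, [H1] * N_QUBITS)  # Hadamard on every qubit (64 x 64)

# Z eigenvalue (+1 or -1) of each qubit for each of the 64 basis states
BITS = (np.arange(2 ** N_QUBITS)[:, None] >> np.arange(N_QUBITS)) & 1
Z = 1 - 2 * BITS


def phase_layer(phi):
    """Diagonal of the data-encoding layer, using RZ(t) = exp(-i t Z / 2).

    Assumed convention: RZ(2*phi_i) per qubit, and a ZZ rotation with
    phase (pi - phi_i) * (pi - phi_j) per entangled pair.
    """
    angle = Z @ phi
    for i, j in PAIRS:
        angle = angle + (np.pi - phi[i]) * (np.pi - phi[j]) * Z[:, i] * Z[:, j]
    return np.exp(-1j * angle)


def feature_state(phi, reps=2):
    """Apply (Hadamards, then phase layer) `reps` times to |000000>."""
    state = np.zeros(2 ** N_QUBITS, dtype=complex)
    state[0] = 1.0
    diag = phase_layer(np.asarray(phi, dtype=float))
    for _ in range(reps):
        state = diag * (H_ALL @ state)
    return state


def kernel(x1, x2):
    """|<psi(x2)|psi(x1)>|^2, i.e. Pr[|000000>] after U_dag(x2) U(x1)."""
    return float(np.abs(np.vdot(feature_state(x2), feature_state(x1))) ** 2)


rng = np.random.default_rng(42)
x1, x2 = rng.uniform(0.0, np.pi, size=(2, N_QUBITS))
print(kernel(x1, x1))  # equals 1 up to floating-point error
print(kernel(x1, x2))
```

Because every gate is either a Hadamard layer or a diagonal phase, the whole circuit reduces to a matrix-vector product and an element-wise multiply per repetition, which keeps the 6-qubit case cheap to check by hand.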

Results

Classical Baselines vs QSVM: 4-class (test set, natural imbalance)

Model	Macro Recall	95% CI	CV Recall	Notes
Linear SVM	0.434	[0.331, 0.539]	0.461 ± 0.020	Best test macro recall; smallest CV-to-test gap
Logistic Regression	0.411	[0.302, 0.521]	0.484 ± 0.026	Second-best test macro recall
Random Forest	0.402	[0.301, 0.508]	0.623 ± 0.040	CV overfits SMOTE training set
XGBoost	0.384	[0.280, 0.490]	0.611 ± 0.047	CV overfits SMOTE training set
RBF SVM	0.380	[0.274, 0.486]	0.514 ± 0.033	Large CV to test gap
QSVM (ZZFeatureMap)	0.250	[0.250, 0.250]	N/A	Degenerates to single-class prediction

Bootstrap CI: n_boot=2000, seed 42, n_test=104. QSVM CI is degenerate (always predicts one class).
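
The degenerate QSVM interval can be reproduced with a short bootstrap sketch. The settings n_boot=2000 and seed 42 come from the note above; the percentile method and the per-class test counts below are assumptions for illustration, not values taken from the repository.

```python
import numpy as np


def macro_recall(y_true, y_pred, labels):
    """Unweighted mean of per-class recall over classes present in y_true."""
    recalls = []
    for c in labels:
        mask = y_true == c
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))


def bootstrap_ci(y_true, y_pred, n_boot=2000, seed=42, alpha=0.05):
    """Percentile bootstrap CI for macro recall (resampling test rows)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(y_true)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = macro_recall(y_true[idx], y_pred[idx], labels)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


# Hypothetical 4-class test split with n_test = 104
y_true = np.repeat([0, 1, 2, 3], [19, 40, 17, 28])
y_const = np.zeros_like(y_true)  # a model that always predicts class 0

ci_const = bootstrap_ci(y_true, y_const)
print(ci_const)
```

A constant predictor scores recall 1 on one class and 0 on the other three in every resample, so the macro recall is 0.25 each time and the interval has zero width.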

Quantum Kernel Diagnostics

Metric	Value	Interpretation
Kernel diagonal mean	1.0000	Correct: K(x,x) = 1
Kernel off-diagonal mean	0.0230	Near-zero overlap
Kernel off-diagonal sigma	0.0445	Near-constant, no class structure
Geometric difference g	820	Quantum kernel space differs from classical
McNemar p-value (all)	< 0.002	QSVM significantly different, but worse
QSVM ROC-AUC (binary)	0.481	Below random chance
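
The first four diagnostics in the table can be computed from kernel matrices alone. The sketch below is an assumption about how such diagnostics are typically calculated, not a copy of src/qsvm.py: the geometric difference follows the definition in Huang et al. (2021) as understood here, g = sqrt(||sqrt(K_Q) K_C^{-1} sqrt(K_Q)||) with the spectral norm, and the regularisation constant `reg` and all function names are hypothetical.

```python
import numpy as np


def kernel_diagnostics(K):
    """Return (diagonal mean, off-diagonal mean, off-diagonal std)."""
    off = K[~np.eye(K.shape[0], dtype=bool)]
    return float(np.mean(np.diag(K))), float(off.mean()), float(off.std())


def psd_sqrt(K):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


def geometric_difference(K_classical, K_quantum, reg=1e-6):
    """sqrt of the spectral norm of sqrt(K_Q) (K_C + reg*I)^-1 sqrt(K_Q)."""
    n = K_classical.shape[0]
    sq = psd_sqrt(K_quantum)
    inv = np.linalg.inv(K_classical + reg * np.eye(n))
    return float(np.sqrt(np.linalg.norm(sq @ inv @ sq, 2)))


# Toy "concentrated" kernel: ones on the diagonal, 0.02 everywhere else
n = 20
K_conc = 0.02 * np.ones((n, n)) + 0.98 * np.eye(n)

diag_mean, off_mean, off_std = kernel_diagnostics(K_conc)
g_self = geometric_difference(K_conc, K_conc)
print(diag_mean, off_mean, off_std, g_self)
```

Comparing a kernel with itself gives g close to 1, which is a convenient sanity check before comparing a quantum kernel against a classical one.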

Bandwidth Rescue: Binary Bundibugyo Triage

Run with python main.py --skip-pdf --binary --rescue.

Model / Kernel	Macro Recall	95% CI	Bundibugyo Recall	ROC-AUC	Notes
XGBoost (best classical)	0.5492	[0.445, 0.653]	N/A	N/A	Reference baseline
Default QSVM (lambda=1.0)	0.5000	[0.500, 0.500]	1.0000	N/A	Degenerate
Tuned QSVM (lambda=0.05)*	0.5687	[0.463, 0.673]	0.6316	0.5616	Competitive with XGBoost (CIs overlap)
VQC (L=3 layers)	0.5774	--	0.6842	0.5690	Best Bundibugyo recall

CIs overlap between tuned QSVM and XGBoost: result is competitive, not definitively superior, on n=104 test set.

Bandwidth	Kernel-Target Alignment	Off-Diagonal Sigma
0.05	0.4128	0.0873
0.10	0.3935	0.1995
0.15	0.3543	0.2301
0.20	0.3079	0.2102
0.30	0.2308	0.1446
0.50	0.1713	0.0757
0.75	0.1672	0.0528
1.00	0.1613	0.0445
1.50	0.1671	0.0423
2.00	0.1776	0.0388
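
Selecting the bandwidth from the sweep amounts to an argmax over kernel-target alignment. The sketch below uses the uncentred alignment, KTA = <K, yy^T>_F / (||K||_F ||yy^T||_F) with labels in {-1, +1}; whether src/bandwidth_sweep.py uses the centred or uncentred form is not stated here, so treat that choice as an assumption. The sweep values are copied from the table above.

```python
import numpy as np


def kernel_target_alignment(K, y):
    """Uncentred alignment between K and the ideal kernel y y^T."""
    y = np.asarray(y, dtype=float)
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))


# (bandwidth -> KTA) pairs from the sweep table
sweep = {
    0.05: 0.4128, 0.10: 0.3935, 0.15: 0.3543, 0.20: 0.3079, 0.30: 0.2308,
    0.50: 0.1713, 0.75: 0.1672, 1.00: 0.1613, 1.50: 0.1671, 2.00: 0.1776,
}
best_lambda = max(sweep, key=sweep.get)
print(best_lambda)  # 0.05

# Sanity checks on a toy label vector
y = np.array([1, 1, -1, -1])
kta_ideal = kernel_target_alignment(np.outer(y, y), y)  # perfectly aligned
kta_identity = kernel_target_alignment(np.eye(4), y)    # no off-diagonal signal
print(kta_ideal, kta_identity)
```

An identity-like kernel on n points has alignment 1/sqrt(n), so a concentrated kernel scores low even though its diagonal agrees with the ideal kernel.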

Research Figures

Bandwidth sweep: kernel-target alignment and off-diagonal spread across lambda. The best KTA occurs at lambda = 0.05; default lambda = 1.0 sits in the concentrated region.

Kernel before/after: default vs tuned quantum kernels, sorted by class label. Tuning visibly changes the kernel geometry used by the classifier.

Quantum kernel matrix: default ZZFeatureMap kernel, sorted by class label. Block-diagonal structure would indicate class separation; the near-uniform heatmap shows that the default kernel carries almost no discriminative signal.

Per-class recall comparison: classical models vs default QSVM. The default QSVM reaches Bundibugyo recall 1.0 only by predicting every patient as Bundibugyo (a degenerate solution).

Sample complexity: macro recall vs training set size. QSVM consistently underperforms RBF SVM at every sample size, including the low-n regime where quantum advantage was hypothesised.

Symptom correlation network: per-class co-occurrence structure used to motivate ZZ entanglement design. Node size = symptom frequency, edge weight = correlation strength.

Binary ROC and precision-recall curves: Bundibugyo vs not-Bundibugyo model ranking for the binary triage setting used in the rescue experiment.

Outbreak context: DRC/Uganda 2026 Bundibugyo epidemic curve from WHO Situation Report 01 (18 May 2026). Confirmed vs suspected from Day 0 alert through Day 24.

Quick Start

git clone https://github.com/Vishnu2707/qsvm-ebola-classifier-.git
cd qsvm-ebola-classifier-
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Full pipeline: classical + QSVM (~20 min):

python main.py --skip-pdf

Classical baselines only (~10 sec):

python main.py --skip-pdf --skip-qsvm

Binary mode: BUNDIBUGYO vs NOT_BUNDIBUGYO:

python main.py --skip-pdf --binary

Rescue experiment: binary tuned QSVM + VQC (~2-3 hr on laptop CPU):

python main.py --skip-pdf --binary --rescue

macOS only: XGBoost requires OpenMP:

brew install libomp

Project Structure

qsvm-ebola-classifier-/
├── src/
│   ├── extract_features.py      PDF extraction + Claude API parsing
│   ├── data_prep.py             IPD reconstruction · SMOTE · PCA
│   ├── classical_baselines.py   5 classical models with 5-fold CV
│   ├── quantum_kernel.py        PennyLane ZZFeatureMap kernel
│   ├── quantum_kernel_tuned.py  Bandwidth-scaled ZZFeatureMap kernel
│   ├── qsvm.py                  QSVM training · kernel diagnostics
│   ├── bandwidth_sweep.py       KTA sweep · tuned QSVM training
│   ├── vqc.py                   Variational quantum classifier
│   ├── evaluation.py            McNemar test · statistical tests
│   └── visualizations.py        Publication-ready figures
├── results/
│   ├── figures/                 PDF + PNG at 300 DPI
│   └── metrics/                 JSON results dumps
├── paper/
│   └── draft.tex                LaTeX manuscript scaffold
├── notebooks/
│   └── analysis.ipynb
├── main.py                      Full pipeline runner
└── requirements.txt

Data Sources

All training rows are reconstructed from peer-reviewed published case series using IPD reconstruction methodology. No raw patient records were used.

Paper	Strain	N	DOI
MacNeil et al. 2010, CDC EID	Bundibugyo	56 confirmed	10.3201/eid1612.100627
Roddy et al. 2012, PLoS ONE	Bundibugyo	93 putative	10.1371/journal.pone.0052986
Kiggundu et al. 2022, MMWR	Sudan	87 confirmed	10.15585/mmwr.mm7145a5
Schieffelin et al. 2014, NEJM	Zaire	106 confirmed	10.1056/NEJMoa1411680
WHO Situation Report 01	Context only	N/A	AFRO IRIS

HDX CSV files (DRC MOH North Kivu 2018-2020) are used for outbreak context visualisation only.
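
The reconstruction step can be illustrated with a Gaussian copula over binary symptoms. This is a simplified sketch under stated assumptions, not the code in src/data_prep.py: the symptom frequencies are hypothetical placeholders, a single equicorrelated cluster is used, and rho = 0.55 is the latent correlation quoted for the GI cluster in Limitations.

```python
import numpy as np
from statistics import NormalDist


def sample_symptoms(freqs, rho, n, seed=42):
    """Draw n binary symptom rows whose marginals match `freqs`.

    A latent Gaussian vector with equal pairwise correlation `rho` is
    thresholded so that P(symptom_j = 1) = freqs[j].
    """
    rng = np.random.default_rng(seed)
    k = len(freqs)
    cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)  # unit diagonal
    z = rng.multivariate_normal(np.zeros(k), cov, size=n)
    thresholds = np.array([NormalDist().inv_cdf(p) for p in freqs])
    return (z < thresholds).astype(int)


freqs = [0.80, 0.60, 0.50]  # hypothetical published frequencies
rows = sample_symptoms(freqs, rho=0.55, n=20_000)

print(rows.mean(axis=0))                         # close to freqs
print(np.corrcoef(rows, rowvar=False).round(2))  # positive within-cluster correlation
```

The latent correlation is not the same as the correlation between the binary symptoms, which is smaller after thresholding; this is one reason the reconstruction cannot recover all within-patient structure.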

Discussion

Why the quantum kernel fails here

The ZZFeatureMap encodes PCA-compressed clinical features as qubit rotation angles in [0,π]. On 6 qubits, the resulting quantum states are nearly orthogonal for almost all patient pairs regardless of clinical class, giving off-diagonal kernel values of 0.0230 ± 0.0445 (mean ± standard deviation).

This is consistent with exponential concentration (Thanasilp et al. 2022): as qubit count increases, angle-embedded quantum kernels concentrate to constant values. Geometric difference g = 820 indicates that the quantum and classical kernels span different functional spaces, but neither a large g nor high expressibility guarantees useful class structure.

How bandwidth tuning rescues the kernel

Following the bandwidth analysis of Shaydulin and Wild (2022), the tuned kernel scales the angle embedding by λ before circuit evaluation. Reducing λ from 1.0 to 0.05 keeps patient states closer together, improves kernel-target alignment from 0.1613 to 0.4128, and stops the binary QSVM from collapsing into the degenerate all-Bundibugyo prediction (Bundibugyo recall 1.0000, macro recall 0.5000). The rescue result is methodological: the original failure does not show that quantum kernels are useless here, only that an untuned feature-map bandwidth can hide clinically useful structure.
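
The effect of λ on state overlap is easiest to see in a simpler kernel with a closed form. The sketch below swaps the ZZFeatureMap for a product-state RX angle embedding, for which the fidelity kernel is K(x, x') = Π_i cos²(λ(x_i − x'_i)/2). It is a toy stand-in chosen for illustration, not the project's tuned kernel, and the random inputs are placeholders for the PCA features.

```python
import numpy as np


def angle_kernel(X, lam):
    """Fidelity kernel of an RX angle embedding with bandwidth lam."""
    diff = X[:, None, :] - X[None, :, :]
    return np.prod(np.cos(lam * diff / 2.0) ** 2, axis=-1)


def offdiag(K):
    """Flattened off-diagonal entries of a square matrix."""
    return K[~np.eye(len(K), dtype=bool)]


rng = np.random.default_rng(42)
X = rng.uniform(0.0, np.pi, size=(40, 6))  # placeholder features in [0, pi]^6

stats = {}
for lam in (1.0, 0.05):
    off = offdiag(angle_kernel(X, lam))
    stats[lam] = (float(off.mean()), float(off.std()))
    print(f"lambda={lam}: off-diagonal mean={off.mean():.4f}, sigma={off.std():.4f}")
```

With λ = 1.0 most pairs have small overlap, while λ = 0.05 pulls all states close together. Pushed too far, every entry approaches 1 and the kernel becomes uninformative again, so in this toy the bandwidth is a trade-off rather than a quantity to minimise.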

Implications for clinical QML

These results challenge the use of the geometric difference heuristic (Huang et al. 2021, Nature Communications) as a standalone selection criterion for quantum kernels on clinical tabular data: a large g did not predict useful performance here. The finding suggests that quantum kernel methods require bandwidth tuning, amplitude embedding, or variational training that explicitly optimises kernel alignment to class labels.

Reproducibility

All random seeds fixed at 42. SMOTE applied only to training split; test split preserves natural class imbalance.

Core pipeline outputs

Command	Runtime	Key outputs
`venv/bin/python main.py --skip-pdf --skip-qsvm`	~10 sec	`classical_results.json`, `bootstrap_ci.json`
`venv/bin/python main.py --skip-pdf`	~20 min	above + `K_train.npy`, `qsvm_results.json`
`venv/bin/python main.py --skip-pdf --binary --rescue`	~2-3 hr	above + `bandwidth_sweep.json`, `qsvm_tuned_results.json`, `vqc_results.json`

Paper improvement experiments

Flag	Runtime	New outputs
`--bootstrap` (default ON)	<1 min	`bootstrap_ci.json`, `figures/bootstrap_ci.png`
`--rescue --noise`	+5 min	depolarising-noise metrics, `figures/noise_sensitivity.png`
`--rescue --vqc-sweep`	+6-8 hr	`vqc_bandwidth_sweep.json`, `figures/vqc_bandwidth_sweep.png`

All metrics files documented in results/metrics/README.md.

Limitations

IPD reconstruction uses published summary statistics, not raw patient records; Gaussian copula preserves marginal frequencies and intra-cluster correlations (GI rho=0.55, systemic rho=0.40, haemorrhage rho=0.50) but cannot recover all within-patient structure
Tuned QSVM vs XGBoost margin (0.5687 vs 0.5492, about 2.0pp, or 3.6% relative) falls within overlapping bootstrap CIs on n=104 test set -- result is competitive, not definitively superior
Quantum kernel computed on classical statevector simulator; depolarising noise analysis (see results/metrics/noise_model.json) suggests method survives p=0.001 gate error, but QPU validation remains future work
Symptom frequencies assumed stable across outbreaks and geographies
Prospective clinical validation required before any field use

Paper

Manuscript scaffold at paper/draft.tex.

Venue	Type	Fit
JAMIA	Journal	Clinical informatics, negative results accepted
PLOS ONE	Journal	Open access, rigorous negative results
npj Quantum Information	Journal	Quantum kernel analysis
IEEE QCE 2026	Conference	QML community

Citation

@article{ajith2026qsvm,
  title   = {Bandwidth-Tuned {ZZFeatureMap} Quantum Kernels Outperform
             Classical Baselines for {Ebola} Haemorrhagic Fever Triage
             Under Clinical Class Imbalance},
  author  = {Ajith, Vishnu and Haroon, Muhammed Sihan and Ibrahim, Muhammed},
  journal = {arXiv preprint},
  year    = {2026},
  url     = {https://github.com/Vishnu2707/qsvm-ebola-classifier},
  note    = {Quentangle Quantum Systems; Ulster University (via QAHE Ltd.)}
}

Authors

Vishnu Ajith -- R&D Engineer, Quentangle Quantum Systems · Lecturer in Computing, Ulster University (via QAHE Ltd.)

Muhammed Sihan Haroon -- Department of Computing, Ulster University (via QAHE Ltd.)

Muhammed Ibrahim -- Department of Computing, Ulster University (via QAHE Ltd.)
