Crypto Stat-Arb

This repository studies market-neutral crypto statistical arbitrage with signed-graph clustering and walk-forward backtesting. It builds a residualized correlation graph after removing the market mode, clusters the graph with signed methods such as SPONGE and BNC, and trades cluster-level mean-reversion signals under explicit turnover and transaction-cost controls.

Repository layout

stat_arb/: main research package for data loading, graph construction, clustering, signals, backtests, and reporting
data/: processed market, volume, ETH, and correlation datasets used by the backtests
pics/: diagnostic figures for clustering quality and exploratory analysis
crypto_project.ipynb: exploratory notebook used during early research
archived_research/: older exploratory artifacts retained for reference
Crypto_Project_Report_Pre_Backtest.pdf: written report from the earlier research stage

Setup

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install numpy pandas scipy scikit-learn matplotlib statsmodels

Run the baseline SPONGE backtest:

python stat_arb/run_phase1.py

Run the clustering-method sweep:

python stat_arb/run_phase2.py

If you want to rerun the notebook cells that call CoinMarketCap, export your credential first:

export CMC_API_KEY=your_coinmarketcap_key

Methodology

The pipeline first aligns token prices, volumes, and ETH reference data, then builds a tradable universe subject to history and liquidity filters. Returns are residualized against the market mode with PCA, transformed into a signed k-nearest-neighbor correlation graph, and clustered with SPONGE, BNC, or signed spectral methods. Signals are generated from within-cluster mean reversion and normalized to target leverage. They are then evaluated in a walk-forward backtest that lags signals, controls turnover, and applies transaction-cost assumptions, so that lookahead bias and overstated performance are limited.
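The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code in stat_arb/: the function names, the single removed mode, the neighbor count, the lookback, and the synthetic data are all assumptions, and the cluster labels are a hard-coded placeholder standing in for the output of SPONGE or BNC.

```python
import numpy as np


def residualize_market_mode(returns: np.ndarray, n_modes: int = 1) -> np.ndarray:
    """Project the top principal component(s) out of a (T, N) return matrix."""
    centered = returns - returns.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions in asset space, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_modes]
    return centered - centered @ top.T @ top


def signed_knn_graph(residuals: np.ndarray, k: int = 3) -> np.ndarray:
    """Keep each asset's k largest-|correlation| neighbors, preserving sign."""
    corr = np.corrcoef(residuals, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    adj = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        nbrs = np.argsort(-np.abs(corr[i]))[:k]
        adj[i, nbrs] = corr[i, nbrs]
    # Symmetrize: keep an edge if either endpoint selected it.
    return np.where(np.abs(adj) >= np.abs(adj.T), adj, adj.T)


def cluster_reversion_weights(residuals, labels, lookback=5, leverage=1.0):
    """Short recent outperformers and long underperformers within each cluster."""
    recent = residuals[-lookback:].sum(axis=0)
    weights = np.zeros_like(recent)
    for c in np.unique(labels):
        members = labels == c
        if members.sum() < 2:
            continue  # a singleton cluster has no peers to revert toward
        weights[members] = -(recent[members] - recent[members].mean())
    gross = np.abs(weights).sum()
    return weights * (leverage / gross) if gross > 0 else weights


# Synthetic data (assumption): six assets sharing one common market factor.
rng = np.random.default_rng(0)
market = rng.normal(scale=0.02, size=(250, 1))
returns = market @ np.ones((1, 6)) + rng.normal(scale=0.01, size=(250, 6))

resid = residualize_market_mode(returns)
graph = signed_knn_graph(resid, k=2)
labels = np.array([0, 0, 0, 1, 1, 1])  # placeholder for SPONGE/BNC labels
weights = cluster_reversion_weights(resid, labels)
```

Because weights are demeaned within each cluster, they sum to zero overall, and the final rescaling sets gross exposure to the leverage target. In a walk-forward setting, each of these functions would be fit only on data available before the period being traded.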

Results

Primary outputs are written under stat_arb/reporting/ and include fold-level returns, turnover series, clustering sweep summaries, leaderboards, and the final report. The intended use is comparative research across clustering methods rather than a production-ready live trading engine.

Known limits

Results are sensitive to crypto data quality, survivorship, and execution assumptions
The checked-in notebook and archived artifacts reflect exploratory work and are less polished than the package backtest path
Transaction costs and liquidity in crypto can change quickly enough to invalidate static assumptions
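
One way to probe the sensitivity to cost assumptions is to recompute net returns over a range of cost levels. The sketch below is a hypothetical helper, not the package's backtester: it assumes a cost that is linear in turnover, weights lagged by one period, and toy inputs chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def net_returns(weights: np.ndarray, returns: np.ndarray, cost_bps: float) -> np.ndarray:
    """Per-period net return with weights lagged by one period.

    weights, returns: (T, N) arrays. weights[t] is set at the close of t and
    earns returns[t + 1], so no same-period information is used.
    """
    gross = (weights[:-1] * returns[1:]).sum(axis=1)
    # Turnover at t is the absolute weight change traded to reach weights[t],
    # starting from an empty book.
    prev = np.vstack([np.zeros((1, weights.shape[1])), weights[:-1]])
    turnover = np.abs(weights - prev).sum(axis=1)[:-1]
    return gross - turnover * cost_bps / 1e4


# Toy inputs (assumption): two assets, three periods.
demo_weights = np.array([[0.5, -0.5], [0.5, -0.5], [-0.5, 0.5]])
demo_returns = np.array([[0.0, 0.0], [0.01, -0.01], [0.02, 0.0]])

# Total net return under several hypothetical cost levels, in basis points.
sweep = {
    bps: float(net_returns(demo_weights, demo_returns, bps).sum())
    for bps in (0.0, 10.0, 30.0)
}
```

If the ranking of clustering methods changes materially across such a sweep, the comparison depends on the cost assumption rather than on the signal itself.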

License

This project is distributed under the MIT License.
