This repository studies market-neutral crypto statistical arbitrage with signed-graph clustering and walk-forward backtesting. It builds a residualized correlation graph after removing the market mode, clusters the graph with signed methods such as SPONGE and BNC, and trades cluster-level mean-reversion signals under explicit turnover and transaction-cost controls.
- stat_arb/: main research package for data loading, graph construction, clustering, signals, backtests, and reporting
- data/: processed market, volume, ETH, and correlation datasets used by the backtests
- pics/: diagnostic figures for clustering quality and exploratory analysis
- crypto_project.ipynb: exploratory notebook used during early research
- archived_research/: older exploratory artifacts retained for reference
- Crypto_Project_Report_Pre_Backtest.pdf: written report from the earlier research stage
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install numpy pandas scipy scikit-learn matplotlib statsmodels

Run the baseline SPONGE backtest:
python stat_arb/run_phase1.py

Run the clustering-method sweep:
python stat_arb/run_phase2.py

If you want to rerun the notebook cells that call CoinMarketCap, export your credential first:
export CMC_API_KEY=your_coinmarketcap_key

The pipeline first aligns token prices, volumes, and ETH reference data, then builds a tradable universe subject to history and liquidity filters. Returns are residualized against the market mode with PCA, transformed into a signed k-nearest-neighbor correlation graph, and clustered with SPONGE, BNC, or signed spectral methods. Signals are generated from within-cluster mean reversion, normalized to target leverage, and evaluated in a walk-forward backtest. The backtest lags signals, applies turnover controls, and charges transaction costs to limit lookahead bias and overstated performance.
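The residualization and graph-construction steps can be illustrated with a minimal NumPy sketch. It is not the code in stat_arb/; the function names `residualize_market_mode` and `signed_knn_graph`, the choice of one principal component as the market mode, the "either endpoint" symmetrization rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def residualize_market_mode(returns, n_components=1):
    """Remove the top principal component(s) from a (T x N) return matrix."""
    centered = returns - returns.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions in asset space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]  # shape (n_components, N)
    # Subtract the projection of the returns onto the top directions.
    return centered - centered @ top.T @ top

def signed_knn_graph(residuals, k=3):
    """For each asset keep its k largest-magnitude correlations, signs preserved."""
    corr = np.corrcoef(residuals, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # no self-loops
    adj = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        nbrs = np.argsort(-np.abs(corr[i]))[:k]
        adj[i, nbrs] = corr[i, nbrs]
    # Symmetrize: keep an edge if either endpoint selected it.
    return np.where(np.abs(adj) >= np.abs(adj.T), adj, adj.T)

# Synthetic example: six assets driven by one common factor plus noise.
rng = np.random.default_rng(0)
market = rng.normal(size=(250, 1))
returns = 0.8 * market + rng.normal(scale=0.5, size=(250, 6))

residuals = residualize_market_mode(returns)
graph = signed_knn_graph(residuals, k=3)
```

The resulting `graph` is a symmetric matrix with positive and negative edge weights, which is the kind of input that signed clustering methods such as SPONGE operate on.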
Primary outputs are written under stat_arb/reporting/ and include fold-level returns, turnover series, clustering sweep summaries, leaderboards, and the final report. The intended use is comparative research across clustering methods rather than a production-ready live trading engine.
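To make the notion of fold-level results concrete, here is a small sketch of how walk-forward folds might be generated. It assumes fixed-length rolling training windows and non-overlapping test blocks; the helper `walk_forward_folds` and its parameters are hypothetical, and the package's actual fold scheme may differ.

```python
def walk_forward_folds(n_obs, train_size, test_size):
    """Yield (train, test) index ranges; each test block follows its training window."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test block

folds = list(walk_forward_folds(n_obs=10, train_size=4, test_size=2))
for train, test in folds:
    print(list(train), "->", list(test))
```

Because every test index is strictly later than every index in its training window, parameters fitted on a fold never see the data they are evaluated on.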
- Results are sensitive to crypto data quality, survivorship, and execution assumptions
- The checked-in notebook and archived artifacts reflect exploratory work and are less polished than the package backtest path
- Transaction costs and liquidity in crypto can change quickly enough to invalidate static assumptions
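
Given the last caveat, one way to gauge how much a result depends on the cost assumption is to recompute net returns over a range of cost levels. The sketch below assumes a one-period signal lag and a proportional cost charged on turnover; `net_returns`, the cost grid, and the toy inputs are illustrative and are not taken from the package.

```python
import numpy as np

def net_returns(weights, asset_returns, cost_bps):
    """Net portfolio returns with a one-period lag and proportional costs."""
    zeros = np.zeros((1, weights.shape[1]))
    # Weights decided at t-1 are the positions that earn returns at t.
    held = np.vstack([zeros, weights[:-1]])
    gross = (held * asset_returns).sum(axis=1)
    # Turnover is the absolute change in held positions, starting from flat.
    prev = np.vstack([zeros, held[:-1]])
    turnover = np.abs(held - prev).sum(axis=1)
    return gross - turnover * cost_bps / 1e4

# Toy inputs: two assets, three periods, dollar-neutral weights.
weights = np.array([[0.5, -0.5], [0.5, -0.5], [-0.5, 0.5]])
asset_returns = np.array([[0.01, 0.00], [0.02, -0.01], [0.00, 0.01]])

for cost_bps in (0, 5, 10, 20):
    total = net_returns(weights, asset_returns, cost_bps).sum()
    print(f"{cost_bps:>3} bps: total net return {total:.4f}")
```

If the ranking of clustering methods changes across plausible cost levels, the comparison is likely driven by the cost assumption rather than by the clustering itself.
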
This project is distributed under the MIT License.