Trading Bots

A collection of three self-contained Jupyter notebooks, each demonstrating a different machine learning paradigm applied to algorithmic trading. The notebooks cover reinforcement learning, supervised learning, and unsupervised learning across three distinct asset classes: forex, gold futures, and crude oil futures.

All market data is fetched directly from Yahoo Finance via yfinance, so no external data files are required. Each notebook runs independently from installation to backtest.


Repository Structure

Trading-Bots/
├── Forex_Trading.ipynb       # RL bot: DQN agent on EUR/USD
├── Gold_Trading.ipynb        # Supervised bot: RF + XGBoost + LSTM ensemble on GC=F
├── Crude_Oil_Trading.ipynb   # Unsupervised bot: regime detection on CL=F
└── LICENSE

Notebooks

1. Forex Trading (Reinforcement Learning)

File: Forex_Trading.ipynb
Instrument: EUR/USD (EURUSD=X)
Data range: 2021-01-01 to 2024-12-31 (daily bars)

A Deep Q-Network (DQN) agent is trained to trade EUR/USD spot forex. The agent interacts with a custom Gymnasium environment that simulates a 2-pip spread and terminates the episode (ruin) if the account loses more than 50% of its initial balance.

Pipeline

  1. Download EUR/USD OHLCV from yfinance
  2. Engineer technical indicators as state features
  3. Fit a MinMaxScaler on training data only (no leakage into test)
  4. Build a custom ForexTradingEnv Gymnasium environment
  5. Train a DQN agent with a 3-layer MLP policy
  6. Evaluate on the held-out test set and compare against buy-and-hold
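Step 3 is the part most easily done wrong, so here is a minimal library-free sketch of a chronological split with min-max statistics learned from the training rows only. The helper names (`time_split`, `fit_minmax`, `apply_minmax`) and the toy rows are illustrative and are not taken from the notebook, which uses scikit-learn's MinMaxScaler.

```python
def time_split(rows, train_frac=0.8):
    """Split chronologically ordered rows without shuffling."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def fit_minmax(train_rows):
    """Learn per-column min and max from the training rows only."""
    cols = list(zip(*train_rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(rows, lo, hi):
    """Scale rows with the stored training statistics."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

# Toy data: five bars, two features (hypothetical values).
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 60.0]]
train, test = time_split(rows)
lo, hi = fit_minmax(train)
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)  # may fall outside [0, 1]
```

Test values can land outside the [0, 1] range because the test window never contributes to the statistics. That is the expected cost of avoiding leakage.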

Action space

| Action | Meaning |
|---|---|
| 0 | Hold (maintain current position) |
| 1 | Buy (open long / close short) |
| 2 | Sell (open short / close long) |
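One possible reading of these actions, sketched as a position update. It assumes that the position takes the values -1 (short), 0 (flat) and +1 (long), and that each Buy or Sell moves it one step, so a Buy while short closes the short rather than reversing to long. The notebook's environment may handle reversals differently.

```python
HOLD, BUY, SELL = 0, 1, 2

def next_position(position, action):
    """Return the new position given the current one (-1, 0 or +1) and an action."""
    if action == BUY:
        return min(position + 1, 1)   # short -> flat, flat -> long, long stays long
    if action == SELL:
        return max(position - 1, -1)  # long -> flat, flat -> short, short stays short
    return position                   # HOLD keeps the current position
```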

State space

A window of the last 20 scaled feature vectors concatenated with the current normalised position and cumulative PnL.
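A rough sketch of how such an observation could be assembled with NumPy. The function name, the flattening order and the 14-column feature matrix are assumptions for illustration; the notebook's environment may lay the vector out differently or include additional columns.

```python
import numpy as np

WINDOW = 20  # number of past bars in the state

def build_observation(scaled_features, t, position, cum_pnl):
    """scaled_features: (n_steps, n_features) array; t: current bar index, t >= WINDOW - 1."""
    window = scaled_features[t - WINDOW + 1 : t + 1]         # last 20 rows, oldest first
    extras = np.array([position, cum_pnl], dtype=np.float32)  # account state
    return np.concatenate([window.ravel().astype(np.float32), extras])

# Random stand-in for the scaled feature matrix (100 bars, 14 features).
features = np.random.default_rng(42).random((100, 14))
obs = build_observation(features, t=50, position=1.0, cum_pnl=0.02)
```

With 14 features the flattened observation has 20 × 14 + 2 = 282 entries.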

Technical features

| Feature | Description |
|---|---|
| Returns | 1-period price return |
| HL_Ratio | (High - Low) / Close |
| OC_Ratio | (Close - Open) / Open |
| SMA_5, SMA_20, SMA_50 | Simple moving averages |
| BB_Upper, BB_Lower, BB_Width | Bollinger Bands (20-period) |
| RSI | RSI (14-period) |
| MACD, MACD_Signal | MACD and signal line |
| ATR | Average True Range |
| OBV | On-Balance Volume |

DQN configuration

| Parameter | Value |
|---|---|
| Policy | MlpPolicy [256, 256, 128] |
| Learning rate | 1e-4 |
| Replay buffer | 50,000 |
| Batch size | 64 |
| Gamma | 0.99 |
| Tau (soft update) | 0.005 |
| Train frequency | every 4 steps |
| Exploration fraction | 0.30 |
| Epsilon final | 0.05 |
| Total timesteps | 100,000 |
| Train/test split | 80% / 20% |
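Assuming the agent is built with the `DQN` class from stable-baselines3 (listed under Dependencies), the table maps onto constructor arguments roughly as shown. The `train_dqn` helper and the seed value of 42 are illustrative. Settings the table does not mention, such as the target update interval, are left at library defaults here and may differ in the notebook.

```python
# Keyword arguments mirroring the configuration table.
DQN_KWARGS = dict(
    policy="MlpPolicy",
    learning_rate=1e-4,
    buffer_size=50_000,
    batch_size=64,
    gamma=0.99,
    tau=0.005,
    train_freq=4,
    exploration_fraction=0.30,
    exploration_final_eps=0.05,
    policy_kwargs=dict(net_arch=[256, 256, 128]),  # three hidden layers
    seed=42,                                       # assumed seed
)
TOTAL_TIMESTEPS = 100_000

def train_dqn(env):
    """Train a DQN agent on a Gymnasium-compatible environment (hypothetical helper)."""
    from stable_baselines3 import DQN  # imported lazily so the config can be inspected alone
    model = DQN(env=env, **DQN_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    return model
```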

Evaluation metrics

Total return, annualised return, Sharpe ratio, Sortino ratio, Calmar ratio, max drawdown, total trade count, win rate, and action distribution analysis.
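For reference, the risk-adjusted metrics can be computed along these lines. This sketch assumes a zero risk-free rate, 252 periods per year, and a downside deviation measured against a zero target. The notebook may use different conventions.

```python
import math

def sharpe(returns, periods=252):
    """Annualised mean return divided by sample standard deviation."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods) if var > 0 else 0.0

def sortino(returns, periods=252):
    """Like Sharpe, but only negative returns count towards the risk term."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return mean / downside * math.sqrt(periods) if downside > 0 else float("inf")

def max_drawdown(equity):
    """Largest peak-to-trough decline as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst

def calmar(equity, periods=252):
    """Annualised return divided by the absolute maximum drawdown."""
    years = (len(equity) - 1) / periods
    annualised = (equity[-1] / equity[0]) ** (1 / years) - 1
    mdd = max_drawdown(equity)
    return annualised / abs(mdd) if mdd < 0 else float("inf")
```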

Visualisations

  • EUR/USD price with indicators and train/test boundary
  • DQN episode reward curve (raw and smoothed)
  • Portfolio value vs buy-and-hold on test set
  • Drawdown and trade markers over time
  • Action distribution bar chart and pie chart

Dependencies

numpy pandas matplotlib yfinance stable-baselines3 gymnasium scikit-learn torch

2. Gold Trading (Supervised Learning)

File: Gold_Trading.ipynb
Instrument: Gold Futures (GC=F, COMEX)
Data range: 2015-01-01 to 2024-12-31 (daily bars)

A binary classification pipeline that predicts whether gold will rise more than 0.5% over the next five trading days. Three classifiers are trained independently and combined into a soft-voting ensemble.

Pipeline

  1. Download GC=F OHLCV from yfinance
  2. Engineer 20+ technical features
  3. Label each day as Buy (1) if the 5-day forward return exceeds 0.5%, else Hold/Sell (0)
  4. Time-based 80/20 train/test split with StandardScaler fitted on train only
  5. Train Random Forest, XGBoost, and LSTM classifiers
  6. Combine into a weighted soft-voting ensemble (RF 30%, XGBoost 40%, LSTM 30%)
  7. Backtest the ensemble signal against buy-and-hold
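The weighted soft vote in step 6 amounts to a weighted average of the three predicted probabilities. A minimal sketch follows; the 0.5 decision threshold and the function names are assumptions, and the three probability lists must already be aligned on the same dates (the LSTM cannot score the first rows of a series because it needs a full input sequence).

```python
WEIGHTS = {"rf": 0.30, "xgb": 0.40, "lstm": 0.30}

def ensemble_proba(probas):
    """probas: dict of model name -> list of P(label = 1), all of equal length."""
    n = len(next(iter(probas.values())))
    return [sum(WEIGHTS[m] * probas[m][i] for m in WEIGHTS) for i in range(n)]

def to_signal(blended, threshold=0.5):
    """Convert blended probabilities into Buy (1) or Hold/Sell (0)."""
    return [1 if p > threshold else 0 for p in blended]

# Two hypothetical days of model outputs.
probas = {"rf": [0.6, 0.2], "xgb": [0.7, 0.4], "lstm": [0.4, 0.3]}
blended = ensemble_proba(probas)
signals = to_signal(blended)
```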

Label construction

Forward_Return = pct_change(5).shift(-5)
Label = 1 if Forward_Return > 0.005 else 0
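The same construction in plain Python, to make the indexing explicit: the forward return at day t compares the close five days later with the close at t. The function names and toy prices are illustrative. The last five rows have no known future, so this sketch marks them as `None`; in pandas a NaN forward return compares as False and would silently become label 0 unless those rows are dropped first.

```python
def forward_return(close, horizon=5):
    """close[t + horizon] / close[t] - 1, or None where the future is unknown."""
    n = len(close)
    return [
        close[t + horizon] / close[t] - 1 if t + horizon < n else None
        for t in range(n)
    ]

def make_labels(close, horizon=5, threshold=0.005):
    """1 if the forward return exceeds the threshold, 0 otherwise, None if undefined."""
    return [
        None if f is None else int(f > threshold)
        for f in forward_return(close, horizon)
    ]

close = [100, 101, 102, 103, 104, 105, 100.2, 101.3]  # hypothetical prices
labels = make_labels(close)
```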

Technical features

| Category | Features |
|---|---|
| Multi-horizon returns | Return_1d, Return_2d, Return_3d, Return_5d, Return_10d |
| Volatility | Log_Return, Volatility_5, Volatility_20 |
| Moving averages | SMA_5, SMA_10, SMA_20, SMA_50, SMA_200, EMA_12, EMA_26 |
| Momentum | RSI (14), MACD, MACD_Signal, MACD_Hist |
| Bands | BB_Upper, BB_Lower, BB_Width |
| Range | ATR (14), HL_Ratio, OC_Ratio |
| Volume | OBV, Volume_SMA_20, Volume_Ratio, MFI, Stoch_K, Stoch_D |

Model configurations

Random Forest:

  • 500 estimators, max depth 8, min samples leaf 20
  • max_features='sqrt', class_weight='balanced'

XGBoost:

  • 600 estimators, max depth 5, learning rate 0.03
  • subsample=0.8, colsample_bytree=0.8
  • scale_pos_weight set from class ratio to handle imbalance
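A common way to derive `scale_pos_weight` from the class ratio is the count of negatives divided by the count of positives, computed on the training labels only. Whether the notebook uses exactly this formula is an assumption.

```python
def scale_pos_weight(train_labels):
    """Ratio of negative to positive samples in a list of 0/1 training labels."""
    positives = sum(train_labels)
    negatives = len(train_labels) - positives
    if positives == 0:
        raise ValueError("no positive samples in the training labels")
    return negatives / positives

weight = scale_pos_weight([0, 0, 0, 1, 0, 1, 0, 0])  # 6 negatives, 2 positives
```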

LSTM:

  • 2 LSTM layers, hidden size 128, dropout 0.3
  • Sequence length: 20 timesteps
  • Head: Linear(128, 64) + ReLU + Dropout(0.2) + Linear(64, 1) + Sigmoid
  • Optimiser: Adam (lr=1e-3, weight_decay=1e-5)
  • Loss: BCEWithLogitsLoss with pos_weight for class imbalance (this loss applies the sigmoid internally, so it must be fed the pre-Sigmoid logits)
  • Scheduler: StepLR (step_size=10, gamma=0.5)
  • Epochs: 30, batch size 64

Backtest parameters

| Parameter | Value |
|---|---|
| Initial capital | $10,000 |
| Transaction cost | 5 bps per trade |
| Signal | Long on Buy (1), flat on Hold/Sell (0) |
| Benchmark | Buy-and-hold GC=F |
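A minimal long/flat backtest under these parameters might look like the sketch below. It assumes that a signal produced on one bar is traded on the next bar (to avoid look-ahead) and that the 5 bps cost is charged as a fraction of equity each time the position changes. The notebook's exact accounting may differ.

```python
def backtest(returns, signals, capital=10_000.0, cost_bps=5.0):
    """returns[t]: simple return of bar t; signals[t]: 0/1 signal known at the close of bar t."""
    cost = cost_bps / 10_000.0
    equity, position = [capital], 0
    for t in range(1, len(returns)):
        target = signals[t - 1]            # act on the previous bar's signal
        if target != position:
            capital *= 1.0 - cost          # pay the cost on entry or exit
            position = target
        capital *= 1.0 + position * returns[t]
        equity.append(capital)
    return equity

# Hypothetical four-bar example: enter, hold, then exit.
equity = backtest([0.0, 0.01, -0.02, 0.03], [1, 1, 0, 0])
```

Passing a signal list of all ones gives a buy-and-hold benchmark that pays the entry cost once.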

Evaluation metrics

ROC-AUC per model, classification reports, confusion matrices, total return, CAGR, Sharpe ratio, max drawdown, final portfolio value vs buy-and-hold, rolling Sharpe ratio, monthly returns heatmap.

Visualisations

  • Gold price with SMA 50, SMA 200, and Bollinger Bands
  • RSI and annualised volatility panels
  • Forward return distribution with threshold marker
  • Feature correlation heatmap
  • ROC curves and precision-recall curves for all four models
  • Confusion matrices (RF, XGBoost, LSTM, Ensemble)
  • LSTM training and validation loss curves
  • Feature importances for RF and XGBoost
  • Backtest portfolio value vs buy-and-hold
  • Drawdown chart and daily signal chart
  • Rolling 60-day Sharpe ratio
  • Monthly returns heatmap

Dependencies

numpy pandas matplotlib seaborn yfinance scikit-learn xgboost torch

3. Crude Oil Trading (Unsupervised Learning)

File: Crude_Oil_Trading.ipynb
Instrument: Crude Oil WTI Futures (CL=F)
Data range: 2010-01-01 to 2024-12-31 (daily bars)

A market regime detection system that uses three unsupervised algorithms to identify distinct market states in crude oil and derive long/flat trading signals from those regimes without any labelled data.

Pipeline

  1. Download CL=F OHLCV from yfinance
  2. Engineer 25+ technical features
  3. Scale with RobustScaler (preferable to StandardScaler for oil due to fat-tailed returns and price dislocations like the 2020 negative price event)
  4. Reduce dimensionality with PCA (retain 90% of variance)
  5. Detect regimes with K-Means
  6. Identify anomalous sessions with DBSCAN
  7. Model soft regime probabilities with Gaussian Mixture Model (GMM)
  8. Map each regime to a trading signal and backtest against buy-and-hold

Technical features

| Category | Features |
|---|---|
| Multi-horizon returns | Ret_1d, Ret_3d, Ret_5d, Ret_10d, Ret_20d |
| Log returns and volatility | LogRet, Vol_5, Vol_10, Vol_20, Vol_Ratio |
| Trend | SMA_5, SMA_10, SMA_20, SMA_50, SMA_200, EMA_12, EMA_26 |
| Momentum | RSI (14), MACD, MACD_Signal, MACD_Hist, Stoch_K, Stoch_D |
| Bands and range | BB_Upper, BB_Lower, BB_Width, ATR, HL_Ratio, OC_Ratio |
| Volume | OBV, Volume_SMA_20, Volume_Ratio |

Algorithm details

K-Means regime detection:

  • Optimal K selected by maximising silhouette score over K=2 to 9
  • Also evaluates inertia, Calinski-Harabasz score, and Davies-Bouldin score
  • Each regime characterised by mean 1-day return, 5-day return, volatility, and RSI
  • Regime labels assigned programmatically (Bullish / Bearish / Neutral / High-Volatility) based on mean return sign and volatility level
  • Signal: Long in regimes with positive mean 1-day return, flat otherwise
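The regime-to-signal rule can be sketched as a lookup built from per-cluster mean returns. The function name and toy data are illustrative. To keep the backtest honest, the means would normally be estimated on the training window only; whether the notebook does so is not stated here.

```python
from collections import defaultdict

def regime_signals(labels, returns):
    """Map each cluster id to 1 (long) if its mean 1-day return is positive, else 0 (flat)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for k, r in zip(labels, returns):
        totals[k] += r
        counts[k] += 1
    return {k: int(totals[k] / counts[k] > 0) for k in counts}

# Hypothetical cluster assignments and 1-day returns.
labels = [0, 0, 1, 1, 2, 2]
rets = [0.01, 0.02, -0.01, -0.03, 0.0, -0.001]
mapping = regime_signals(labels, rets)
signal = [mapping[k] for k in labels]
```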

DBSCAN anomaly detection:

  • Epsilon selected via k-NN distance plot knee heuristic
  • min_samples=10
  • Points labelled -1 (noise/anomaly) are flagged for signal filtering
  • Signal: Apply K-Means signal but force flat on anomalous sessions (DBSCAN-filtered strategy)
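The filtering step is a simple mask: scikit-learn's DBSCAN assigns the label -1 to noise points, and the strategy goes flat on those sessions. A sketch with illustrative names:

```python
NOISE = -1  # label scikit-learn's DBSCAN gives to noise points

def filter_anomalies(kmeans_signal, dbscan_labels):
    """Keep the K-Means signal, but force flat (0) wherever DBSCAN flags an anomaly."""
    return [
        0 if label == NOISE else s
        for s, label in zip(kmeans_signal, dbscan_labels)
    ]

filtered = filter_anomalies([1, 1, 0, 1], [0, -1, -1, 2])
```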

Gaussian Mixture Model (GMM):

  • Optimal number of components selected by minimising BIC over n=2 to 8
  • Full covariance matrices
  • Each day carries a soft probability vector across all components
  • Entropy computed from probabilities as an uncertainty measure
  • Signal: Long when the probability-weighted expected 1-day return is positive
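Given one day's component probabilities and a mean 1-day return per component, the GMM signal and the entropy measure reduce to two short functions. The names and example numbers are illustrative, and the natural logarithm is an assumed choice of base.

```python
import math

def gmm_signal(probs, component_mean_returns):
    """Return (signal, expected return), where signal is 1 if the expectation is positive."""
    expected = sum(p * m for p, m in zip(probs, component_mean_returns))
    return (1 if expected > 0 else 0), expected

def entropy(probs):
    """Shannon entropy of the probability vector; higher means a less certain regime."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A day that is most likely in the bullish component but still has a negative expectation.
signal, expected = gmm_signal([0.7, 0.2, 0.1], [0.001, -0.002, -0.004])
```

Unlike the hard K-Means assignment, this lets a small probability of a very bad regime outweigh a likely but mildly positive one.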

PCA analysis

  • Full PCA fit to determine components needed for 90% and 95% variance retention
  • Final model uses the 90% threshold
  • Biplot of PC1 vs PC2 and PC1 vs PC3 coloured by year
  • Feature loadings visualised for PC1 and PC2
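The component count for a variance threshold can be read off the cumulative explained-variance ratio. The sketch below derives it from the singular values of the centred data with NumPy; the function name is hypothetical, and scikit-learn's `PCA(n_components=0.90)` performs an equivalent selection internally.

```python
import numpy as np

def n_components_for(X, target=0.90):
    """Smallest number of principal components whose cumulative variance ratio reaches target."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    cumulative = np.cumsum(s**2 / np.sum(s**2))  # cumulative explained-variance ratio
    return int(np.searchsorted(cumulative, target) + 1), cumulative

# Toy matrix whose two directions carry 80% and 20% of the variance.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
k90, cumulative = n_components_for(X, 0.90)
```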

Backtest parameters

| Parameter | Value |
|---|---|
| Initial capital | $10,000 |
| Transaction cost | 5 bps per trade |
| Return type | Log returns (cumulative product) |
| Benchmark | Buy-and-hold CL=F |
| Strategies compared | K-Means, GMM, DBSCAN-Filtered |

Evaluation metrics

Total return, CAGR, Sharpe ratio, max drawdown, win rate for each strategy and buy-and-hold. Side-by-side bar chart comparison across all metrics. Monthly returns heatmap for the best-performing strategy by Sharpe ratio.

Visualisations

  • Crude oil price with SMA 50, SMA 200, and Bollinger Bands
  • RSI, volatility, volume panels (EDA)
  • PCA explained variance scree plot and cumulative variance curve
  • PCA biplots (PC1 vs PC2, PC1 vs PC3) coloured by year
  • PCA feature loadings for PC1 and PC2
  • K-Means evaluation curves (inertia, silhouette, CH, DB) over K range
  • Regime timeline: price coloured by K-Means cluster
  • Regime feature distribution boxplots (return, volatility, RSI)
  • DBSCAN k-NN distance plot with epsilon marker
  • Anomaly overlay on price chart
  • Normal vs anomaly return distribution comparison
  • GMM component count selection (BIC/AIC curves)
  • GMM soft probability time series per component
  • GMM entropy over time (uncertainty heatmap)
  • PCA scatter with all three clustering methods side by side
  • Backtest portfolio curves for all strategies vs buy-and-hold
  • Drawdown chart
  • Rolling Sharpe ratio
  • Monthly returns heatmap (best strategy by Sharpe)
  • Performance metrics comparison bar chart

Dependencies

numpy pandas matplotlib seaborn yfinance scikit-learn scipy

Design Decisions

No data leakage. In all three notebooks the scaler is fitted exclusively on the training window and then applied to the test window. The train/test split is strictly time-based throughout.

Realistic transaction costs. Each backtest deducts 5 basis points per trade, charged whenever a position is entered or exited, and the forex environment additionally models a 2-pip spread. These costs are included in all backtest return calculations.

Consistent benchmark. Every backtest compares against a passive buy-and-hold of the same instrument over the same test period starting from the same initial capital.

Seed control. Random seeds are fixed (typically 42) across numpy, PyTorch, scikit-learn, and stable-baselines3 for reproducibility.

RobustScaler for oil. The crude oil notebook uses RobustScaler rather than StandardScaler because crude oil returns have pronounced fat tails and include extreme sessions (including negative prices in April 2020). RobustScaler uses the interquartile range and is less distorted by such outliers.
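To illustrate the difference, this NumPy sketch reproduces the default RobustScaler transform (subtract the median, divide by the interquartile range) on a toy column containing one extreme value. The helper names and data are illustrative.

```python
import numpy as np

def fit_robust(train):
    """Per-feature median and interquartile range, estimated on training data only."""
    median = np.median(train, axis=0)
    q1, q3 = np.percentile(train, [25, 75], axis=0)
    iqr = np.where(q3 - q1 > 0, q3 - q1, 1.0)  # guard against constant features
    return median, iqr

def transform(X, median, iqr):
    """Centre by the median and scale by the IQR."""
    return (X - median) / iqr

train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # one extreme session
median, iqr = fit_robust(train)
scaled = transform(train, median, iqr)
```

The four ordinary values map to -1, -0.5, 0 and 0.5, while the outlier stays clearly separated at 48.5. A mean and standard deviation would instead be dominated by the single extreme value and compress the ordinary sessions together.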


Getting Started

Prerequisites: Python 3.9 or later, Jupyter Notebook or JupyterLab.

Install dependencies for all three notebooks:

pip install numpy pandas matplotlib seaborn yfinance scikit-learn xgboost \
            torch stable-baselines3 gymnasium scipy

Run a notebook:

jupyter notebook Forex_Trading.ipynb

Each notebook is fully self-contained. Run all cells from top to bottom. Data is downloaded automatically on the first run. No additional configuration is required to reproduce the default results.

To change the instrument or date range, edit the TICKER, START_DATE, and END_DATE variables in the config cell near the top of each notebook.


Requirements Summary

| Library | Used in |
|---|---|
| numpy | All |
| pandas | All |
| matplotlib | All |
| seaborn | Gold, Crude Oil |
| yfinance | All |
| scikit-learn | All |
| scipy | Crude Oil |
| xgboost | Gold |
| torch | Forex, Gold |
| stable-baselines3 | Forex |
| gymnasium | Forex |

License

MIT License. Copyright (c) 2026 QuantSingularity. See LICENSE for full terms.
