Trading Bots

A collection of three self-contained Jupyter notebooks, each demonstrating a different machine learning paradigm applied to algorithmic trading. The notebooks cover reinforcement learning, supervised learning, and unsupervised learning across three distinct asset classes: forex, gold futures, and crude oil futures.

All market data is fetched directly from Yahoo Finance via yfinance, so no external data files are required. Each notebook runs independently from installation to backtest.

Repository Structure

Trading-Bots/
├── Forex_Trading.ipynb       # RL bot: DQN agent on EUR/USD
├── Gold_Trading.ipynb        # Supervised bot: RF + XGBoost + LSTM ensemble on GC=F
├── Crude_Oil_Trading.ipynb   # Unsupervised bot: regime detection on CL=F
└── LICENSE

Notebooks

1. Forex Trading (Reinforcement Learning)

File: Forex_Trading.ipynb
Instrument: EUR/USD (EURUSD=X)
Data range: 2021-01-01 to 2024-12-31 (daily bars)

A Deep Q-Network (DQN) agent is trained to trade EUR/USD spot forex. The agent interacts with a custom Gymnasium environment that charges a 2-pip spread and terminates the episode (ruin) if the account loses more than 50% of its initial balance.

Pipeline

Download EUR/USD OHLCV from yfinance
Engineer technical indicators as state features
Fit a MinMaxScaler on training data only (no leakage into test)
Build a custom ForexTradingEnv Gymnasium environment
Train a DQN agent with a 3-layer MLP policy
Evaluate on the held-out test set and compare against buy-and-hold

Action space

Action	Meaning
0	Hold (maintain current position)
1	Buy (open long / close short)
2	Sell (open short / close long)
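
The table does not say whether a Buy issued while short only closes the short or flips straight to long. The sketch below assumes the former: each Buy or Sell moves the position one step. The `next_position` helper and the -1/0/+1 position encoding are hypothetical names for illustration, not taken from the notebook.

```python
HOLD, BUY, SELL = 0, 1, 2  # action codes from the table above


def next_position(position: int, action: int) -> int:
    """Return the new position (-1 short, 0 flat, +1 long) after an action.

    Assumption: Buy moves one step toward long (short -> flat -> long) and
    Sell moves one step toward short; the notebook may flip directly instead.
    """
    if action == BUY:
        return min(position + 1, 1)
    if action == SELL:
        return max(position - 1, -1)
    if action == HOLD:
        return position
    raise ValueError(f"unknown action: {action}")
```

Under this reading, going from short to long takes two consecutive Buy actions, and each step would incur the spread.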

State space

Each observation is a window of the last 20 scaled feature vectors, concatenated with the current normalised position and cumulative PnL.
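
As a rough illustration of how such an observation could be assembled, the sketch below flattens the 20 most recent feature rows and appends the two account values. It is dependency-free for clarity; `build_observation` and its argument names are hypothetical, and the notebook may order or shape the vector differently.

```python
from typing import Sequence

WINDOW = 20  # number of past bars in the state, per the description above


def build_observation(
    scaled_features: Sequence[Sequence[float]],
    t: int,
    position: float,
    cumulative_pnl: float,
) -> list[float]:
    """Flatten the WINDOW rows ending at bar t and append position and PnL."""
    if t + 1 < WINDOW:
        raise ValueError("not enough history for a full window")
    window = scaled_features[t + 1 - WINDOW : t + 1]
    flat = [value for row in window for value in row]
    return flat + [position, cumulative_pnl]
```

With F features per bar, the resulting vector has 20 * F + 2 entries, which is the input size the MLP policy would see under this layout.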

Technical features

Feature	Description
Returns	1-period price return
HL_Ratio	(High - Low) / Close
OC_Ratio	(Close - Open) / Open
SMA_5, SMA_20, SMA_50	Simple moving averages
BB_Upper, BB_Lower, BB_Width	Bollinger Bands (20-period)
RSI	RSI (14-period)
MACD, MACD_Signal	MACD and signal line
ATR	Average True Range
OBV	On-Balance Volume

DQN configuration

Parameter	Value
Policy	MlpPolicy [256, 256, 128]
Learning rate	1e-4
Replay buffer	50,000
Batch size	64
Gamma	0.99
Tau (soft update)	0.005
Train frequency	every 4 steps
Exploration fraction	0.30
Epsilon final	0.05
Total timesteps	100,000
Train/test split	80% / 20%
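
Two of these settings are easier to read as formulas. The sketch below shows a linear epsilon schedule, which with the values above would finish annealing after 30,000 of the 100,000 steps, and a Polyak (soft) target update with tau = 0.005. The starting epsilon of 1.0 is an assumption (it is the stable-baselines3 default and is not listed in the table), and both function names are hypothetical.

```python
def epsilon_at(
    step: int,
    total_timesteps: int = 100_000,
    exploration_fraction: float = 0.30,
    eps_initial: float = 1.0,  # assumption: not stated in the table above
    eps_final: float = 0.05,
) -> float:
    """Linearly anneal epsilon over the first exploration_fraction of training."""
    decay_steps = exploration_fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)
    return eps_initial + progress * (eps_final - eps_initial)


def soft_update(target: list[float], online: list[float], tau: float = 0.005) -> list[float]:
    """Blend online weights into target weights: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]
```

A small tau means the target network trails the online network slowly, which tends to stabilise the bootstrapped Q-value targets.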

Evaluation metrics

Total return, annualised return, Sharpe ratio, Sortino ratio, Calmar ratio, max drawdown, total trade count, win rate, and action distribution analysis.
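
For reference, here is a minimal, dependency-free sketch of the risk-adjusted metrics named above. It assumes daily returns, a 252-day annualisation factor, and a zero risk-free rate; none of these conventions is confirmed by the notebook, and the function names are illustrative.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: annualisation factor for daily bars


def sharpe_ratio(returns: list[float]) -> float:
    """Annualised Sharpe ratio with a zero risk-free rate (assumption)."""
    return mean(returns) / stdev(returns) * math.sqrt(TRADING_DAYS)


def sortino_ratio(returns: list[float]) -> float:
    """Like Sharpe, but the denominator counts only returns below zero."""
    downside = math.sqrt(mean(min(r, 0.0) ** 2 for r in returns))
    return mean(returns) / downside * math.sqrt(TRADING_DAYS)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def calmar_ratio(annual_return: float, equity: list[float]) -> float:
    """Annualised return divided by the magnitude of the maximum drawdown."""
    return annual_return / abs(max_drawdown(equity))
```

The Sortino and Calmar helpers divide by zero when a series has no losing days or no drawdown, so a production version would need to guard those cases.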

Visualisations

EUR/USD price with indicators and train/test boundary
DQN episode reward curve (raw and smoothed)
Portfolio value vs buy-and-hold on test set
Drawdown and trade markers over time
Action distribution bar chart and pie chart

Dependencies

numpy pandas matplotlib yfinance stable-baselines3 gymnasium scikit-learn torch

2. Gold Trading (Supervised Learning)

File: Gold_Trading.ipynb
Instrument: Gold Futures (GC=F, COMEX)
Data range: 2015-01-01 to 2024-12-31 (daily bars)

A binary classification pipeline that predicts whether gold will rise more than 0.5% over the next five trading days. Three classifiers are trained independently and combined into a soft-voting ensemble.

Pipeline

Download GC=F OHLCV from yfinance
Engineer 20+ technical features
Label each day as Buy (1) if the 5-day forward return exceeds 0.5%, else Hold/Sell (0)
Time-based 80/20 train/test split with StandardScaler fitted on train only
Train Random Forest, XGBoost, and LSTM classifiers
Combine into a weighted soft-voting ensemble (RF 30%, XGBoost 40%, LSTM 30%)
Backtest the ensemble signal against buy-and-hold
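
The weighted soft-voting step can be sketched as a weighted average of each model's predicted probability for the Buy class. The weights come from the pipeline above; the 0.5 decision threshold and the function names are assumptions for illustration.

```python
WEIGHTS = {"rf": 0.30, "xgb": 0.40, "lstm": 0.30}  # weights from the pipeline above


def ensemble_probability(probs: dict[str, float]) -> float:
    """Weighted average of each model's predicted P(label = 1)."""
    return sum(WEIGHTS[name] * probs[name] for name in WEIGHTS)


def ensemble_signal(probs: dict[str, float], threshold: float = 0.5) -> int:
    """Return 1 (Buy) if the blended probability reaches the threshold, else 0.

    Assumption: a 0.5 cut-off; the notebook's threshold is not stated.
    """
    return int(ensemble_probability(probs) >= threshold)
```

Because the LSTM consumes 20-step sequences, its predictions presumably start later than those of the tree models, so the three probability series would need to be aligned on the same dates before blending.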

Label construction

Forward_Return = pct_change(5).shift(-5)
Label = 1 if Forward_Return > 0.005 else 0
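
The two lines above are shorthand. A dependency-free sketch of the same rule follows; `make_labels` is a hypothetical helper operating on a plain list of closing prices. One detail worth noting: the last five bars have no forward return, and in pandas a comparison against NaN evaluates to False, so those rows would silently receive label 0 unless they are dropped. The sketch drops them.

```python
HORIZON = 5        # trading days ahead
THRESHOLD = 0.005  # 0.5% forward return


def make_labels(close: list[float]) -> list[tuple[float, int]]:
    """Return (forward_return, label) for every bar that has a full horizon.

    The final HORIZON bars are omitted because their forward return is unknown.
    """
    rows = []
    for t in range(len(close) - HORIZON):
        forward_return = close[t + HORIZON] / close[t] - 1.0
        rows.append((forward_return, int(forward_return > THRESHOLD)))
    return rows
```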

Technical features

Category	Features
Multi-horizon returns	Return_1d, Return_2d, Return_3d, Return_5d, Return_10d
Log returns and volatility	Log_Return, Volatility_5, Volatility_20
Moving averages	SMA_5, SMA_10, SMA_20, SMA_50, SMA_200, EMA_12, EMA_26
Momentum	RSI (14), MACD, MACD_Signal, MACD_Hist, Stoch_K, Stoch_D
Bands	BB_Upper, BB_Lower, BB_Width
Range	ATR (14), HL_Ratio, OC_Ratio
Volume	OBV, Volume_SMA_20, Volume_Ratio, MFI

Model configurations

Random Forest:

500 estimators, max depth 8, min samples leaf 20
max_features='sqrt', class_weight='balanced'

XGBoost:

600 estimators, max depth 5, learning rate 0.03
subsample=0.8, colsample_bytree=0.8
scale_pos_weight set from class ratio to handle imbalance
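
A common convention for "set from class ratio" is the number of negative samples divided by the number of positive samples, computed on the training labels only. The sketch below assumes that convention; `positive_class_weight` is a hypothetical helper name.

```python
def positive_class_weight(labels: list[int]) -> float:
    """Ratio of negative to positive samples in the training labels."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in the training labels")
    return n_neg / n_pos
```

The same ratio is conventionally used for the `pos_weight` argument of a weighted binary cross-entropy loss, so one helper could serve both the XGBoost and LSTM configurations.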

LSTM:

2 LSTM layers, hidden size 128, dropout 0.3
Sequence length: 20 timesteps
Head: Linear(128, 64) + ReLU + Dropout(0.2) + Linear(64, 1) + Sigmoid
Optimiser: Adam (lr=1e-3, weight_decay=1e-5)
Loss: BCEWithLogitsLoss with pos_weight for class imbalance
Scheduler: StepLR (step_size=10, gamma=0.5)
Epochs: 30, batch size 64

Backtest parameters

Parameter	Value
Initial capital	$10,000
Transaction cost	5 bps per trade
Signal	Long on Buy (1), flat on Hold/Sell (0)
Benchmark	Buy-and-hold GC=F
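
To make the cost model concrete, here is a minimal long/flat backtest loop under the parameters above. It assumes the signal decided on one bar is applied to the next bar's return and that the 5 bps cost is charged on every change of position; both are assumptions, and `backtest_long_flat` is a hypothetical name.

```python
COST = 0.0005  # 5 bps per position change


def backtest_long_flat(returns, signals, initial_capital=10_000.0):
    """Equity curve for a long/flat strategy with proportional costs.

    signals[t] is the position decided at the close of bar t and is applied
    to returns[t + 1], so no bar trades on its own close (assumption).
    """
    equity, position = [initial_capital], 0
    for t in range(len(returns) - 1):
        new_position = signals[t]
        cost = COST * abs(new_position - position)  # charged on entry or exit
        position = new_position
        equity.append(equity[-1] * (1.0 + position * returns[t + 1] - cost))
    return equity
```

Shifting the signal by one bar is what keeps the backtest from using information that would not have been available at trade time.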

Evaluation metrics

ROC-AUC per model, classification reports, confusion matrices, total return, CAGR, Sharpe ratio, max drawdown, final portfolio value vs buy-and-hold, rolling Sharpe ratio, monthly returns heatmap.

Visualisations

Gold price with SMA 50, SMA 200, and Bollinger Bands
RSI and annualised volatility panels
Forward return distribution with threshold marker
Feature correlation heatmap
ROC curves and precision-recall curves for the three base models and the ensemble
Confusion matrices (RF, XGBoost, LSTM, Ensemble)
LSTM training and validation loss curves
Feature importances for RF and XGBoost
Backtest portfolio value vs buy-and-hold
Drawdown chart and daily signal chart
Rolling 60-day Sharpe ratio
Monthly returns heatmap

Dependencies

numpy pandas matplotlib seaborn yfinance scikit-learn xgboost torch

3. Crude Oil Trading (Unsupervised Learning)

File: Crude_Oil_Trading.ipynb
Instrument: Crude Oil WTI Futures (CL=F)
Data range: 2010-01-01 to 2024-12-31 (daily bars)

A market regime detection system that uses three unsupervised algorithms to identify distinct market states in crude oil and derive long/flat trading signals from those regimes without any labelled data.

Pipeline

Download CL=F OHLCV from yfinance
Engineer 25+ technical features
Scale with RobustScaler (preferable to StandardScaler for oil due to fat-tailed returns and price dislocations like the 2020 negative price event)
Reduce dimensionality with PCA (retain 90% of variance)
Detect regimes with K-Means
Identify anomalous sessions with DBSCAN
Model soft regime probabilities with Gaussian Mixture Model (GMM)
Map each regime to a trading signal and backtest against buy-and-hold

Technical features

Category	Features
Multi-horizon returns	Ret_1d, Ret_3d, Ret_5d, Ret_10d, Ret_20d
Log returns and volatility	LogRet, Vol_5, Vol_10, Vol_20, Vol_Ratio
Trend	SMA_5, SMA_10, SMA_20, SMA_50, SMA_200, EMA_12, EMA_26
Momentum	RSI (14), MACD, MACD_Signal, MACD_Hist, Stoch_K, Stoch_D
Bands and range	BB_Upper, BB_Lower, BB_Width, ATR, HL_Ratio, OC_Ratio
Volume	OBV, Volume_SMA_20, Volume_Ratio

Algorithm details

K-Means regime detection:

Optimal K selected by maximising silhouette score over K=2 to 9
Also evaluates inertia, Calinski-Harabasz score, and Davies-Bouldin score
Each regime characterised by mean 1-day return, 5-day return, volatility, and RSI
Regime labels assigned programmatically (Bullish / Bearish / Neutral / High-Volatility) based on mean return sign and volatility level
Signal: Long in regimes with positive mean 1-day return, flat otherwise

DBSCAN anomaly detection:

Epsilon selected via k-NN distance plot knee heuristic
min_samples=10
Points labelled -1 (noise/anomaly) are flagged for signal filtering
Signal: Apply K-Means signal but force flat on anomalous sessions (DBSCAN-filtered strategy)
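
The K-Means signal rule and the DBSCAN filter can be sketched together. The helpers below are hypothetical and dependency-free; they assume cluster labels and DBSCAN labels are already aligned day by day, and that the regime-to-signal mapping is estimated on data available before the backtest period.

```python
from collections import defaultdict
from statistics import mean


def regime_signals(labels: list[int], returns_1d: list[float]) -> dict[int, int]:
    """Map each cluster id to 1 (long) if its mean 1-day return is positive, else 0."""
    by_regime = defaultdict(list)
    for label, ret in zip(labels, returns_1d):
        by_regime[label].append(ret)
    return {label: int(mean(rets) > 0) for label, rets in by_regime.items()}


def filtered_signal(kmeans_labels, dbscan_labels, regime_to_signal):
    """K-Means signal per day, forced flat where DBSCAN marks noise (-1)."""
    return [
        0 if noise == -1 else regime_to_signal[regime]
        for regime, noise in zip(kmeans_labels, dbscan_labels)
    ]
```

For example, a day that falls in a bullish cluster but is flagged as noise by DBSCAN would produce a flat signal under the filtered strategy.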

Gaussian Mixture Model (GMM):

Optimal number of components selected by minimising BIC over n=2 to 8
Full covariance matrices
Each day carries a soft probability vector across all components
Entropy computed from probabilities as an uncertainty measure
Signal: Long when the probability-weighted expected 1-day return is positive
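
Both the entropy measure and the signal rule reduce to short formulas over a day's probability vector. The sketch below assumes entropy in nats and a per-component mean 1-day return estimated beforehand; `entropy` and `gmm_signal` are illustrative names.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of one day's component probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def gmm_signal(probs: list[float], component_mean_returns: list[float]) -> int:
    """Return 1 (long) if the probability-weighted expected 1-day return is positive."""
    expected = sum(p * mu for p, mu in zip(probs, component_mean_returns))
    return int(expected > 0)
```

Entropy is zero when one component has probability 1 and reaches its maximum, log(n), when all n components are equally likely, which is why it works as an uncertainty measure.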

PCA analysis

Full PCA fit to determine components needed for 90% and 95% variance retention
Final model uses the 90% threshold
Biplot of PC1 vs PC2 and PC1 vs PC3 coloured by year
Feature loadings visualised for PC1 and PC2
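
Choosing the component count for a variance threshold amounts to finding the first point where the cumulative explained-variance ratio reaches the target. A minimal sketch, with a hypothetical helper name and made-up ratios in the usage note:

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio: list[float], target: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches target."""
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= target:
            return k
    raise ValueError("target exceeds the total explained variance")
```

With illustrative ratios of 0.5, 0.25, 0.125, 0.0625, and 0.0625, the 90% threshold needs four components and the 95% threshold needs all five. scikit-learn's PCA can perform a comparable selection directly when `n_components` is given as a fraction between 0 and 1.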

Backtest parameters

Parameter	Value
Initial capital	$10,000
Transaction cost	5 bps per trade
Return type	Log returns (compounded cumulatively)
Benchmark	Buy-and-hold CL=F
Strategies compared	K-Means, GMM, DBSCAN-Filtered

Evaluation metrics

Total return, CAGR, Sharpe ratio, max drawdown, win rate for each strategy and buy-and-hold. Side-by-side bar chart comparison across all metrics. Monthly returns heatmap for the best-performing strategy by Sharpe ratio.

Visualisations

Crude oil price with SMA 50, SMA 200, and Bollinger Bands
RSI, volatility, volume panels (EDA)
PCA explained variance scree plot and cumulative variance curve
PCA biplots (PC1 vs PC2, PC1 vs PC3) coloured by year
PCA feature loadings for PC1 and PC2
K-Means evaluation curves (inertia, silhouette, CH, DB) over K range
Regime timeline: price coloured by K-Means cluster
Regime feature distribution boxplots (return, volatility, RSI)
DBSCAN k-NN distance plot with epsilon marker
Anomaly overlay on price chart
Normal vs anomaly return distribution comparison
GMM component count selection (BIC/AIC curves)
GMM soft probability time series per component
GMM entropy over time (uncertainty heatmap)
PCA scatter with all three clustering methods side by side
Backtest portfolio curves for all strategies vs buy-and-hold
Drawdown chart
Rolling Sharpe ratio
Monthly returns heatmap (best strategy by Sharpe)
Performance metrics comparison bar chart

Dependencies

numpy pandas matplotlib seaborn yfinance scikit-learn scipy

Design Decisions

No data leakage. In all three notebooks the scaler is fitted exclusively on the training window and then applied to the test window. The train/test split is strictly time-based throughout.

Realistic transaction costs. Each strategy deducts 5 basis points per trade, charged whenever a position is entered or exited. The cost is included in all backtest return calculations.

Consistent benchmark. Every backtest compares against a passive buy-and-hold of the same instrument over the same test period starting from the same initial capital.

Seed control. Random seeds are fixed (typically 42) across numpy, PyTorch, scikit-learn, and stable-baselines3 for reproducibility.

RobustScaler for oil. The crude oil notebook uses RobustScaler rather than StandardScaler because crude oil returns have pronounced fat tails and include extreme sessions (including negative prices in April 2020). RobustScaler uses the interquartile range and is less distorted by such outliers.
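
The difference is easy to see on a small example. The sketch below reimplements the core of robust scaling (centre on the median, divide by the interquartile range) with the standard library only; it illustrates the idea rather than reproducing scikit-learn's implementation, and the function names are hypothetical.

```python
from statistics import median, quantiles


def fit_robust(train: list[float]) -> tuple[float, float]:
    """Return (median, IQR) estimated on the training window only."""
    q1, _, q3 = quantiles(train, n=4, method="inclusive")
    return median(train), q3 - q1


def robust_transform(values: list[float], center: float, iqr: float) -> list[float]:
    """Centre on the median and scale by the interquartile range."""
    return [(v - center) / iqr for v in values]
```

For the sample 1, 2, 3, 4, 100 the median is 3 and the IQR is 2, so the four ordinary points stay within one unit of zero while the outlier maps to 48.5. A mean and standard deviation fitted to the same sample would be dominated by the single extreme value.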

Getting Started

Prerequisites: Python 3.9 or later, Jupyter Notebook or JupyterLab.

Install dependencies for all three notebooks:

pip install numpy pandas matplotlib seaborn yfinance scikit-learn xgboost \
            torch stable-baselines3 gymnasium scipy

Run a notebook:

jupyter notebook Forex_Trading.ipynb

Each notebook is fully self-contained. Run all cells from top to bottom. Data is downloaded automatically on the first run. No additional configuration is required to reproduce the default results.

To change the instrument or date range, edit the TICKER, START_DATE, and END_DATE variables in the config cell near the top of each notebook.

Requirements Summary

Library	Used in
numpy	All
pandas	All
matplotlib	All
seaborn	Gold, Crude Oil
yfinance	All
scikit-learn	All
scipy	Crude Oil
xgboost	Gold
torch	Forex, Gold
stable-baselines3	Forex
gymnasium	Forex
