A complete PPO-based reinforcement learning training system for the Figgie card game, featuring curriculum learning, self-play, and human-AI battle interfaces.
This project implements a full-stack RL training pipeline:
- Game Simulator: Complete Figgie rules engine with event-driven architecture
- PPO Training: Advanced training with curriculum learning, self-play, bootstrap opponents
- Human-AI Interface: Terminal UI and web-based game interface
- Comprehensive Testing: 148 unit tests, 10,000-game validation
Project Status: ✅ Production Ready
Training Validated: 26M+ timesteps across multiple training runs
Documentation: Complete guides for training, evaluation, and playing
```bash
pip install -r requirements.txt
```

```bash
# Quick test
python -m scripts.train_ppo --num-envs 4 --total-timesteps 100000

# Full training
python -m scripts.train_ppo \
    --num-envs 16 --num-steps 512 --total-timesteps 20000000 \
    --batch-size 512 --hidden-dim 256 --learning-rate 1e-3
```

```bash
python -m scripts.evaluate_trained_model checkpoints/final_model.pt
```

```bash
# Terminal interface
python play_figgie.py --opponents "Random,Random,Random" --position 0

# Web interface (visit http://localhost:8000)
python run_web.py
```

```bash
pytest tests/ -v
```

- 40 cards total, distributed as 12/10/10/8 across four suits
- Goal suit: Worth $10/card, has 8 or 10 cards, same color as 12-card suit
- Hidden info: Distribution not revealed until game ends
- Each player starts with $300 and antes $50 → $200 pot
- At settlement: $10 per goal suit card + bonus for most goal suit cards
- Bonus: $100 (10-card goal) or $120 (8-card goal)
- Clear on trade: After each trade, all quotes for all suits are cleared
- Only 1 card trades per transaction
- Players can post quotes (bid/ask), hit bids, or lift asks
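To make the settlement arithmetic above concrete, here is a minimal sketch (illustrative only; `settle` and its layout are hypothetical, not the project's actual `figgie/rules` code):

```python
def settle(goal_counts, goal_suit_size, pot=200):
    """Split the pot per the rules above: $10 per goal-suit card,
    remainder (the bonus) to the holder(s) of the most goal-suit cards."""
    per_card = 10
    payouts = [per_card * c for c in goal_counts]   # $10 per goal-suit card
    bonus = pot - per_card * goal_suit_size         # $100 (10-card) or $120 (8-card)
    best = max(goal_counts)
    winners = [i for i, c in enumerate(goal_counts) if c == best]
    for i in winners:                               # ties split the bonus
        payouts[i] += bonus // len(winners)
    return payouts

# 8-card goal suit: bonus is 200 - 80 = 120
print(settle([4, 2, 1, 1], goal_suit_size=8))  # [160, 20, 10, 10]
```

Note that the payouts always sum to the pot, which is the zero-sum property the test suite verifies.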
figgie_rl/
βββ figgie/ # Game engine
β βββ rules/ # Deck, settlement, constants
β βββ sim/ # State, actions, market, engine
βββ training/ # RL training framework
β βββ envs/ # Gym environment (ego + opponents)
β βββ encoding/ # Obs/action encoding, masking
β βββ opponents/ # Random, MM, self-play policies
β βββ models/ # Actor-Critic (MLP + LSTM)
β βββ ppo/ # Rollout, GAE, trainer
β βββ eval/ # Policy evaluation
βββ scripts/ # Training, evaluation, plotting
βββ web_app/ # FastAPI backend for web UI
βββ static/ # Frontend (HTML/CSS/JS)
βββ ui/ # Terminal UI (rich library)
βββ tests/ # 148 unit tests
βββ play_figgie.py # Terminal human-AI battle
βββ run_web.py # Web human-AI battle
- Curriculum Learning: Adaptive opponent difficulty (3 annealing mechanisms)
- Self-play: Recency-weighted snapshot sampling + performance gating
- Bootstrap Opponents: Load previous training run models as fixed opponents
- LSTM Support: Recurrent networks for temporal patterns
- P12 Auxiliary Loss: Helps model learn deck structure
- Flexible Rewards: Component-wise scaling (trade/goal/bonus)
- 20+ metrics logged to CSV per iteration
- Visualization tools (`scripts/plot_training_metrics.py`)
- Resume training from checkpoints
- Self-play pool management (LRU cache)
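The recency-weighted snapshot sampling mentioned above could look roughly like this (a sketch under assumptions; `sample_snapshot`, the geometric decay scheme, and the pool layout are illustrative, not the project's actual implementation):

```python
import random

def sample_snapshot(pool, decay=0.9):
    """Pick a past self-play snapshot, weighting recent entries higher.
    pool is ordered oldest -> newest (e.g. keys of an LRU snapshot cache)."""
    # Weight w_i = decay^age, so the newest snapshot gets weight 1.0
    weights = [decay ** (len(pool) - 1 - i) for i in range(len(pool))]
    return random.choices(pool, weights=weights, k=1)[0]

pool = ["iter_1000", "iter_2000", "iter_3000", "iter_4000"]
print(sample_snapshot(pool))  # most often one of the newer snapshots
```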
- Terminal: Rich colorful interface, real-time game state
- Web: FastAPI + WebSocket, smart 3-step action builder
- Supports multiple opponent types (Random, MM, trained models)
- Average per game: 100 steps, 7.3 trades
- 10k validation: 100% pass rate, zero errors
- Runtime: ~2 minutes for 10,000 games
Run 1 (20260122_141255_nc):
- 12,000 iterations, 20M+ timesteps, ~8.5 hours
- Non-recurrent, 256 hidden
- Mean Return: ~40-60 vs mixed opponents
Run 2 (20260124_122433):
- 6,500+ iterations (ongoing), LSTM + Bootstrap
- 128 hidden, uses Run 1 models as opponents
- Mean Return: ~40-60 vs bootstrap + self-play
```python
from figgie.sim.engine import FiggieEngine
from figgie.sim.actions import Quote, LiftAsk

engine = FiggieEngine(seed=42)
truth = engine.reset()

# Post quote
action = Quote(player_id=0, suit='S', bid_price=10, ask_price=15)
truth, event, trade, done = engine.step(action)

# Buy
action = LiftAsk(player_id=1, suit='S')
truth, event, trade, done = engine.step(action)
```

```python
from training.envs.ego_figgie_env import EgoFiggieEnv
from training.opponents.random import RandomOpponent

opponents = [RandomOpponent(), RandomOpponent(), RandomOpponent()]
env = EgoFiggieEnv(opponents=opponents, seed=42)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(action_id)
```

```python
import torch
from training.models.actor_critic import ActorCritic

model = ActorCritic(obs_dim=86, act_dim=98, hidden_dim=256)
checkpoint = torch.load('checkpoints/final_model.pt')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
```

- PROJECT_SUMMARY.md: Complete project overview (architecture, training, frontend)
- TRAINING_GUIDE.md: Detailed training parameters and configurations
- PLAY_GUIDE.md: How to play against AI (terminal + web)
- WEB_README.md: Web interface usage and technical details
- CLAUDE.md: Game rules and architecture guide for AI assistants
- 148 unit tests: Rules, simulator, environment, encoding, models, PPO
- 10,000-game validation: Zero crashes, zero negative cash, zero card count errors
- 26M+ timesteps: Validated through actual training runs
- Zero-sum property: Economic balance maintained
Negative Cash Bug (Resolved):
- Issue: ~28% of games ended with negative cash
- Root cause: `HitBid` validation ignored the buyer's cash
- Solution: Added a buyer cash check in validation
- Result: 0/10,000 games with negative cash ✅
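A minimal sketch of the kind of check that fixes this class of bug (field names and data layout here are illustrative, not the project's actual `HitBid` validation):

```python
def validate_hit_bid(seller_id, bids, cash, hands, suit):
    """Reject a HitBid unless the resting buyer can actually pay."""
    quote = bids.get(suit)
    if quote is None:
        return False                    # no resting bid to hit
    buyer_id, price = quote
    if cash[buyer_id] < price:
        return False                    # the fix: buyer must afford the bid price
    return hands[seller_id].get(suit, 0) > 0   # seller must hold a card of that suit

bids = {'S': (0, 12)}                   # player 0 bids $12 for spades
print(validate_hit_bid(1, bids, cash={0: 5, 1: 300},
                       hands={1: {'S': 2}}, suit='S'))  # False: buyer has only $5
```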
- Python 3.10+
- PyTorch (CPU or CUDA)
- NumPy, Gymnasium
- FastAPI, WebSockets (for web UI)
- Rich (for terminal UI)
- Pytest (for testing)
See requirements.txt for complete list.
MIT License
Production-ready RL project featuring:
- Complete Figgie game simulator (event-driven, zero-sum)
- State-of-the-art PPO training (curriculum learning, self-play, bootstrap)
- Multiple network architectures (MLP, LSTM, auxiliary heads)
- Human-AI battle interfaces (terminal + web)
- Comprehensive testing and validation
- Publication-quality codebase
Code Scale:
- 8,000 lines of core code
- 3,000 lines of tests
- 1,500 lines of frontend
- 6 detailed documentation files
Completion Date: 2025-01-25
Status: ✅ Production Ready