xiaopeiii/figgie-rl

Figgie RL - Reinforcement Learning for Figgie Card Game

A complete PPO-based reinforcement learning training system for the Figgie card game, featuring curriculum learning, self-play, and human-AI battle interfaces.

🎯 Project Overview

This project implements a full-stack RL training pipeline:

  • Game Simulator: Complete Figgie rules engine with event-driven architecture
  • PPO Training: Advanced training with curriculum learning, self-play, bootstrap opponents
  • Human-AI Interface: Terminal UI and web-based game interface
  • Comprehensive Testing: 148 unit tests, 10,000-game validation

  • Project Status: ✅ Production Ready
  • Training Validated: 26M+ timesteps across multiple training runs
  • Documentation: Complete guides for training, evaluation, and playing


🚀 Quick Start

Installation

```bash
pip install -r requirements.txt
```

Train a Model

```bash
# Quick test
python -m scripts.train_ppo --num-envs 4 --total-timesteps 100000

# Full training
python -m scripts.train_ppo \
  --num-envs 16 --num-steps 512 --total-timesteps 20000000 \
  --batch-size 512 --hidden-dim 256 --learning-rate 1e-3
```

Evaluate Model

```bash
python -m scripts.evaluate_trained_model checkpoints/final_model.pt
```

Play Against AI

```bash
# Terminal interface
python play_figgie.py --opponents "Random,Random,Random" --position 0

# Web interface (visit http://localhost:8000)
python run_web.py
```

Run Tests

```bash
pytest tests/ -v
```

🎲 Figgie Game Rules

Deck & Goal Suit

  • 40 cards total, distributed as 12/10/10/8 across four suits
  • Goal suit: The suit sharing the color of the 12-card suit; worth $10/card and contains 8 or 10 cards
  • Hidden info: Distribution not revealed until game ends

Economics

  • Each player starts with $300 and antes $50, forming a $200 pot
  • At settlement: $10 per goal suit card + bonus for most goal suit cards
  • Bonus: $100 (10-card goal) or $120 (8-card goal)
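
As a quick sanity check, both goal-suit sizes pay out exactly the $200 pot, which is what makes the game zero-sum:

```python
# Settlement arithmetic: card payouts plus the bonus always exhaust the pot.
POT = 4 * 50  # four players ante $50 each

def pot_payout(goal_suit_size: int) -> int:
    """Total paid at settlement: $10 per goal-suit card plus the bonus."""
    bonus = 100 if goal_suit_size == 10 else 120  # larger bonus for the rarer 8-card goal
    return 10 * goal_suit_size + bonus

assert pot_payout(10) == POT  # 10 * $10 + $100 = $200
assert pot_payout(8) == POT   # 8 * $10 + $120 = $200
```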

Trading (Key Feature)

  • Clear on trade: After each trade, all quotes for all suits are cleared
  • Only 1 card trades per transaction
  • Players can post quotes (bid/ask), hit bids, or lift asks
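
The clear-on-trade rule can be sketched as a tiny order book. The class below is illustrative only; names and structure are not the repository's actual market API:

```python
# Hypothetical sketch of the "clear on trade" rule: one card trades,
# then every outstanding quote in every suit is wiped.
class Market:
    SUITS = ("S", "H", "D", "C")

    def __init__(self):
        self.quotes = {s: {} for s in self.SUITS}  # suit -> {player_id: (bid, ask)}

    def post_quote(self, player_id, suit, bid, ask):
        self.quotes[suit][player_id] = (bid, ask)

    def lift_ask(self, buyer_id, suit):
        # Buy one card at the best (lowest) ask, then clear the whole book.
        seller_id, (_, ask) = min(self.quotes[suit].items(), key=lambda kv: kv[1][1])
        trade = (buyer_id, seller_id, suit, ask)
        self.quotes = {s: {} for s in self.SUITS}  # clear on trade: all suits reset
        return trade

m = Market()
m.post_quote(0, "S", 10, 15)
m.post_quote(1, "H", 8, 12)
print(m.lift_ask(2, "S"))  # (2, 0, 'S', 15)
print(m.quotes["H"])       # {} -- all suits cleared, not just spades
```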

πŸ“ Project Structure

```
figgie_rl/
├── figgie/                  # Game engine
│   ├── rules/               # Deck, settlement, constants
│   └── sim/                 # State, actions, market, engine
├── training/                # RL training framework
│   ├── envs/                # Gym environment (ego + opponents)
│   ├── encoding/            # Obs/action encoding, masking
│   ├── opponents/           # Random, MM, self-play policies
│   ├── models/              # Actor-Critic (MLP + LSTM)
│   ├── ppo/                 # Rollout, GAE, trainer
│   └── eval/                # Policy evaluation
├── scripts/                 # Training, evaluation, plotting
├── web_app/                 # FastAPI backend for web UI
├── static/                  # Frontend (HTML/CSS/JS)
├── ui/                      # Terminal UI (rich library)
├── tests/                   # 148 unit tests
├── play_figgie.py           # Terminal human-AI battle
└── run_web.py               # Web human-AI battle
```

🎓 Core Features

Advanced Training System

  • Curriculum Learning: Adaptive opponent difficulty (3 annealing mechanisms)
  • Self-play: Recency-weighted snapshot sampling + performance gating
  • Bootstrap Opponents: Load previous training run models as fixed opponents
  • LSTM Support: Recurrent networks for temporal patterns
  • P12 Auxiliary Loss: Helps model learn deck structure
  • Flexible Rewards: Component-wise scaling (trade/goal/bonus)
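
As one illustration of the self-play mechanism, recency-weighted snapshot sampling (leaving aside the performance-gating step) might look like the sketch below. The decay rate and snapshot format are assumptions, not the repository's actual settings:

```python
import random

# Newer checkpoints are drawn more often via exponential decay over snapshot age.
def sample_snapshot(snapshots, decay=0.9, rng=random):
    """snapshots is ordered oldest -> newest; returns one snapshot id."""
    n = len(snapshots)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # newest gets weight 1.0
    return rng.choices(snapshots, weights=weights, k=1)[0]

pool = ["ckpt_001", "ckpt_050", "ckpt_100", "ckpt_150"]
sample_snapshot(pool)  # 'ckpt_150' is drawn most often
```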

Training Monitoring

  • 20+ metrics logged to CSV per iteration
  • Visualization tools (scripts/plot_training_metrics.py)
  • Resume training from checkpoints
  • Self-play pool management (LRU cache)
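
The LRU-cache pool behavior can be sketched with an `OrderedDict`; the class name, capacity, and stored values below are illustrative, not the repository's actual implementation:

```python
from collections import OrderedDict

# Minimal LRU-style self-play pool: at most `capacity` snapshots are kept,
# and the least recently used one is evicted first.
class SnapshotPool:
    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self._pool = OrderedDict()  # snapshot_id -> state_dict (or checkpoint path)

    def add(self, snapshot_id, state):
        self._pool[snapshot_id] = state
        self._pool.move_to_end(snapshot_id)
        if len(self._pool) > self.capacity:
            self._pool.popitem(last=False)  # evict least recently used

    def get(self, snapshot_id):
        self._pool.move_to_end(snapshot_id)  # mark as recently used
        return self._pool[snapshot_id]

pool = SnapshotPool(capacity=2)
pool.add("a", 1); pool.add("b", 2)
pool.get("a")            # touch 'a' so 'b' becomes least recently used
pool.add("c", 3)         # evicts 'b'
print(list(pool._pool))  # ['a', 'c']
```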

Human-AI Battle

  • Terminal: Rich colorful interface, real-time game state
  • Web: FastAPI + WebSocket, smart 3-step action builder
  • Supports multiple opponent types (Random, MM, trained models)

📊 Performance Metrics

Game Simulator

  • Average per game: 100 steps, 7.3 trades
  • 10k validation: 100% pass rate, zero errors
  • Runtime: ~2 minutes for 10,000 games

Training Results (Actual Runs)

Run 1 (20260122_141255_nc):

  • 12,000 iterations, 20M+ timesteps, ~8.5 hours
  • Non-recurrent, 256 hidden
  • Mean Return: ~40-60 vs mixed opponents

Run 2 (20260124_122433):

  • 6,500+ iterations (ongoing), LSTM + Bootstrap
  • 128 hidden, uses Run 1 models as opponents
  • Mean Return: ~40-60 vs bootstrap + self-play

🛠️ Core API

Game Engine

```python
from figgie.sim.engine import FiggieEngine
from figgie.sim.actions import Quote, LiftAsk

engine = FiggieEngine(seed=42)
truth = engine.reset()

# Post a quote
action = Quote(player_id=0, suit='S', bid_price=10, ask_price=15)
truth, event, trade, done = engine.step(action)

# Buy (lift the ask)
action = LiftAsk(player_id=1, suit='S')
truth, event, trade, done = engine.step(action)
```

Training Environment

```python
from training.envs.ego_figgie_env import EgoFiggieEnv
from training.opponents.random import RandomOpponent

opponents = [RandomOpponent(), RandomOpponent(), RandomOpponent()]
env = EgoFiggieEnv(opponents=opponents, seed=42)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(action_id)
```
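
A minimal random-agent rollout loop over this Gymnasium-style API could look like the sketch below. The `"action_mask"` info key is an assumption; check what the env actually exposes (without a mask, all 98 action ids are sampled uniformly):

```python
import random

def random_rollout(env, num_actions=98, rng=random):
    """Play one episode with random (mask-respecting) actions; return the total reward."""
    obs, info = env.reset()
    done, total = False, 0.0
    while not done:
        mask = info.get("action_mask")  # key name is a guess, not a documented API
        legal = [i for i, ok in enumerate(mask) if ok] if mask is not None else range(num_actions)
        action_id = rng.choice(list(legal))
        obs, reward, terminated, truncated, info = env.step(action_id)
        total += reward
        done = terminated or truncated
    return total

# Usage with the env constructed above: random_rollout(env)
```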

Load Trained Model

```python
import torch
from training.models.actor_critic import ActorCritic

model = ActorCritic(obs_dim=86, act_dim=98, hidden_dim=256)
checkpoint = torch.load('checkpoints/final_model.pt')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
```
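
Once loaded, greedy inference with action masking could be sketched as below. The model's forward signature (logits from a single observation batch) is an assumption based on the constructor above, not the repository's documented API:

```python
import torch

@torch.no_grad()
def greedy_action(model, obs, action_mask):
    """Pick the highest-logit legal action; illegal actions are masked to -inf."""
    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
    logits = model(obs_t)                                # assumed shape: (1, act_dim)
    mask_t = torch.as_tensor(action_mask, dtype=torch.bool).unsqueeze(0)
    logits = logits.masked_fill(~mask_t, float("-inf"))  # forbid illegal actions
    return int(logits.argmax(dim=-1).item())
```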

📚 Documentation

  • PROJECT_SUMMARY.md: Complete project overview (architecture, training, frontend)
  • TRAINING_GUIDE.md: Detailed training parameters and configurations
  • PLAY_GUIDE.md: How to play against AI (terminal + web)
  • WEB_README.md: Web interface usage and technical details
  • CLAUDE.md: Game rules and architecture guide for AI assistants

✅ Testing & Quality

  • 148 unit tests: Rules, simulator, environment, encoding, models, PPO
  • 10,000-game validation: Zero crashes, zero negative cash, zero card count errors
  • 26M+ timesteps: Validated through actual training runs
  • Zero-sum property: Economic balance maintained

πŸ› Major Bug Fix

Negative Cash Bug (Resolved):

  • Issue: ~28% of games ended with negative player cash
  • Root cause: HitBid validation ignored the buyer's cash balance
  • Solution: Added a buyer cash check to HitBid validation
  • Result: 0/10,000 games with negative cash ✅
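
The fix described above amounts to one extra affordability check. Field and function names in this sketch are hypothetical, not the repository's actual code:

```python
# HitBid validation must also confirm the *buyer* can afford the bid,
# not just that an outstanding bid exists.
def validate_hit_bid(seller_cards, buyer_cash, bid_price):
    if bid_price is None:
        return False  # no outstanding bid to hit
    if seller_cards < 1:
        return False  # seller must hold a card of the suit
    if buyer_cash < bid_price:
        return False  # the previously missing check: buyer must afford the bid
    return True

assert validate_hit_bid(1, 40, 50) is False  # buyer has $40, bid is $50
assert validate_hit_bid(1, 60, 50) is True
```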

📦 Requirements

  • Python 3.10+
  • PyTorch (CPU or CUDA)
  • NumPy, Gymnasium
  • FastAPI, WebSockets (for web UI)
  • Rich (for terminal UI)
  • Pytest (for testing)

See requirements.txt for complete list.


πŸ“ License

MIT License


🎉 Summary

Production-ready RL project featuring:

  • Complete Figgie game simulator (event-driven, zero-sum)
  • State-of-the-art PPO training (curriculum learning, self-play, bootstrap)
  • Multiple network architectures (MLP, LSTM, auxiliary heads)
  • Human-AI battle interfaces (terminal + web)
  • Comprehensive testing and validation
  • Publication-quality codebase

Code Scale:

  • 8,000 lines of core code
  • 3,000 lines of tests
  • 1,500 lines of frontend code
  • 6 detailed documentation files

Completion Date: 2025-01-25
Status: ✅ Production Ready
