Schnarps

Python-first implementation of the Schnarps project plan:

Rules-complete game engine (new_game, legal_actions, step)
CLI play modes (play-human-vs-bot, bot-vs-bot, replay)
Baseline bot policies (random + heuristic)
RL environment wrapper + masked PPO trainer scaffold
JSONL logging + Parquet export pipeline
Advisor API scaffold for UI integration
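The masked PPO trainer relies on action masking, so the policy never puts probability on illegal actions. Below is a dependency-free sketch of masked sampling. It assumes a flat action space with one boolean legality flag per action. `masked_softmax` and `sample_masked` are hypothetical helpers, not part of the schnarps API. A PyTorch version would typically follow the same idea and fill illegal logits with -inf before the softmax.

```python
import math
import random

def masked_softmax(logits, mask):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    legal = [x for x, ok in zip(logits, mask) if ok]
    if not legal:
        raise ValueError("at least one action must be legal")
    top = max(legal)  # subtract the max legal logit for numerical stability
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def sample_masked(logits, mask, rng):
    """Sample an action index; zero-weight (illegal) actions are never drawn."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```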

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -e '.[test,data,rl]'
pytest

Conda alternative:

conda activate schnarps
pip install -e '.[test,data,rl]'
pytest

Run CLI:

schnarps play-human-vs-bot --seed 1 --human-player 0
schnarps bot-vs-bot --games 10 --seed 10 --log data/games.jsonl
schnarps replay data/games.jsonl
schnarps-export data/games.jsonl data/games.parquet
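The --log output and the schnarps-export input are JSONL files, as the .jsonl extension suggests: one JSON object per line. This README does not document the record schema, so the sketch below assumes nothing beyond one object per line. `write_jsonl` and `read_jsonl` are hypothetical helpers, not the schnarps.data API.

```python
import json
from pathlib import Path

def write_jsonl(path, records):
    """Write records as JSONL (one object per line), overwriting the file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, sort_keys=True) + "\n")

def read_jsonl(path):
    """Yield one dict per non-blank line, reporting the line number on bad JSON."""
    with Path(path).open("r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON") from exc
```

Streaming line by line keeps memory flat even for large logs, such as the 1000-game simulator output.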

Run simulator script:

python scripts/simulate.py --games 1000 --output data/games.jsonl

Phase-2 pilot generation from trained checkpoint:

python scripts/generate_phase2_pilot.py \
  --checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --games 5000 \
  --output data/phase2/pilot_games.jsonl

Build canonical Phase-2 events parquet:

python scripts/build_phase2_canonical.py \
  --inputs data/phase2/pilot_games_concurrent.parquet data/phase2/pilot_games_batch2.parquet \
  --output data/phase2/canonical_events.parquet

Extract first-pass supervised bid/move datasets:

python scripts/extract_phase2_supervised.py \
  --inputs data/phase2/pilot_games_concurrent.jsonl data/phase2/pilot_games_batch2.jsonl \
  --output-dir data/phase2/supervised \
  --max-games 500

Full-scale sharded extraction:

python scripts/extract_phase2_supervised.py \
  --inputs data/phase2/pilot_games_concurrent.jsonl data/phase2/pilot_games_batch2.jsonl \
  --output-dir data/phase2/supervised_full \
  --max-games 0 \
  --shard-rows 250000 \
  --progress-every 500
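--shard-rows 250000 caps the number of rows per output file. The sketch below shows row-count sharding. It assumes shards are filled in order and that a new shard starts once the current one reaches the cap. `iter_shards` and the `part-NNNNN` naming are hypothetical; this README does not show the script's actual file names.

```python
from itertools import islice

def iter_shards(rows, shard_rows):
    """Split an iterable of rows into lists of at most shard_rows rows."""
    if shard_rows <= 0:
        raise ValueError("shard_rows must be positive")
    it = iter(rows)
    while True:
        shard = list(islice(it, shard_rows))
        if not shard:
            return
        yield shard

def shard_name(index):
    # Hypothetical naming scheme: part-00000.parquet, part-00001.parquet, ...
    return f"part-{index:05d}.parquet"
```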

Train PPO baseline:

python scripts/train_ppo.py --updates 20 --steps-per-update 2048 --checkpoint checkpoints/ppo_latest.pt

Phase-scoped PPO training for separate checkpoints:

python scripts/train_ppo.py --policy-scope in_out --checkpoint checkpoints/ppo_in_out_latest.pt
python scripts/train_ppo.py --policy-scope play --checkpoint checkpoints/ppo_play_latest.pt

By default, training also tracks and saves the best eval checkpoint at checkpoints/ppo_best.pt. Disable with --no-track-best.

If you change the evaluation regime substantially (for example, switching from full-policy eval semantics to scoped phase-only eval), write to a fresh --best-checkpoint path so the old regime's stale best-score threshold is not carried over.
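The sketch below shows why reusing a --best-checkpoint path can block saves. It assumes the best score is stored next to the checkpoint and reloaded at startup. `BestTracker` and its `.best.json` sidecar are hypothetical, and train_ppo.py may persist the threshold differently. A score from the old eval regime can sit above anything the new regime produces, so no new checkpoint is ever saved.

```python
import json
from pathlib import Path

class BestTracker:
    """Save only when an eval score beats the best score recorded for this path."""

    def __init__(self, best_path):
        self.best_path = Path(best_path)
        # Hypothetical sidecar holding the best score, e.g. ppo_best.best.json.
        self.meta_path = self.best_path.with_name(self.best_path.stem + ".best.json")
        self.best_score = float("-inf")
        if self.meta_path.exists():
            # Reusing an old path inherits its threshold (the stale carryover).
            self.best_score = json.loads(self.meta_path.read_text())["score"]

    def update(self, score, save_fn):
        """Call save_fn(path) and record the score if it is a new best."""
        if score <= self.best_score:
            return False
        save_fn(self.best_path)
        self.best_score = score
        self.meta_path.parent.mkdir(parents=True, exist_ok=True)
        self.meta_path.write_text(json.dumps({"score": score}))
        return True
```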

Seat randomization (recommended):

python scripts/train_ppo.py --seat-mode random --learner-seats 0,1,2,3,4

Deterministic seat balancing (useful for stability diagnostics):

python scripts/train_ppo.py --seat-mode cycle --learner-seats 0,1,2,3,4
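Here is one reading of the two seat modes. With `random`, each episode draws a learner seat uniformly from --learner-seats. With `cycle`, episodes walk the list in order, so every seat gets an equal share. The sketch below implements that reading. `parse_seats` and `seat_schedule` are hypothetical, and train_ppo.py may assign seats per rollout rather than per episode.

```python
import random

def parse_seats(spec):
    """Parse '0,1,2,3,4' into [0, 1, 2, 3, 4]."""
    return [int(s) for s in spec.split(",") if s.strip()]

def seat_schedule(mode, seats, episodes, seed=0):
    """Return the learner seat for each episode under 'random' or 'cycle' mode."""
    if not seats:
        raise ValueError("need at least one learner seat")
    if mode == "random":
        rng = random.Random(seed)
        return [rng.choice(seats) for _ in range(episodes)]
    if mode == "cycle":
        return [seats[i % len(seats)] for i in range(episodes)]
    raise ValueError(f"unknown seat mode: {mode}")
```

Cycling makes per-seat sample counts exactly equal, which is why it suits stability diagnostics. Random assignment avoids any fixed ordering effects.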

Phase-head PPO training (new track):

python scripts/train_ppo.py \
  --model-variant phase_heads \
  --seat-mode random \
  --learner-seats 0,1,2,3,4 \
  --opponent-mix heuristic:0.6,random:0.2,self:0.2 \
  --entropy-coef 0.01 \
  --entropy-coef-bid 0.02 \
  --entropy-coef-play 0.005 \
  --hand-strength-signal-coef 0.05 \
  --eval-stochastic-episodes 20
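The --opponent-mix value is a comma-separated list of `type:weight` pairs. The sketch below parses it and samples opponents. It assumes weights are normalized to sum to 1 and that each non-learner seat draws its type independently. Both are assumptions, and the helper names are hypothetical.

```python
import random

def parse_opponent_mix(spec):
    """Parse 'heuristic:0.6,random:0.2,self:0.2' into normalized weights."""
    weights = {}
    for part in spec.split(","):
        name, _, value = part.strip().partition(":")
        if not name or not value:
            raise ValueError(f"bad mix entry: {part!r}")
        w = float(value)
        if w < 0:
            raise ValueError(f"negative weight for {name}")
        weights[name] = weights.get(name, 0.0) + w
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("mix weights must sum to a positive value")
    return {k: v / total for k, v in weights.items()}

def sample_opponents(mix, n_seats, rng):
    """Draw an opponent type independently for each non-learner seat."""
    names = list(mix)
    return rng.choices(names, weights=[mix[k] for k in names], k=n_seats)
```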

Heuristic-majority league training (recommended robust default):

python scripts/train_ppo.py \
  --policy-scope play \
  --opponent-mix-preset heuristic_majority \
  --opponent-checkpoint-dir experiments/opponent_league \
  --opponent-checkpoint-limit 12 \
  --snapshot-dir experiments/opponent_league \
  --snapshot-prefix play_league \
  --snapshot-every-updates 5 \
  --snapshot-max-keep 30
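--snapshot-every-updates 5 and --snapshot-max-keep 30 describe a rolling league: a snapshot every 5 updates, capped at 30 retained files. The sketch below implements that retention policy. It assumes oldest-first pruning and a hypothetical `<prefix>_uNNNNN.pt` naming scheme, neither of which this README confirms.

```python
def should_snapshot(update, every):
    """True on updates every, 2*every, ... (1-indexed update counter)."""
    return every > 0 and update > 0 and update % every == 0

def snapshot_name(prefix, update):
    # Hypothetical naming, e.g. play_league_u00005.pt
    return f"{prefix}_u{update:05d}.pt"

def prune(snapshots, max_keep):
    """Given names ordered oldest first, return (kept, deleted), keeping the newest max_keep."""
    if max_keep < 1:
        raise ValueError("max_keep must be at least 1")
    cut = max(0, len(snapshots) - max_keep)
    return list(snapshots[cut:]), list(snapshots[:cut])
```

Over 200 updates this yields 40 snapshots. The first 10 (updates 5 through 50) are pruned, so the league keeps updates 55 through 200.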

Conservative phase-head evaluation gates:

python scripts/evaluate_ppo_phase_heads.py \
  --candidate-checkpoint checkpoints/ppo_best.pt \
  --baseline-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --games-per-matchup 2000 \
  --opening-bid-games 500 \
  --min-heuristic-win-rate 0.20 \
  --output experiments/ppo_phase_heads/eval_report.json

Dedicated bidding pipeline (recommended next step for addressing bidding collapse):

# 1) Train bid-only supervised model (with optional temperature calibration)
python scripts/train_bid_supervised.py \
  --bid-data data/phase2/supervised_full/bid_decisions/*.parquet data/phase2/supervised_v2_delta/bid_decisions/*.parquet \
  --output experiments/bid_supervised_v1/bid_model.pt \
  --summary-out experiments/bid_supervised_v1/summary.json \
  --class-balance inverse_sqrt \
  --calibrate-temperature

# 2) Warm-start dedicated bid-only PPO from supervised bid model
python scripts/train_bid_policy.py \
  --model-variant phase_heads \
  --stable-policy ppo \
  --stable-ppo-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --warm-start-bid-supervised experiments/bid_supervised_v1/bid_model.pt \
  --bid-anchor-supervised experiments/bid_supervised_v1/bid_model.pt \
  --bid-anchor-coef 0.03 \
  --bid-anchor-temperature 0.8 \
  --hand-strength-signal-coef 0.05 \
  --checkpoint experiments/bid_ppo_v1/latest.pt \
  --best-checkpoint experiments/bid_ppo_v1/best.pt \
  --metrics-out experiments/bid_ppo_v1/metrics.jsonl

# 3) Evaluate bid-tuned PPO checkpoint with conservative gate script
python scripts/evaluate_ppo_phase_heads.py \
  --candidate-checkpoint experiments/bid_ppo_v1/best.pt \
  --candidate-fallback-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --baseline-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --games-per-matchup 2000 \
  --opening-bid-games 500 \
  --output experiments/bid_ppo_v1/eval.json

# 4) Evaluate bid-policy quality/calibration on held-out bid states
python scripts/evaluate_bid_policy_quality.py \
  --candidate-checkpoint experiments/bid_ppo_v1/best.pt \
  --sample-rows 300000 \
  --output experiments/bid_ppo_v1/bid_quality.json
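The name --class-balance inverse_sqrt suggests weighting each bid class by 1/sqrt(count). This is a softer correction than full inverse-frequency weighting, so rare bids are upweighted without swamping the loss. The sketch below follows that reading. Rescaling so the mean class weight is 1 is an assumption.

```python
import math
from collections import Counter

def inverse_sqrt_weights(labels):
    """Weight each class by 1/sqrt(count), rescaled so the mean class weight is 1."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels")
    raw = {c: 1.0 / math.sqrt(n) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}
```

With 100 passes and 4 bids of 5, a bid of 5 is weighted 5x a pass (sqrt(100/4)), versus 25x under plain inverse-frequency weighting.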

Stability-first bid pipeline (fixed baseline + fixed eval seeds + burst training + deterministic promotion):

python scripts/run_bid_stability_pipeline.py \
  --base-checkpoint experiments/bid_ppo_v6_handstrength_2026-02-27/hs_cfg1_best.pt \
  --baseline-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --stable-ppo-checkpoint experiments/ppo_sweep_2026-02-26/cfg3_best.pt \
  --bursts 8 \
  --burst-updates 8 \
  --early-stop-patience-updates 4 \
  --early-stop-warmup-updates 4 \
  --early-stop-min-delta 0.002
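The early-stop flags map onto a common pattern. During the first `warmup` updates, non-improving scores are not counted. After that, training stops once `patience` consecutive updates fail to beat the best score by more than `min_delta`. The sketch below is that generic pattern. `EarlyStopper` is hypothetical, and the pipeline's exact semantics may differ.

```python
class EarlyStopper:
    """Stop after `patience` updates without beating the best score by min_delta.

    Non-improving updates during the first `warmup` updates are not counted.
    """

    def __init__(self, patience, warmup=0, min_delta=0.0):
        self.patience = patience
        self.warmup = warmup
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0
        self.updates = 0

    def step(self, score):
        """Record one eval score; return True when training should stop."""
        self.updates += 1
        if score > self.best + self.min_delta:
            self.best = score
            self.stale = 0
        elif self.updates > self.warmup:
            self.stale += 1
        return self.updates > self.warmup and self.stale >= self.patience
```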

The default promotion gate set now requires:

non_regression_vs_baseline
seat_stability
heuristic_strength

The pipeline writes:

summary.json with burst-by-burst metrics + promotion decisions.
REPORT.md with a compact comparison table.
promoted/promoted_latest.pt as the current promoted checkpoint.
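Below is a sketch of the promotion step. A candidate is promoted only if every gate in the default set passes, and is then copied to promoted/promoted_latest.pt. The gate names come from the list above. This README does not say how each gate is computed, so the sketch takes precomputed pass/fail flags. The helpers are hypothetical.

```python
import shutil
from pathlib import Path

DEFAULT_GATES = ("non_regression_vs_baseline", "seat_stability", "heuristic_strength")

def evaluate_gates(gate_results, required=DEFAULT_GATES):
    """Return (promote, failed_gates); a missing gate result counts as a failure."""
    failed = [g for g in required if not gate_results.get(g, False)]
    return not failed, failed

def promote(candidate, out_dir):
    """Copy the candidate checkpoint to <out_dir>/promoted/promoted_latest.pt."""
    dest = Path(out_dir) / "promoted" / "promoted_latest.pt"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(candidate, dest)
    return dest
```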

Bundle-first scoped league pipeline:

python scripts/run_phase_league_pipeline.py \
  --python-exec /opt/anaconda3/envs/schnarps/bin/python \
  --experiment-dir experiments/phase_league_2026-03-10_r1 \
  --fallback-bundle experiments/long_explore_v1_2026-03-04/policy_bundle_long_v1.json \
  --baseline-bundle experiments/long_explore_v1_2026-03-04/policy_bundle_long_v1.json \
  --extra-opponent-bundles experiments/parallel_triple_v1_2026-03-05/policy_bundle_selected.json \
  --run-bid-rollout-refresh

The pipeline:

trains in_out and play against routed bundle opponents via strong_bundle_league
evaluates the resulting candidate bundle against heuristic plus strong bundle opponents
refreshes bid rollout labels only if the candidate bundle clears the required strength gates

Transfer-focused in_out curriculum (currently the recommended RL training path):

python scripts/run_in_out_transfer_curriculum.py \
  --python-exec /opt/anaconda3/envs/schnarps/bin/python \
  --experiment-dir experiments/in_out_transfer_curriculum_2026-03-11_r1

The pipeline:

runs heuristic-anchor bursts first to recover isolated in_out signal
switches to learned-partner bursts using routed bundle fallbacks and strong-bundle opponents
validates every burst with routed fixed-partner bundle comparison
promotes only when the candidate clears the transfer gates from docs/EVAL_PROTOCOL_V3.md

Current strategy reset rules:

bid: prefer rollout-supervised training, and select checkpoints from routed external eval
in_out: active RL scope via run_in_out_transfer_curriculum.py
play: current PPO loop is paused until its reward/curriculum is redesigned

Heuristic-backed scoped bootstrap with routed comparison reports:

python scripts/run_scoped_heuristic_bootstrap.py \
  --python-exec /opt/anaconda3/envs/schnarps/bin/python \
  --experiment-dir experiments/scoped_heuristic_bootstrap_2026-03-10_r1

The pipeline:

trains bid, in_out, and play with heuristic partners to reduce cross-module poisoning
builds routed incumbent and candidate bundles for each scope
writes scope-specific routed comparison reports against the routed incumbent plus strong extra bundles

When evaluating a scoped checkpoint directly (for example, a bid-only PPO model), pass a full-policy fallback via --candidate-fallback-checkpoint so that only the phase the candidate owns is played by the candidate; all other phases come from the fallback.
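The routing idea can be sketched as a per-phase lookup: each phase resolves to the candidate only if the candidate owns it, and to the full fallback otherwise. The flat dict layout below is hypothetical. The real bundle format is specified in docs/POLICY_BUNDLE_SPEC.md.

```python
PHASES = ("bid", "in_out", "play")

def route_policies(candidate, candidate_scope, fallback):
    """Map each phase to the checkpoint that should act in it.

    candidate_scope lists the phases the candidate owns; every other phase
    comes from the full fallback checkpoint.
    """
    unknown = set(candidate_scope) - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    return {p: candidate if p in candidate_scope else fallback for p in PHASES}
```

For a bid-only candidate, `route_policies("bid_ppo_v1/best.pt", ["bid"], "cfg3_best.pt")` routes only bidding to the candidate.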

Early-stop controls are also available directly in:

scripts/train_bid_policy.py
scripts/train_ppo.py

Train supervised advisor models (bid/move):

python scripts/train_supervised_v1.py \
  --bid-data data/phase2/supervised_full/bid_decisions/*.parquet data/phase2/supervised_v2_delta/bid_decisions/*.parquet \
  --move-data data/phase2/supervised_full/move_decisions/*.parquet data/phase2/supervised_v2_delta/move_decisions/*.parquet \
  --output-dir experiments/supervised_v3

Seat-balanced + seed-paired advisor evaluation:

python scripts/evaluate_supervised_seat_balanced.py \
  --bid-model experiments/supervised_v3/bid_model.pt \
  --move-model experiments/supervised_v3/move_model.pt \
  --games-per-matchup 5000 \
  --output experiments/supervised_v3/eval_seat_balanced_5k.json
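Seat balancing and seed pairing work together to remove two sources of noise. Each deal seed is played once per seat with the candidate and once with the baseline, so both face identical cards from the same position, and their outcome difference cancels deal luck and seat advantage. The sketch below assumes five seats and a higher-is-better outcome (for example, 1 for a win); both assumptions and the helper names are hypothetical.

```python
def paired_schedule(num_seeds, seats, base_seed=0):
    """Yield (seed, seat, policy) so candidate and baseline see the same deals."""
    for i in range(num_seeds):
        seed = base_seed + i
        for seat in seats:
            for policy in ("candidate", "baseline"):
                yield seed, seat, policy

def paired_delta(results):
    """Mean (candidate - baseline) outcome over complete (seed, seat) pairs.

    results: iterable of (seed, seat, policy, outcome), outcome higher-is-better.
    """
    by_key = {}
    for seed, seat, policy, outcome in results:
        by_key.setdefault((seed, seat), {})[policy] = outcome
    diffs = [v["candidate"] - v["baseline"] for v in by_key.values()
             if "candidate" in v and "baseline" in v]
    if not diffs:
        raise ValueError("no complete pairs")
    return sum(diffs) / len(diffs)
```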

Build Phase-4 vertical-slice payload:

python scripts/build_phase4_vertical_slice.py \
  --advisor-policy ppo \
  --policy-bundle docs/examples/policy_bundle_v1.json \
  --ppo-value-temperature 2.0 \
  --risk-profile balanced \
  --output apps/tauri-ui/mock/vertical_slice.json

Run local Phase-4 bridge + interactive UI:

python scripts/run_phase4_bridge.py \
  --advisor-policy ppo \
  --policy-bundle docs/examples/policy_bundle_v1.json \
  --opponent-policy heuristic \
  --risk-profile balanced \
  --static-dir apps/tauri-ui/mock \
  --port 8765

Latency/stability benchmark for advisor inference:

python scripts/benchmark_advisor_latency.py \
  --advisor-policy ppo \
  --policy-bundle docs/examples/policy_bundle_v1.json \
  --output experiments/phase4_alpha/latency_report_ppo.json
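This README does not describe the benchmark's methodology. Below is a generic sketch of measuring per-call latency percentiles with `time.perf_counter`, where `fn` stands in for one advisor inference call (hypothetical). A few warm-up calls are excluded so one-time costs such as lazy model loading do not skew the percentiles.

```python
import statistics
import time

def benchmark(fn, runs=200, warmup=20):
    """Time fn() and return p50/p95/max latency in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 timed runs for percentiles")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points; index 94 is p95
    return {"p50_ms": statistics.median(samples), "p95_ms": cuts[94], "max_ms": max(samples)}
```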

PPO-teacher distillation + calibration (Phase-3 refinement):

python scripts/train_distilled_from_ppo.py \
  --teacher-checkpoint experiments/bid_ppo_v6_handstrength_2026-02-27/hs_cfg1_best.pt \
  --output-dir experiments/supervised_distill_v1

python scripts/calibrate_supervised_temperature.py \
  --bid-checkpoint experiments/supervised_distill_v1/bid_model.pt \
  --move-checkpoint experiments/supervised_distill_v1/move_model.pt \
  --output-dir experiments/supervised_distill_v1/calibrated
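Temperature calibration usually means temperature scaling: all logits are divided by a single scalar T fitted on held-out data. A T above 1 softens overconfident predictions without changing the argmax. Below is a dependency-free sketch that fits T by grid search over mean negative log-likelihood. This is an assumption about the technique; calibrate_supervised_temperature.py may use a gradient-based optimizer or a different grid.

```python
import math

def nll_at_temperature(logits_rows, labels, temperature):
    """Mean negative log-likelihood of labels under softmax(logits / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = 0.0
    for logits, y in zip(logits_rows, labels):
        scaled = [z / temperature for z in logits]
        top = max(scaled)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)

def fit_temperature(logits_rows, labels, grid=None):
    """Pick the grid temperature that minimizes held-out NLL."""
    if not labels:
        raise ValueError("need at least one labeled example")
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # 0.25 .. 5.0
    return min(grid, key=lambda t: nll_at_temperature(logits_rows, labels, t))
```

For a model that predicts class 0 with logits [4, 0] but is right only 3 times out of 4, the fitted T lands near 3.6. At that temperature the predicted probability drops to about 0.75.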

Compute formal Phase-4 go/no-go decision:

python scripts/phase4_go_no_go.py \
  --canonical-summary data/phase2/supervised_v2_merged_summary.json \
  --train-summary experiments/supervised_v3/summary.json \
  --eval-summary experiments/supervised_v3/eval_seat_balanced_5k.json \
  --calibration-summary experiments/supervised_distill_v1/calibrated/summary.json \
  --output experiments/supervised_v3/phase4_go_no_go.json

Project Layout

schnarps/engine: deterministic rules engine
schnarps/bots: baseline policies
schnarps/rl: self-play / evaluation scaffolding
schnarps/data: logging, replay, export
schnarps/advisor: recommendation API contract
apps/tauri-ui: UI phase placeholder
tests: rule and determinism tests

New architecture/reference docs:

docs/ARCHITECTURE_RL.md
docs/POLICY_BUNDLE_SPEC.md
docs/TRAINING_PHASED_3POLICY.md
docs/PROBABILITY_MODEL.md
docs/EVAL_PROTOCOL_V3.md
docs/EVAL_PROTOCOL_V2.md

Current Assumptions

The engine follows Rules.md plus the explicit planning decisions finalized in chat:

52-card deck, 5 cards per active player
Bids are pass or integer 1..5
All-pass forces dealer to bid 1
High bidder always IN
Forced-IN when trump is spades or high bid is 1
Punt overrides trick-based reduction
Sitting at score <=5 adds +1
Immediate win at score <=0; elimination at >=32
Dealer rotates +1 seat, skipping eliminated players
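Several of these rules are simple enough to state as code. Below is a sketch of dealer rotation, score status, and the forced-IN condition. It assumes seats are numbered 0..n-1 in turn order and scores are plain integers. The function names are hypothetical, and the engine in schnarps/engine remains the authority.

```python
def next_dealer(dealer, eliminated, n_seats):
    """Advance the dealer by one seat, skipping eliminated players."""
    if len(eliminated) >= n_seats:
        raise ValueError("no active players")
    seat = dealer
    for _ in range(n_seats):
        seat = (seat + 1) % n_seats
        if seat not in eliminated:
            return seat
    raise ValueError("no active players")

def score_status(score):
    """Immediate win at score <= 0, elimination at score >= 32."""
    if score <= 0:
        return "won"
    if score >= 32:
        return "eliminated"
    return "active"

def is_forced_in(trump, high_bid):
    """Forced-IN condition: trump is spades or the high bid is 1."""
    return trump == "spades" or high_bid == 1
```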

Packages