Automation for grid-based puzzles where rectangles of digits must sum to 10 (e.g. FruitBox). Pipeline: screen capture, digit recognition (OCR), solving (heuristic or learned), and automated clicking. Grid is 10x17; digits 1-9; goal is to clear as many cells as possible by repeatedly selecting rectangles whose non-zero cells sum to 10.
- main.py – Entry point: capture, OCR, solve, click; performance logging per model.
- ocr_grid.py – Grid detection and digit recognition (template, OCR, or pixel methods).
- solver.py – Heuristic and PPO-based solvers; interface used by main.
- clicker.py – Screen coordinate mapping and mouse automation.
- training.py – Gymnasium environment and PPO training on saved grids.
- brute_force_solver.py – Exhaustive and optimized brute-force search.
- gnn_rl_solver.py – CNN-based RL agent with continuous box output and experience replay.
- cnn_solver.py – CNN autoencoder + strategy/value heads; optional unsupervised/strategy training.
- strategy_cnn.py – Strategy CNN trained on move sequences.
- evaluate_model.py – Evaluation on test grids.
- organize_data.py – Train/validation/test split.
- list_performance.py – Performance aggregation and plots.
- utils.py, eval_comparison.py, training_data.py – Shared utilities and data handling.
Digit recognition runs after grid detection and cell extraction. Three methods:
- template – Pre-loaded digit templates (digits/); each cell matched via
cv2.matchTemplatewithTM_SQDIFF_NORMED; best match chosen. Fast, assumes consistent font. - ocr – Tesseract with
--psm 10, digit whitelist. No templates; useful when font varies. - pixel – No raw template matching; normalized pixel difference plus SSIM vs templates; combined score picks digit. More robust to lighting/small deformations.
Grid and cell geometry are fixed by the game layout; coordinates and cell bounds are configured in main.py (region, offsets, button locations).
Area-priority greedy (find_rectangles_zero_inclusive_greedy): Each pass, enumerate all valid rectangles (non-zero sum = 10, at least one non-zero). Sort by area descending; take the largest valid one, zero it out, repeat. No backtracking; linear number of passes; subsecond. Quality good but not optimal.
Digit-priority greedy (find_rectangles_digit_priority_greedy): Same enumeration, but score by number of non-zero digits cleared (with small random tie-breaking). Runs multiple attempts (default 3) with different random seeds and returns the best digit count. Typically better than area-priority; still no optimality guarantee.
Backtracking (solve_rectangles_backtracking): Recursive search over valid rectangles. At each state, generate all valid rectangles, sort by non-zero count descending, then recurse on each choice. Pruning: if digits_cleared + remaining_nonzero <= best_count, skip. Optional time limit; returns best solution found so far. Can find optimal solution but complexity is exponential; used for analysis, not live play.
BruteForceSolver enumerates all valid rectangles (non-zero sum = 10) once for the initial grid. Two search modes:
- Optimized (
solve_brute_force_optimized): Rectangles sorted by efficiency (non-zero count / area). Greedy solution used as lower bound for pruning. Iterates over combinations with early termination when remaining rectangles cannot beat best; best used for small or medium grids when optimality is needed. - Full brute force (
solve_brute_force): Unpruned enumeration of combinations; guaranteed optimal when run to completion but expensive.
solve_greedy is a simple greedy baseline used internally (e.g. as lower bound).
Environment: Custom Gymnasium RectangleEnv. State: 10x17 grid (shape (10, 17, 1)), values 0–9. Actions: discrete index into all possible rectangles (r1,c1,r2,c2) in row-major order; invalid actions (sum != 10 or all zeros) are masked. Reward: number of non-zero digits cleared per step; optional large bonus for full clear. Episodes run on fixed grids from disk (train/validation); env cycles through a list of grids.
Training: Stable-Baselines3 PPO. Policy: MLP (e.g. pi/vf [256,256] or [128,128] in fast mode). Action masking applied so the agent only selects valid rectangles. Training loads grids from data/train, validates on data/validation; supports MPS (Apple Silicon), CUDA, or CPU.
Inference: solve_rectangles_model builds a minimal env with the same action space and rectangle map, loads the saved PPO model, and runs the policy with masking until no valid action exists. Used by main.py for live solving.
Separate from the PPO pipeline. State: raw grid. Action: continuous (r1, c1, r2, c2) from a BoxActionHead (MLP outputting 4 coordinates, clamped to grid and r2>=r1, c2>=c1). Feature extractor: SpatialAttentionCNN – conv layers with spatial attention (1x1 conv + sigmoid), layer norm, then global pooling to a single vector. Training: experience replay; steps in a FruitBoxEnvironment that validates each box (sum must be 10, at least one non-zero) and rewards by digits cleared. Loss combines policy (e.g. continuous action loss) and value head; no discrete action space. BruteForceSolver (or mock) is used only to list valid rectangles for masking or termination, not for action space.
FruitBoxCNN: Encoder (conv stack + spatial attention) maps grid to latent vector; decoder reconstructs grid (e.g. for unsupervised pretraining). Strategy head: latent → 4-D (rectangle coordinates). Value head: latent → scalar. Can be trained with autoencoder loss on grids and/or supervised/RL objectives on moves. StrategyCNN (in same file) is a patch-based design with move history (LSTM/Transformer) for sequence-aware play. KMeans/TSNE used in scripts for latent analysis. Invoked via Makefile (train-cnn, test-cnn, etc.); not wired into main.py’s default solve path.
- Data split:
organize_data.pysplits collected grids intodata/train,data/validation,data/test(e.g. 80/10/10). Run before PPO training to avoid leakage. - PPO:
make train-fast(500k steps, saved grids),make train-long(2M),make train-fast-quick(100k, fast eval). Optional TensorBoard;--cpuforces CPU. - Continue training:
make continue-train MODEL=models/<name>.zip - GNN RL:
make gnn-train/gnn-train-fast;gnn-test/gnn-evalwith--load-modeland--test-grid - CNN:
make train-cnn,make test-cnn, etc. - Strategy CNN:
make strategy-train,make strategy-test
- Python 3.7+. Tesseract required for OCR method
ocr. Recommended: virtualenv. make venv-createthensource fruitbox_env/bin/activate; dependencies inrequirements.txt.- Run solver:
make run(uses default model and pixel OCR).make run-model MODEL=models/<name>.zip.make run-testfor a test image. Verbose:make run-verbose. - Screen/buttons: Set
region,screen_offset,RESET_LOC,PLAY_LOCinmain.pyfor your resolution and layout.
- OCR method:
main.pypassesmethodtoprocess_image_to_grid('template','ocr', or'pixel'). - PPO: Total steps, eval frequency, batch size, net architecture, and device in
training.py(and via CLI flags intraining.py). - Debug:
--verbose; debug images (e.g.preview_overlay.png,debug_steps/) written during OCR and solve.
- Template OCR: high accuracy when font matches templates; runtime on the order of milliseconds for a full board.
- Heuristic solvers: digit-priority greedy with retries often reaches ~130–140+ digits cleared; backtracking can hit optimal but may be slow; PPO models in the same range with faster inference.
- Brute-force: use optimized version with time or complexity limits for real use; full brute force only for small instances.
- Training: PPO uses MPS/CUDA when available; GNN/CNN scripts use PyTorch device selection.
MIT. See LICENSE.