FruitBox

Automation for grid-based puzzles where rectangles of digits must sum to 10 (e.g. FruitBox). Pipeline: screen capture, digit recognition (OCR), solving (heuristic or learned), and automated clicking. Grid is 10x17; digits 1-9; goal is to clear as many cells as possible by repeatedly selecting rectangles whose non-zero cells sum to 10.

Architecture

main.py – Entry point: capture, OCR, solve, click; performance logging per model.
ocr_grid.py – Grid detection and digit recognition (template, OCR, or pixel methods).
solver.py – Heuristic and PPO-based solvers; interface used by main.
clicker.py – Screen coordinate mapping and mouse automation.
training.py – Gymnasium environment and PPO training on saved grids.
brute_force_solver.py – Exhaustive and optimized brute-force search.
gnn_rl_solver.py – CNN-based RL agent with continuous box output and experience replay.
cnn_solver.py – CNN autoencoder + strategy/value heads; optional unsupervised/strategy training.
strategy_cnn.py – Strategy CNN trained on move sequences.
evaluate_model.py – Evaluation on test grids.
organize_data.py – Train/validation/test split.
list_performance.py – Performance aggregation and plots.
utils.py, eval_comparison.py, training_data.py – Shared utilities and data handling.

OCR

Digit recognition runs after grid detection and cell extraction. Three methods:

template – Pre-loaded digit templates (digits/); each cell matched via cv2.matchTemplate with TM_SQDIFF_NORMED; best match chosen. Fast, assumes consistent font.
ocr – Tesseract with --psm 10, digit whitelist. No templates; useful when font varies.
pixel – No raw template matching; normalized pixel difference plus SSIM vs templates; combined score picks digit. More robust to lighting/small deformations.

Grid and cell geometry are fixed by the game layout; coordinates and cell bounds are configured in main.py (region, offsets, button locations).

Solving algorithms

Heuristic solvers (solver.py)

Area-priority greedy (find_rectangles_zero_inclusive_greedy): Each pass, enumerate all valid rectangles (non-zero sum = 10, at least one non-zero). Sort by area descending; take the largest valid one, zero it out, repeat. No backtracking; linear number of passes; subsecond. Quality good but not optimal.

Digit-priority greedy (find_rectangles_digit_priority_greedy): Same enumeration, but score by number of non-zero digits cleared (with small random tie-breaking). Runs multiple attempts (default 3) with different random seeds and returns the best digit count. Typically better than area-priority; still no optimality guarantee.

Backtracking (solve_rectangles_backtracking): Recursive search over valid rectangles. At each state, generate all valid rectangles, sort by non-zero count descending, then recurse on each choice. Pruning: if digits_cleared + remaining_nonzero <= best_count, skip. Optional time limit; returns best solution found so far. Can find optimal solution but complexity is exponential; used for analysis, not live play.

Brute-force solver (brute_force_solver.py)

BruteForceSolver enumerates all valid rectangles (non-zero sum = 10) once for the initial grid. Two search modes:

Optimized (solve_brute_force_optimized): Rectangles sorted by efficiency (non-zero count / area). Greedy solution used as lower bound for pruning. Iterates over combinations with early termination when remaining rectangles cannot beat best; best used for small or medium grids when optimality is needed.
Full brute force (solve_brute_force): Unpruned enumeration of combinations; guaranteed optimal when run to completion but expensive.

solve_greedy is a simple greedy baseline used internally (e.g. as lower bound).

PPO (training.py + solver.solve_rectangles_model)

Environment: Custom Gymnasium RectangleEnv. State: 10x17 grid (shape (10, 17, 1)), values 0–9. Actions: discrete index into all possible rectangles (r1,c1,r2,c2) in row-major order; invalid actions (sum != 10 or all zeros) are masked. Reward: number of non-zero digits cleared per step; optional large bonus for full clear. Episodes run on fixed grids from disk (train/validation); env cycles through a list of grids.

Training: Stable-Baselines3 PPO. Policy: MLP (e.g. pi/vf [256,256] or [128,128] in fast mode). Action masking applied so the agent only selects valid rectangles. Training loads grids from data/train, validates on data/validation; supports MPS (Apple Silicon), CUDA, or CPU.

Inference: solve_rectangles_model builds a minimal env with the same action space and rectangle map, loads the saved PPO model, and runs the policy with masking until no valid action exists. Used by main.py for live solving.

GNN/CNN RL (gnn_rl_solver.py)

Separate from the PPO pipeline. State: raw grid. Action: continuous (r1, c1, r2, c2) from a BoxActionHead (MLP outputting 4 coordinates, clamped to grid and r2>=r1, c2>=c1). Feature extractor: SpatialAttentionCNN – conv layers with spatial attention (1x1 conv + sigmoid), layer norm, then global pooling to a single vector. Training: experience replay; steps in a FruitBoxEnvironment that validates each box (sum must be 10, at least one non-zero) and rewards by digits cleared. Loss combines policy (e.g. continuous action loss) and value head; no discrete action space. BruteForceSolver (or mock) is used only to list valid rectangles for masking or termination, not for action space.

CNN solver (cnn_solver.py)

FruitBoxCNN: Encoder (conv stack + spatial attention) maps grid to latent vector; decoder reconstructs grid (e.g. for unsupervised pretraining). Strategy head: latent → 4-D (rectangle coordinates). Value head: latent → scalar. Can be trained with autoencoder loss on grids and/or supervised/RL objectives on moves. StrategyCNN (in same file) is a patch-based design with move history (LSTM/Transformer) for sequence-aware play. KMeans/TSNE used in scripts for latent analysis. Invoked via Makefile (train-cnn, test-cnn, etc.); not wired into main.py’s default solve path.

Data and training commands

Data split: organize_data.py splits collected grids into data/train, data/validation, data/test (e.g. 80/10/10). Run before PPO training to avoid leakage.
PPO: make train-fast (500k steps, saved grids), make train-long (2M), make train-fast-quick (100k, fast eval). Optional TensorBoard; --cpu forces CPU.
Continue training: make continue-train MODEL=models/<name>.zip
GNN RL: make gnn-train / gnn-train-fast; gnn-test / gnn-eval with --load-model and --test-grid
CNN: make train-cnn, make test-cnn, etc.
Strategy CNN: make strategy-train, make strategy-test

Installation and usage

Python 3.7+. Tesseract required for OCR method ocr. Recommended: virtualenv.
make venv-create then source fruitbox_env/bin/activate; dependencies in requirements.txt.
Run solver: make run (uses default model and pixel OCR). make run-model MODEL=models/<name>.zip. make run-test for a test image. Verbose: make run-verbose.
Screen/buttons: Set region, screen_offset, RESET_LOC, PLAY_LOC in main.py for your resolution and layout.

Configuration

OCR method: main.py passes method to process_image_to_grid ('template', 'ocr', or 'pixel').
PPO: Total steps, eval frequency, batch size, net architecture, and device in training.py (and via CLI flags in training.py).
Debug: --verbose; debug images (e.g. preview_overlay.png, debug_steps/) written during OCR and solve.

Performance notes

Template OCR: high accuracy when font matches templates; runtime on the order of milliseconds for a full board.
Heuristic solvers: digit-priority greedy with retries often reaches ~130–140+ digits cleared; backtracking can hit optimal but may be slow; PPO models in the same range with faster inference.
Brute-force: use optimized version with time or complexity limits for real use; full brute force only for small instances.
Training: PPO uses MPS/CUDA when available; GNN/CNN scripts use PyTorch device selection.

License

MIT. See LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FruitBox

Architecture

OCR

Solving algorithms

Heuristic solvers (solver.py)

Brute-force solver (brute_force_solver.py)

PPO (training.py + solver.solve_rectangles_model)

GNN/CNN RL (gnn_rl_solver.py)

CNN solver (cnn_solver.py)

Data and training commands

Installation and usage

Configuration

Performance notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.vscode		.vscode
__pycache__		__pycache__
data		data
digits		digits
fruitbox_env		fruitbox_env
logs		logs
models		models
plots		plots
screenshots		screenshots
test_data		test_data
training_data		training_data
training_grids		training_grids
.gitignore		.gitignore
Makefile		Makefile
ReadMe.md		ReadMe.md
clicker.py		clicker.py
eval_comparison.py		eval_comparison.py
evaluate_model.py		evaluate_model.py
gemini.py		gemini.py
list_performance.py		list_performance.py
main.py		main.py
ocr_grid.py		ocr_grid.py
organize_data.py		organize_data.py
performance_data_model_20250626_102548.json		performance_data_model_20250626_102548.json
performance_plot_model_20250626_102548.png		performance_plot_model_20250626_102548.png
preview_overlay.png		preview_overlay.png
requirements.txt		requirements.txt
solver.py		solver.py
training.py		training.py
training_backup.py		training_backup.py
training_data.py		training_data.py
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

FruitBox

Architecture

OCR

Solving algorithms

Heuristic solvers (solver.py)

Brute-force solver (brute_force_solver.py)

PPO (training.py + solver.solve_rectangles_model)

GNN/CNN RL (gnn_rl_solver.py)

CNN solver (cnn_solver.py)

Data and training commands

Installation and usage

Configuration

Performance notes

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages