This repository contains the quantization-focused portion of a state-centric neural planning study. The experimental object is a Weisfeiler-Lehman (WL) state-prediction LSTM, with WL features stored under the graphs encoding, used as a search heuristic over classical planning domains. The repository isolates the code, encodings, PDDL tasks, checkpoints, and analysis outputs needed to evaluate float32, dynamic int8, and quantization-aware-trained int8 variants of the WL-state LSTM.
The extraction is intentionally narrow. It does not include unrelated tokenization experiments, raw trajectory generation, Plansformer assets, XGBoost baselines, temporary runs, or W&B logs. The included outputs support reproducing the quantization analysis, rerunning compact smoke tests, inspecting model-size/latency/accuracy tradeoffs, and extending the QAT experiment.
The central question is whether state-centric planning heuristics based on recurrent neural state prediction can be compressed with int8 quantization while preserving planning utility. The study compares:
- float32: the original WL-state LSTM checkpoint.
- dynamic_int8: post-training dynamic quantization of LSTM and linear layers for CPU inference.
- qat_int8: quantization-aware fine-tuning followed by int8 conversion.
The full paper-style run covers four IPC-style domains (blocks, gripper, logistics, and visitall-from-everywhere), three data splits (validation, test-interpolation, and test-extrapolation), and QAT seeds 13, 23, and 37.
planfm-state-centric-quantization/
  code/       Python packages for models, QAT, inference, metrics, and figures
  data/       WL/graphs encodings, WL vocabularies, and PDDL tasks
  docs/       Research reports and experiment notes
  models/     Float WL-state LSTM checkpoints and QAT outputs
  notebooks/  Demonstration notebooks for inventory, evaluation, and planning smoke tests
  results/    Reused float baselines plus quantization study outputs
  scripts/    Convenience commands for smoke evaluation and report regeneration
  tests/      Focused quantization unit tests
Every directory has a local README.md describing its contents and role in the experiment.
Each trajectory is represented as a sequence of WL graph features saved under data/encodings/graphs/<domain>/<split>/. The model receives a current state vector and a goal vector, predicts the next state vector, and uses the predicted vector as a heuristic target during beam-search planning over grounded PDDL successors.
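The way a predicted state vector can guide beam search is sketched below. This is a minimal illustration, not the repository's planner: the function names, the squared-Euclidean scoring, and the toy predictor and successor generator are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_step(
    beam: Iterable[Vector],
    goal: Vector,
    predict_next: Callable[[Vector, Vector], Vector],  # stands in for the LSTM
    successors: Callable[[Vector], List[Vector]],      # stands in for grounded PDDL successors
    beam_width: int,
) -> List[Vector]:
    """Expand each beam state and keep the successors closest to the prediction."""
    scored = []
    for state in beam:
        target = predict_next(state, goal)  # predicted next-state vector
        for nxt in successors(state):
            scored.append((squared_distance(nxt, target), nxt))
    scored.sort(key=lambda item: item[0])  # stable sort: ties keep generation order
    return [state for _, state in scored[:beam_width]]


# Toy stand-ins (hypothetical): the "model" moves halfway toward the goal,
# and each successor changes one coordinate by +1 or -1.
def toy_predict(state: Vector, goal: Vector) -> Vector:
    return tuple((s + g) / 2 for s, g in zip(state, goal))


def toy_successors(state: Vector) -> List[Vector]:
    result = []
    for i in range(len(state)):
        for delta in (1, -1):
            nxt = list(state)
            nxt[i] += delta
            result.append(tuple(nxt))
    return result


new_beam = beam_step([(0, 0)], (2, 0), toy_predict, toy_successors, beam_width=2)
print(new_beam)
```

In a real run, the successor vectors would come from encoding each grounded successor state with the WL feature generator, and the scoring function could differ from the distance used here.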
The QAT model keeps the input projection and prediction head in float while quantizing the recurrent core. This partial-quantization design preserves compatibility with the existing float checkpoint and avoids expanding the intervention to the LayerNorm-containing head. Dynamic quantization provides a lower-effort CPU baseline by quantizing supported LSTM and linear modules after training.
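To make the int8 step concrete, the sketch below shows the arithmetic of symmetric per-tensor int8 quantization on a handful of weights. It illustrates the general technique only; it is not the PyTorch kernel used in this repository, which may use a different scheme (for example zero points or per-channel scales). The function names and the sample weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights standing in for a slice of the recurrent core.
weights = [0.50, -1.27, 0.031, 0.9]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Under the partial-quantization design described above, only the recurrent core would go through a mapping of this kind, while the input projection and prediction head would keep their float weights.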
Create an environment with either uv or Conda.
With uv:
uv venv
uv pip install -e ".[notebooks]"
With Conda:
conda env create -f environment.yml
conda activate planfm-state-centric-quantization
The WL tokenizer depends on wlplan, installed from its Git source in both setup paths. Planning validation with VAL is optional for smoke runs because the scripts support --skip_validation. Full executable-plan validation requires a VAL binary and a path passed via --val_path.
Run the focused quantization tests:
python -m pytest tests/test_quantization.py -q
Evaluate model-size and prediction metrics for one domain:
python -m code.quantization.evaluate_quantization \
--domain blocks \
--data_dir data/encodings/graphs \
--float_checkpoint models/float_wl_lstm_state/blocks_lstm_best.pt \
--qat_checkpoint models/qat_wl_lstm_state/seed_13/blocks/blocks_qat_int8_best.pt \
--output_path outputs/blocks_quantization_eval.json
Run a small planning smoke test without VAL:
python -m code.quantization.inference_qat_lstm \
--domain blocks \
--approach dynamic_int8 \
--checkpoint models/float_wl_lstm_state/blocks_lstm_best.pt \
--encoding graphs \
--encoding_data_dir data/encodings/graphs \
--results_dir outputs/smoke_dynamic_blocks \
--splits validation \
--max_problems 2 \
--skip_validation \
--val_path unused
The included manifest is results/quantization_wl_state_paper/multiseed_manifest.json. Regenerate aggregate tables and the markdown report from existing per-problem outputs with:
python -m code.quantization.analyze_wl_lstm_quantization_multiseed \
--manifest results/quantization_wl_state_paper/multiseed_manifest.json \
--output_root results/quantization_wl_state_paper/analysis \
--report_path docs/QUANTIZATION_WL_LSTM_PAPER_REPORT.md
Regenerate journal-style figures with:
python -m code.quantization.make_journal_figures
The full end-to-end study runner is available as code.quantization.run_full_wl_lstm_quantization_study. It trains QAT checkpoints, runs inference shards, builds a manifest, and regenerates analysis. A VAL path is required for strict validation.
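Before committing to the full runner, it can be useful to repeat the dynamic-int8 smoke test across all four domains. The sketch below only builds the command lines, reusing the flags from the smoke test above. Two details are assumptions: that each float checkpoint follows the `<domain>_lstm_best.pt` naming seen for blocks, and that the domain identifiers accepted by `--domain` match the names listed here.

```python
import shlex
from typing import List

# Domain identifiers are assumed to match the names used in the study description.
DOMAINS = ["blocks", "gripper", "logistics", "visitall-from-everywhere"]


def smoke_command(domain: str, max_problems: int = 2) -> List[str]:
    """Build the dynamic-int8 smoke-test command for one domain."""
    return [
        "python", "-m", "code.quantization.inference_qat_lstm",
        "--domain", domain,
        "--approach", "dynamic_int8",
        # Assumed naming pattern, extrapolated from the blocks checkpoint.
        "--checkpoint", f"models/float_wl_lstm_state/{domain}_lstm_best.pt",
        "--encoding", "graphs",
        "--encoding_data_dir", "data/encodings/graphs",
        "--results_dir", f"outputs/smoke_dynamic_{domain}",
        "--splits", "validation",
        "--max_problems", str(max_problems),
        "--skip_validation",
        "--val_path", "unused",
    ]


commands = [smoke_command(domain) for domain in DOMAINS]
for cmd in commands:
    # Print for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell-quoting problems if the commands are later passed to `subprocess.run`.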
- notebooks/01_inventory_and_manifest.ipynb inspects the repository inventory, domains, splits, manifests, and output counts.
- notebooks/02_quantization_evaluation_demo.ipynb runs a compact model-comparison evaluation and reads the generated metrics.
- notebooks/03_planning_smoke_test.ipynb executes a small WL-state planning run with --skip_validation and inspects the produced results.
The repository includes:
- WL/graphs encodings for all study domains and splits.
- WL feature-generator vocabularies under data/encodings/models.
- PDDL domain/problem files needed by planner inference.
- Float WL-state LSTM checkpoints and training logs.
- QAT prepared and converted int8 checkpoints for three seeds.
- Per-problem dynamic-int8, QAT-int8, and reused/rerun float32 planning outputs.
- Aggregate CSV/JSON analysis tables and publication-style figures.
The study scripts set deterministic seeds where practical. CPU quantization uses PyTorch eager-mode quantization with the onednn backend by default. Full reproducibility still depends on the installed PyTorch version, quantized backend support on the host CPU, and whether VAL validation is enabled. The included result outputs provide a fixed reference point for downstream analysis even when fresh runtimes vary slightly across machines.
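Because results depend on the PyTorch version and the quantized backends available on the host, it may help to record that information next to any fresh outputs. The following is a small sketch, not part of the repository; the function name and report keys are hypothetical, and the PyTorch fields are filled in only when `torch` can be imported.

```python
import json
import platform
import sys
from typing import Any, Dict


def environment_report() -> Dict[str, Any]:
    """Collect the interpreter, platform, and quantization backend details."""
    report: Dict[str, Any] = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
    else:
        report["torch"] = torch.__version__
        # Active quantized engine and the engines this build supports.
        report["quantized_engine"] = torch.backends.quantized.engine
        report["supported_engines"] = list(torch.backends.quantized.supported_engines)
    return report


report = environment_report()
print(json.dumps(report, indent=2))
```

Comparing the `supported_engines` list against the backend the scripts expect is a quick way to spot a host where the default quantized engine is unavailable.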