PlanFM State-Centric Quantization

Abstract

This repository contains the quantization-focused portion of a state-centric neural planning study. The experimental object is a state-prediction LSTM over Weisfeiler-Lehman (WL) state features, stored on disk under the encoding name graphs and used as a search heuristic over classical planning domains. The repository isolates the code, encodings, PDDL tasks, checkpoints, and analysis outputs needed to evaluate float32, dynamic int8, and quantization-aware-trained int8 variants of the WL-state LSTM.

The extraction is intentionally narrow. It does not include unrelated tokenization experiments, raw trajectory generation, Plansformer assets, XGBoost baselines, temporary runs, or W&B logs. The included outputs support reproducing the quantization analysis, rerunning compact smoke tests, inspecting model-size/latency/accuracy tradeoffs, and extending the QAT experiment.

Research Question

The central question is whether state-centric planning heuristics based on recurrent neural state prediction can be compressed with int8 quantization while preserving planning utility. The study compares:

float32: the original WL-state LSTM checkpoint.
dynamic_int8: post-training dynamic quantization of LSTM and linear layers for CPU inference.
qat_int8: quantization-aware fine-tuning followed by int8 conversion.

The full paper-style run covers four IPC-style domains (blocks, gripper, logistics, and visitall-from-everywhere), three data splits (validation, test-interpolation, and test-extrapolation), and QAT seeds 13, 23, and 37.
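
As a rough illustration of the size of that grid, the sketch below enumerates the QAT cells only. The split and domain strings are copied from the sentence above and are assumed, not confirmed, to match the directory names and CLI values used by the scripts; the name qat_cells is hypothetical.

```python
from itertools import product

# Names taken from the prose above; the on-disk spellings may differ.
DOMAINS = ["blocks", "gripper", "logistics", "visitall-from-everywhere"]
SPLITS = ["validation", "test-interpolation", "test-extrapolation"]
QAT_SEEDS = [13, 23, 37]


def qat_cells():
    """Return one (domain, split, seed) tuple per QAT evaluation cell."""
    return list(product(DOMAINS, SPLITS, QAT_SEEDS))


cells = qat_cells()
print(len(cells))  # 4 domains x 3 splits x 3 seeds
```

The float32 and dynamic_int8 variants add further cells per domain and split; whether they are repeated per seed is not stated here, so they are left out of the count.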

Repository Layout

planfm-state-centric-quantization/
  code/         Python packages for models, QAT, inference, metrics, and figures
  data/         WL/graphs encodings, WL vocabularies, and PDDL tasks
  docs/         Research reports and experiment notes
  models/       Float WL-state LSTM checkpoints and QAT outputs
  notebooks/    Demonstration notebooks for inventory, evaluation, and planning smoke tests
  results/      Reused float baselines plus quantization study outputs
  scripts/      Convenience commands for smoke evaluation and report regeneration
  tests/        Focused quantization unit tests

Every directory has a local README.md describing its contents and role in the experiment.

Method Summary

Each trajectory is represented as a sequence of WL graph features saved under data/encodings/graphs/<domain>/<split>/. The model receives a current state vector and a goal vector, predicts the next state vector, and uses the predicted vector as a heuristic target during beam-search planning over grounded PDDL successors.
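
The search loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's planner: beam_search, encode, predict_next, and the toy number-line domain are hypothetical stand-ins for the WL encoder, the LSTM, and grounded PDDL successors. Successors are ranked by squared distance to the predicted next-state vector, which is one plausible reading of "heuristic target".

```python
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
State = Hashable


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def beam_search(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[str, State]]],
    encode: Callable[[State], Vector],
    predict_next: Callable[[Vector, Vector], Vector],
    goal_vec: Vector,
    beam_width: int = 2,
    max_steps: int = 50,
) -> Optional[List[str]]:
    """Return a list of action names reaching a goal state, or None."""
    beam: List[Tuple[State, List[str]]] = [(start, [])]
    seen = {start}
    for _ in range(max_steps):
        candidates = []
        for state, plan in beam:
            if is_goal(state):
                return plan
            # The model's prediction is the target that successors are compared to.
            target = predict_next(encode(state), goal_vec)
            for action, nxt in successors(state):
                if nxt in seen:
                    continue  # each state is expanded at most once
                seen.add(nxt)
                score = squared_distance(encode(nxt), target)
                candidates.append((score, plan + [action], nxt))
        if not candidates:
            return None
        candidates.sort(key=lambda c: c[0])  # lower distance is better
        beam = [(s, p) for _, p, s in candidates[:beam_width]]
    # Check the final beam once more after the step budget is exhausted.
    for state, plan in beam:
        if is_goal(state):
            return plan
    return None


# Toy domain (hypothetical): walk along the integers from 0 to 5.
GOAL = 5


def toy_successors(n: int):
    return [("inc", n + 1), ("dec", n - 1)]


def toy_encode(n: int) -> Vector:
    return (float(n),)


def toy_predict(state_vec: Vector, goal_vec: Vector) -> Vector:
    # Stand-in for the LSTM: predict one unit step toward the goal.
    step = 1.0 if goal_vec[0] > state_vec[0] else -1.0
    return (state_vec[0] + step,)


plan = beam_search(0, lambda n: n == GOAL, toy_successors,
                   toy_encode, toy_predict, toy_encode(GOAL))
print(plan)
```

In this sketch a quantized model would only change predict_next; the search loop itself is unaffected, which is why prediction quality and planning utility can be compared across variants.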

The QAT model keeps the input projection and prediction head in float while quantizing the recurrent core. This partial-quantization design preserves compatibility with the existing float checkpoint and avoids expanding the intervention to the LayerNorm-containing head. Dynamic quantization provides a lower-effort CPU baseline by quantizing supported LSTM and linear modules after training.
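
To make the int8 step concrete, the sketch below shows a symmetric per-tensor quantize/dequantize round trip in plain Python. It is an illustration of the arithmetic only: PyTorch's quantized kernels use their own schemes (for example affine or per-channel parameters, depending on backend), and the function names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from int8 codes and the shared scale."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Small weights such as 0.003 collapse to zero under this scale, which is the kind of information loss that quantization-aware fine-tuning gives the recurrent core a chance to adapt to before conversion.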

Setup

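Create an environment with either uv (the first two commands) or Conda (the last two commands). The two paths are alternatives; only one is needed.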

uv venv
uv pip install -e ".[notebooks]"

conda env create -f environment.yml
conda activate planfm-state-centric-quantization

The WL tokenizer depends on wlplan, installed from its Git source in both setup paths. Planning validation with VAL is optional for smoke runs because the scripts support --skip_validation. Full executable-plan validation requires a VAL binary and a path passed via --val_path.

Quick Verification

Run the focused quantization tests:

python -m pytest tests/test_quantization.py -q

Evaluate model-size and prediction metrics for one domain:

python -m code.quantization.evaluate_quantization \
  --domain blocks \
  --data_dir data/encodings/graphs \
  --float_checkpoint models/float_wl_lstm_state/blocks_lstm_best.pt \
  --qat_checkpoint models/qat_wl_lstm_state/seed_13/blocks/blocks_qat_int8_best.pt \
  --output_path outputs/blocks_quantization_eval.json

Run a small planning smoke test without VAL; --val_path is given a placeholder value because validation is skipped:

python -m code.quantization.inference_qat_lstm \
  --domain blocks \
  --approach dynamic_int8 \
  --checkpoint models/float_wl_lstm_state/blocks_lstm_best.pt \
  --encoding graphs \
  --encoding_data_dir data/encodings/graphs \
  --results_dir outputs/smoke_dynamic_blocks \
  --splits validation \
  --max_problems 2 \
  --skip_validation \
  --val_path unused
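
To repeat the smoke test across domains, the command above can be generated programmatically. The sketch below only assembles argument lists using the flags shown above; the checkpoint and results-directory patterns are extrapolated from the blocks example and are assumptions, as is the use of the prose domain names as --domain values. The helper name smoke_command is hypothetical.

```python
import shlex

# Domain names as written in the prose; the CLI may expect different spellings.
DOMAINS = ["blocks", "gripper", "logistics", "visitall-from-everywhere"]


def smoke_command(domain, approach="dynamic_int8", max_problems=2):
    """Build the smoke-test argument list for one domain."""
    # Assumed pattern, generalized from the blocks checkpoint path.
    checkpoint = f"models/float_wl_lstm_state/{domain}_lstm_best.pt"
    return [
        "python", "-m", "code.quantization.inference_qat_lstm",
        "--domain", domain,
        "--approach", approach,
        "--checkpoint", checkpoint,
        "--encoding", "graphs",
        "--encoding_data_dir", "data/encodings/graphs",
        "--results_dir", f"outputs/smoke_dynamic_{domain}",
        "--splits", "validation",
        "--max_problems", str(max_problems),
        "--skip_validation",
        "--val_path", "unused",
    ]


for domain in DOMAINS:
    # Print shell-ready commands; pass the list to subprocess.run to execute.
    print(shlex.join(smoke_command(domain)))
```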

Rebuilding Analysis Outputs

The included manifest is results/quantization_wl_state_paper/multiseed_manifest.json. Regenerate aggregate tables and the markdown report from existing per-problem outputs with:

python -m code.quantization.analyze_wl_lstm_quantization_multiseed \
  --manifest results/quantization_wl_state_paper/multiseed_manifest.json \
  --output_root results/quantization_wl_state_paper/analysis \
  --report_path docs/QUANTIZATION_WL_LSTM_PAPER_REPORT.md

Regenerate journal-style figures with:

python -m code.quantization.make_journal_figures

The full end-to-end study runner is available as code.quantization.run_full_wl_lstm_quantization_study. It trains QAT checkpoints, runs inference shards, builds a manifest, and regenerates analysis. A VAL path is required for strict validation.

Demonstration Notebooks

notebooks/01_inventory_and_manifest.ipynb inspects the repository inventory, domains, splits, manifests, and output counts.
notebooks/02_quantization_evaluation_demo.ipynb runs a compact model-comparison evaluation and reads the generated metrics.
notebooks/03_planning_smoke_test.ipynb executes a small WL-state planning run with --skip_validation and inspects the produced results.

Included Outputs

The repository includes:

WL/graphs encodings for all study domains and splits.
WL feature-generator vocabularies under data/encodings/models.
PDDL domain/problem files needed by planner inference.
Float WL-state LSTM checkpoints and training logs.
QAT prepared and converted int8 checkpoints for three seeds.
Per-problem dynamic-int8, QAT-int8, and reused/rerun float32 planning outputs.
Aggregate CSV/JSON analysis tables and publication-style figures.

Notes on Reproducibility

The study scripts set deterministic seeds where practical. CPU quantization uses PyTorch eager-mode quantization with the onednn backend by default. Full reproducibility still depends on the installed PyTorch version, quantized backend support on the host CPU, and whether VAL validation is enabled. The included result outputs provide a fixed reference point for downstream analysis even when fresh runtimes vary slightly across machines.
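
A minimal sketch of what seeding and backend selection can look like is given below. It is not the study's own setup code: seed_everything is a hypothetical helper, and it degrades to seeding only Python's random module when PyTorch is not installed. The onednn engine is selected only if the installed PyTorch build reports it as supported.

```python
import random


def seed_everything(seed: int, engine: str = "onednn") -> dict:
    """Seed the available RNGs and report what was actually configured."""
    random.seed(seed)
    configured = {"python_random": seed, "torch": False, "quantized_engine": None}
    try:
        import torch
    except ImportError:
        return configured  # PyTorch not installed; only Python's RNG is seeded
    torch.manual_seed(seed)
    configured["torch"] = True
    # Switch the quantized backend only when this build supports it.
    if engine in torch.backends.quantized.supported_engines:
        torch.backends.quantized.engine = engine
        configured["quantized_engine"] = engine
    return configured


print(seed_everything(13))
```

Recording the returned dictionary alongside results makes it easier to tell whether two runs used the same quantized backend.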
