This project implements PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT), a novel approach to improve Large Language Model (LLM) performance on automated planning tasks. The system enhances planning capabilities through:
- Structured Reasoning: Logical chain-of-thought prompting with state→action→state traces
- External Validation: Integration with VAL (PDDL validator) for plan verification and detailed feedback
- Two-Stage Training: Progressive optimization from reasoning consistency to plan accuracy
- Error-Aware Learning: Fine-grained diagnosis of plan errors (e.g., unsatisfied preconditions) fed back into refinement
General-purpose LLMs struggle with complex planning tasks, producing valid plans less than 50% of the time on domains like Blocksworld. This project targets the following limitations:
- Logical Consistency: Ensures state transitions follow PDDL semantics
- Error Understanding: Maps validator feedback to natural language explanations
- Systematic Improvement: Structured training pipeline for consistent gains
- Domain Transfer: Robust performance across different planning domains
```mermaid
%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
flowchart TD
A[PDDL Domain & Problem] --> B[Prompt Templates]
B --> C[LLM with LoRA]
C --> D[Logical CoT Generation]
D --> E{Plan Generated}
E --> F[VAL Validator]
F --> G{Valid Plan?}
G -->|Yes| H[Success]
G -->|No| I[Feedback Parser]
I --> J[Error Analysis]
J --> K[Refinement Prompt]
K --> C
style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
style H fill:#059669,stroke:#6b7280,color:#f9fafb
style I fill:#1f2937,stroke:#6b7280,color:#f9fafb
style J fill:#1f2937,stroke:#6b7280,color:#f9fafb
style K fill:#1f2937,stroke:#6b7280,color:#f9fafb
```
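The VAL Validator step in the diagram can be scripted with a thin subprocess wrapper. The sketch below is illustrative and is not the project's `planning/` module: it assumes the VAL binary is named `Validate` and is on `PATH` (where `scripts/setup_val.sh` installs it is not stated here), passes VAL's verbose flag `-v`, and treats a line reading exactly `Plan valid` as success. Check that last assumption against the VAL build you install.

```python
import shutil
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ValResult:
    valid: bool
    output: str  # raw VAL stdout, to be handed to the feedback parser


def build_val_command(val_bin: str, domain: Path, problem: Path, plan: Path) -> list[str]:
    # -v requests verbose output, which includes per-step diagnostics.
    return [val_bin, "-v", str(domain), str(problem), str(plan)]


def run_val(domain: Path, problem: Path, plan: Path, val_bin: str = "Validate") -> ValResult:
    """Run VAL on one plan file and return a coarse verdict plus the raw log."""
    if shutil.which(val_bin) is None:
        raise FileNotFoundError(f"VAL binary {val_bin!r} not found on PATH")
    proc = subprocess.run(
        build_val_command(val_bin, domain, problem, plan),
        capture_output=True,
        text=True,
        check=False,  # judge validity from the log rather than the exit code
    )
    # Assumption: a successful run contains a line reading exactly "Plan valid".
    valid = any(line.strip() == "Plan valid" for line in proc.stdout.splitlines())
    return ValResult(valid=valid, output=proc.stdout)
```

Keeping the raw log in `ValResult.output` lets the feedback parser work on the full VAL text instead of a single boolean.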
| Technology | Purpose | Why Chosen |
|---|---|---|
| Python 3.11 | Primary Language | Modern type hints, faster interpreter than earlier 3.x releases |
| PyTorch | Deep Learning Framework | Mature ecosystem, CUDA support, research-friendly |
| Transformers/vLLM | LLM Infrastructure | HuggingFace integration, optimized inference |
| PEFT/LoRA | Parameter-Efficient Training | Memory-efficient fine-tuning for limited hardware |
| VAL | PDDL Plan Validator | Industry-standard validator with detailed feedback |
| Hydra | Configuration Management | Hierarchical configs, experiment reproducibility |
```mermaid
%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
graph LR
A[Stage 1: Reasoning] --> B[State Transition Loss]
A --> C[CoT Format Loss]
A --> D[KL Regularization]
E[Stage 2: Planning] --> F[Plan Validity Loss]
E --> G[Teacher Forcing]
E --> H[Refinement Loop]
B --> I[Combined Loss]
C --> I
D --> I
F --> J[Planning Loss]
G --> J
H --> J
I --> K[Stage 1 Model]
J --> L[Stage 2 Model]
K --> E
style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
style H fill:#1f2937,stroke:#6b7280,color:#f9fafb
style I fill:#374151,stroke:#6b7280,color:#f9fafb
style J fill:#374151,stroke:#6b7280,color:#f9fafb
style K fill:#059669,stroke:#6b7280,color:#f9fafb
style L fill:#059669,stroke:#6b7280,color:#f9fafb
```
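Stage 1 combines its three loss terms as a weighted sum. A minimal sketch using the weights from `configs/train_stage1.yaml`; how each individual term is computed is not specified here, so the KL term is assumed (hypothetically) to compare the model's next-token distribution with that of a frozen reference model, shown on toy probabilities in plain Python:

```python
import math
from typing import Mapping, Sequence

# Weights copied from the loss_weights section of configs/train_stage1.yaml.
STAGE1_WEIGHTS: dict[str, float] = {
    "state_transition": 0.6,
    "cot_format": 0.2,
    "kl_regularization": 0.2,
}


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_stage1_loss(
    losses: Mapping[str, float],
    weights: Mapping[str, float] = STAGE1_WEIGHTS,
) -> float:
    """Weighted sum of the Stage 1 loss terms; fails loudly if a term is missing."""
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)


# Toy example: KL against a hypothetical frozen reference distribution.
kl = kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
total = combined_stage1_loss({"state_transition": 1.2, "cot_format": 0.4, "kl_regularization": kl})
```

Because `combined_stage1_loss` only multiplies and adds, the same function works unchanged when the individual losses are PyTorch tensors.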
| Component | Technique | Memory Savings |
|---|---|---|
| Base Model | 4-bit Quantization (NF4) | ~75% less weight memory than FP16 |
| Training | LoRA (r=16, α=32) | Only adapters are trained (<1% of parameters) |
| Gradients | Gradient Checkpointing | ~50% activation memory |
| Attention | Flash Attention | ~40% attention memory |
| Batching | Gradient Accumulation | Larger effective batch sizes |
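The first two rows follow from simple arithmetic. The sketch below assumes Llama-3-8B dimensions (hidden size 4096, 32 layers, 1024-dimensional key/value projections under grouped-query attention, about 8.03B parameters) and LoRA applied to the q/k/v/o projections only; the project's actual target modules may differ.

```python
# Assumed Llama-3-8B shapes (from its public model config): hidden size 4096,
# 32 decoder layers, grouped-query attention with 8 KV heads x 128 dims = 1024.
HIDDEN = 4096
KV_DIM = 1024
NUM_LAYERS = 32
TOTAL_PARAMS = 8.03e9  # approximate total parameter count


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory for the raw weights only (no quantization constants or activations)."""
    return num_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA trains A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * (d_in + d_out)


def attention_lora_params(r: int = 16) -> int:
    """Trainable LoRA parameters when adapting q/k/v/o projections in every layer."""
    per_layer = (
        lora_params(HIDDEN, HIDDEN, r)    # q_proj
        + lora_params(HIDDEN, KV_DIM, r)  # k_proj
        + lora_params(HIDDEN, KV_DIM, r)  # v_proj
        + lora_params(HIDDEN, HIDDEN, r)  # o_proj
    )
    return per_layer * NUM_LAYERS


fp16_gib = weight_memory_gib(TOTAL_PARAMS, 16)
nf4_gib = weight_memory_gib(TOTAL_PARAMS, 4)
trainable = attention_lora_params(r=16)
print(f"FP16 weights: {fp16_gib:.1f} GiB, 4-bit weights: {nf4_gib:.1f} GiB")
print(f"LoRA r=16 on attention: {trainable:,} params ({trainable / TOTAL_PARAMS:.2%} of the model)")
```

Under these assumptions, 4-bit weights take about 3.7 GiB versus about 15 GiB in FP16, and r=16 adds roughly 13.6M trainable parameters (about 0.17% of the model). `lora_alpha` does not change the count; it only scales the adapter update by α/r = 2. Quantization constants and layers kept in higher precision push real usage somewhat higher.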
- Python 3.11+
- CUDA-capable GPU (recommended: 2x RTX 3080 or equivalent)
- Docker (optional but recommended)
- Clone the repository

```bash
git clone https://github.com/hkevin01/pddl-instruct-lcot.git
cd pddl-instruct-lcot
```

- Install dependencies

```bash
# Using pip (quote the extras so shells such as zsh do not expand the brackets)
pip install -r requirements.txt
pip install -e ".[dev,test]"

# Or using Docker
docker-compose up --build
```

- Set up the VAL validator

```bash
bash scripts/setup_val.sh
```

- Download datasets

```bash
bash scripts/download_planbench.sh
```

```bash
# Stage 1: Reasoning optimization
python scripts/train_stage1.py --config-path configs --config-name train_stage1

# Stage 2: Planning optimization
python scripts/train_stage2.py --config-path configs --config-name train_stage2
```

```bash
# No-feedback evaluation (final test)
python scripts/evaluate.py --config-path configs --config-name eval/no_feedback

# Interactive inference
python scripts/run_inference.py \
  --domain data/raw/blocksworld/domain.pddl \
  --problem data/raw/blocksworld/p01.pddl \
  --no-feedback
```

Plan-validity targets by domain:

| Domain | Baseline | Target | Improvement |
|---|---|---|---|
| Blocksworld | <50% | ≥90% | +40 points or more (absolute) |
| Logistics | ~30% | ≥85% | +55 points or more (absolute) |
| Mystery Blocksworld | ~1% | ≥60% | ~60× (relative) |
```mermaid
%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
sequenceDiagram
participant M as Model
participant V as VAL Validator
participant F as Feedback Parser
participant R as Refinement
M->>V: Generated Plan
V->>F: Validation Output
F->>F: Parse Errors
alt Plan Valid
F->>M: Success Signal
else Plan Invalid
F->>R: Structured Feedback
R->>M: Refinement Prompt
M->>V: Revised Plan
end
rect rgb(31, 41, 55)
note over M,R: Training Mode Only
end
```
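The refinement loop in the sequence diagram amounts to a bounded retry. A minimal sketch, assuming `generate` and `validate` are caller-supplied callables (for example a model wrapper and a VAL wrapper) and using a hypothetical prompt template; `refinement_budget` mirrors the key of the same name in `configs/train_stage1.yaml`:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    prompt: str
    plan: str
    valid: bool
    feedback: str


def refine_until_valid(
    generate: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    task_prompt: str,
    refinement_budget: int = 2,
) -> list[Attempt]:
    """Generate a plan, validate it, and re-prompt with feedback up to `refinement_budget` times."""
    attempts: list[Attempt] = []
    prompt = task_prompt
    for _ in range(refinement_budget + 1):  # initial attempt + revisions
        plan = generate(prompt)
        valid, feedback = validate(plan)
        attempts.append(Attempt(prompt, plan, valid, feedback))
        if valid:
            break
        # Hypothetical refinement template: task, failed plan, structured feedback.
        prompt = (
            f"{task_prompt}\n\nPrevious plan:\n{plan}\n\n"
            f"Validator feedback:\n{feedback}\n\nRevise the plan to fix these errors."
        )
    return attempts


# Usage with a stub model and validator: the first plan fails, the revision passes.
plans = iter(["(stack a b)", "(pick-up a)\n(stack a b)"])
attempts = refine_until_valid(
    generate=lambda prompt: next(plans),
    validate=lambda plan: (plan.startswith("(pick-up"), "step 1: unsatisfied precondition (holding a)"),
    task_prompt="Put block a on block b.",
)
```

Setting `refinement_budget=0` reduces this to a single attempt, which matches the spirit of the `--no-feedback` evaluation mode.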
```text
pddl-instruct-lcot/
├── src/pddl_instruct_lcot/   # Core library code
│   ├── data/                 # Data loading and processing
│   ├── modeling/             # Model and tokenizer utilities
│   ├── planning/             # PDDL parsing and validation
│   ├── training/             # Training pipelines
│   ├── eval/                 # Evaluation metrics
│   └── prompts/              # Prompt templates
├── scripts/                  # Executable scripts
├── configs/                  # Hydra configuration files
├── data/                     # Dataset storage
│   ├── raw/                  # Original PDDL files
│   └── processed/            # Instruction-tuning datasets
├── tests/                    # Test suites
├── docs/                     # Documentation
└── docker/                   # Containerization
```
Traditional approaches generate the plan end-to-end. The LCoT format instead makes the model write out every state transition explicitly:

```text
Step i:
Preconditions: [check each required condition]
Apply Effects: add {...}, delete {...}
Next State: [canonical fact list]
```
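A single LCoT step can be checked mechanically under STRIPS semantics: every precondition must hold in the current state, and the next state is the current state minus the delete effects, plus the add effects. The sketch below is illustrative; `GroundAction`, `apply_step`, and `lcot_step` are hypothetical names, sorted fact order stands in for the "canonical fact list", and the example uses the standard four-operator Blocksworld `pick-up` action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundAction:
    name: str
    preconditions: frozenset[str]
    add: frozenset[str]
    delete: frozenset[str]


def fmt(facts: frozenset[str]) -> str:
    # Sorted order gives every state a single canonical textual form.
    return ", ".join(sorted(facts))


def apply_step(state: frozenset[str], action: GroundAction) -> frozenset[str]:
    missing = action.preconditions - state
    if missing:
        raise ValueError(f"{action.name}: unsatisfied preconditions {fmt(missing)}")
    # STRIPS semantics: remove delete effects, then add add effects.
    return (state - action.delete) | action.add


def lcot_step(i: int, state: frozenset[str], action: GroundAction) -> str:
    """Render one step in the LCoT trace format after checking it."""
    next_state = apply_step(state, action)
    return "\n".join([
        f"Step {i}: {action.name}",
        f"Preconditions: {fmt(action.preconditions)} (all satisfied)",
        f"Apply Effects: add {{{fmt(action.add)}}}, delete {{{fmt(action.delete)}}}",
        f"Next State: [{fmt(next_state)}]",
    ])


pick_up_a = GroundAction(
    name="(pick-up a)",
    preconditions=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
    add=frozenset({"(holding a)"}),
    delete=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
)
init = frozenset({"(clear a)", "(ontable a)", "(handempty)", "(clear b)", "(ontable b)"})
print(lcot_step(1, init, pick_up_a))
```

Raising on a missing precondition, rather than silently continuing, mirrors how VAL stops executing a plan at the first inapplicable action.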
Instead of a binary valid/invalid verdict, the feedback parser extracts:
- Unsatisfied preconditions with specific predicates
- Incorrect add/delete effects with step indices
- Frame violations and invariant breaches
- Goal achievement status with missing facts
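The feedback categories above can be carried in a structured record. The sketch below derives them by simulating the plan locally rather than parsing VAL's log, and compares the true state after each step with the model's predicted Next State. Incorrect effects and frame violations both surface as that per-step difference, which is a simplification. All names (`Feedback`, `diagnose`, `facts`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GroundAction:
    name: str
    pre: frozenset[str]
    add: frozenset[str]
    delete: frozenset[str]


@dataclass
class Feedback:
    # (step, missing preconditions) for the first inapplicable action
    unsatisfied: list[tuple[int, list[str]]] = field(default_factory=list)
    # (step, facts the trace wrongly asserts, facts the trace wrongly drops)
    state_errors: list[tuple[int, list[str], list[str]]] = field(default_factory=list)
    missing_goals: list[str] = field(default_factory=list)

    @property
    def plan_valid(self) -> bool:
        # Trace errors do not invalidate the plan itself; they flag faulty reasoning.
        return not self.unsatisfied and not self.missing_goals


def diagnose(init, goal, plan, predicted_states) -> Feedback:
    """Simulate `plan` and compare each true state with the model's predicted Next State."""
    if len(plan) != len(predicted_states):
        raise ValueError("need exactly one predicted state per plan step")
    fb = Feedback()
    state = frozenset(init)
    for i, (action, predicted) in enumerate(zip(plan, predicted_states), start=1):
        missing = action.pre - state
        if missing:
            fb.unsatisfied.append((i, sorted(missing)))
            return fb  # execution halts at the first inapplicable action
        state = (state - action.delete) | action.add
        spurious, dropped = frozenset(predicted) - state, state - frozenset(predicted)
        if spurious or dropped:
            fb.state_errors.append((i, sorted(spurious), sorted(dropped)))
    fb.missing_goals = sorted(frozenset(goal) - state)
    return fb


def facts(*atoms: str) -> frozenset[str]:
    return frozenset(atoms)


pick_up_a = GroundAction("(pick-up a)", facts("(clear a)", "(ontable a)", "(handempty)"),
                         facts("(holding a)"), facts("(clear a)", "(ontable a)", "(handempty)"))
stack_a_b = GroundAction("(stack a b)", facts("(holding a)", "(clear b)"),
                         facts("(on a b)", "(clear a)", "(handempty)"), facts("(holding a)", "(clear b)"))
init = facts("(clear a)", "(ontable a)", "(handempty)", "(clear b)", "(ontable b)")
goal = facts("(on a b)")

# The model's trace forgets to delete (handempty) after step 1.
trace = [facts("(clear b)", "(ontable b)", "(holding a)", "(handempty)"),
         facts("(ontable b)", "(on a b)", "(clear a)", "(handempty)")]
fb = diagnose(init, goal, [pick_up_a, stack_a_b], trace)
```

Here the plan itself is valid, but `fb.state_errors` pinpoints step 1 as the place where the reasoning trace went wrong, which is the kind of step-indexed signal a refinement prompt can use.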
- Stage 1: Focus on logical consistency and state transitions
- Stage 2: Optimize for end-task planning accuracy with refinement
Designed for a dual RTX 3080 setup (20 GB total VRAM):
- 4-bit quantized base model (~4GB)
- LoRA adapters (~500MB)
- Gradient checkpointing
- Gradient accumulation for larger effective batch sizes
| Metric | Description | Target |
|---|---|---|
| Plan Validity | Percentage of VAL-valid plans | ≥90% |
| Goal Achievement | Plans reaching stated goals | ≥95% |
| Plan Efficiency | Average plan length relative to optimal | <120% |
| Error Reduction | Decrease in common error types | >75% |
| Domain Transfer | Performance across domains | Consistent |
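The first three metrics reduce to simple aggregates over per-problem results. A minimal sketch with a hypothetical `EpisodeResult` record; it assumes efficiency is averaged only over valid plans with a known optimal length, which the table does not specify:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class EpisodeResult:
    valid: bool                           # VAL accepted the plan
    goal_reached: bool                    # final state satisfies the goal
    plan_length: int
    optimal_length: Optional[int] = None  # None when no optimal reference is known


def summarize(results: list[EpisodeResult]) -> dict[str, float]:
    """Percentages for validity and goal achievement; plan length as % of optimal."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    summary = {
        "plan_validity": 100.0 * sum(r.valid for r in results) / n,
        "goal_achievement": 100.0 * sum(r.goal_reached for r in results) / n,
    }
    # Assumption: efficiency only over valid plans with a known, non-zero optimal length.
    ratios = [r.plan_length / r.optimal_length for r in results if r.valid and r.optimal_length]
    if ratios:
        summary["plan_efficiency"] = 100.0 * mean(ratios)
    return summary


results = [
    EpisodeResult(valid=True, goal_reached=True, plan_length=6, optimal_length=6),
    EpisodeResult(valid=True, goal_reached=True, plan_length=9, optimal_length=6),
    EpisodeResult(valid=False, goal_reached=False, plan_length=4, optimal_length=6),
    EpisodeResult(valid=False, goal_reached=True, plan_length=5),
]
summary = summarize(results)
```

Excluding invalid plans from the efficiency average keeps a short but broken plan from looking "more efficient than optimal".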
```yaml
# configs/model/llama3_8b.yaml
model_name: meta-llama/Meta-Llama-3-8B-Instruct
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
gradient_checkpointing: true
max_seq_len: 4096
```

```yaml
# configs/train_stage1.yaml
epochs: 2
learning_rate: 1e-4
loss_weights:
  state_transition: 0.6
  cot_format: 0.2
  kl_regularization: 0.2
feedback_mode: detailed
refinement_budget: 2
```

- Unit Tests: Core functionality validation
- Integration Tests: End-to-end pipeline testing
- Regression Tests: Performance benchmark tracking
- Hardware Tests: Multi-GPU training validation
- Automated linting (Black, isort, flake8)
- Type checking (mypy)
- Test execution (pytest)
- Docker build validation
- Performance benchmarking
- Project Plan: Detailed development roadmap
- API Reference: Code documentation
- Tutorials: Step-by-step guides
- Contributing: Development guidelines
We welcome contributions! Please see our Contributing Guidelines for details on:
- Code style and standards
- Testing requirements
- Pull request process
- Issue reporting
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- PlanBench: Dataset and evaluation framework
- VAL: PDDL validation toolkit
- HuggingFace: Transformers and model ecosystem
- Meta: Llama model family
- Issues: Use GitHub Issues for bug reports and feature requests
- Discussions: Use GitHub Discussions for questions and ideas
- Documentation: Check our comprehensive docs for detailed guides