PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT)

🎯 Project Purpose

This project implements PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT), an approach for improving Large Language Model (LLM) performance on automated planning tasks. The system combines four mechanisms:

Structured Reasoning: Logical chain-of-thought prompting with state→action→state traces
External Validation: Integration with VAL (PDDL validator) for plan verification and detailed feedback
Two-Stage Training: Progressive optimization from reasoning consistency to plan accuracy
Error-Aware Learning: Sophisticated error diagnosis and refinement mechanisms

Why This Matters

General-purpose LLMs struggle with complex planning tasks, producing valid plans less than 50% of the time on domains like Blocksworld. This project targets four limitations:

Logical Consistency: Ensures state transitions follow PDDL semantics
Error Understanding: Maps validator feedback to natural language explanations
Systematic Improvement: Structured training pipeline for consistent gains
Domain Transfer: Robust performance across different planning domains

🏗️ System Architecture

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
flowchart TD
    A[PDDL Domain & Problem] --> B[Prompt Templates]
    B --> C[LLM with LoRA]
    C --> D[Logical CoT Generation]
    D --> E{Plan Generated}
    E --> F[VAL Validator]
    F --> G{Valid Plan?}
    G -->|Yes| H[Success]
    G -->|No| I[Feedback Parser]
    I --> J[Error Analysis]
    J --> K[Refinement Prompt]
    K --> C
    
    style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style H fill:#059669,stroke:#6b7280,color:#f9fafb
    style I fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style J fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style K fill:#1f2937,stroke:#6b7280,color:#f9fafb
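
The loop in the diagram (generate, validate, parse feedback, re-prompt) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `ValidationResult`, `plan_with_refinement`, and the `generate`/`validate` callables are hypothetical names, and the retry limit borrows the `refinement_budget` idea from the training config.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    valid: bool
    feedback: str = ""  # hypothetical: parsed validator message


def plan_with_refinement(
    generate: Callable[[str], str],
    validate: Callable[[str], ValidationResult],
    prompt: str,
    refinement_budget: int = 2,
) -> Optional[str]:
    """Return the first valid plan, or None once the budget is spent."""
    current_prompt = prompt
    for _ in range(refinement_budget + 1):  # one initial attempt plus retries
        plan = generate(current_prompt)
        result = validate(plan)
        if result.valid:
            return plan
        # Feed the validator feedback back so the next attempt can correct it.
        current_prompt = (
            f"{prompt}\n\nPrevious plan:\n{plan}\n"
            f"Validator feedback:\n{result.feedback}\nRevise the plan."
        )
    return None
```

Passing the model and the validator in as callables keeps the loop testable with stubs, without a GPU or a VAL binary.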

🔧 Technical Stack & Rationale

Core Technologies

| Technology | Purpose | Why Chosen |
| --- | --- | --- |
| Python 3.11 | Primary Language | Latest stable features, type hints, performance improvements |
| PyTorch | Deep Learning Framework | Mature ecosystem, CUDA support, research-friendly |
| Transformers/vLLM | LLM Infrastructure | HuggingFace integration, optimized inference |
| PEFT/LoRA | Parameter-Efficient Training | Memory-efficient fine-tuning for limited hardware |
| VAL | PDDL Plan Validator | Industry-standard validator with detailed feedback |
| Hydra | Configuration Management | Hierarchical configs, experiment reproducibility |

Training Pipeline

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
graph LR
    A[Stage 1: Reasoning] --> B[State Transition Loss]
    A --> C[CoT Format Loss]
    A --> D[KL Regularization]
    
    E[Stage 2: Planning] --> F[Plan Validity Loss]
    E --> G[Teacher Forcing]
    E --> H[Refinement Loop]
    
    B --> I[Combined Loss]
    C --> I
    D --> I
    F --> J[Planning Loss]
    G --> J
    H --> J
    
    I --> K[Stage 1 Model]
    J --> L[Stage 2 Model]
    K --> E
    
    style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style H fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style I fill:#374151,stroke:#6b7280,color:#f9fafb
    style J fill:#374151,stroke:#6b7280,color:#f9fafb
    style K fill:#059669,stroke:#6b7280,color:#f9fafb
    style L fill:#059669,stroke:#6b7280,color:#f9fafb

Memory Optimization Strategy

| Component | Technique | Memory Savings |
| --- | --- | --- |
| Base Model | 4-bit Quantization (NF4) | ~75% reduction |
| Training | LoRA (r=16, α=32) | ~90% reduction in trainable params |
| Gradients | Gradient Checkpointing | ~50% activation memory |
| Attention | Flash Attention | ~40% attention memory |
| Batching | Gradient Accumulation | Larger effective batch sizes |
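
The ~75% figure for 4-bit quantization follows from simple arithmetic on weight storage. The sketch below assumes an 8B-parameter base model and a 16-bit baseline, and it ignores the small overhead of quantization constants; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 8e9  # assumed: an 8B-parameter base model
fp16_gb = weight_memory_gb(params, 16)
nf4_gb = weight_memory_gb(params, 4)
reduction = 1 - nf4_gb / fp16_gb

print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {nf4_gb:.0f} GB, reduction: {reduction:.0%}")
# prints: fp16: 16 GB, 4-bit: 4 GB, reduction: 75%
```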

🚀 Quick Start

Prerequisites

Python 3.11+
CUDA-capable GPU (recommended: 2x RTX 3080 or equivalent)
Docker (optional but recommended)

Installation

Clone the repository

git clone https://github.com/hkevin01/pddl-instruct-lcot.git
cd pddl-instruct-lcot

Install dependencies

# Using pip
pip install -r requirements.txt
pip install -e ".[dev,test]"  # quoted so shells such as zsh do not expand the brackets

# Or using Docker
docker-compose up --build

Setup VAL validator

bash scripts/setup_val.sh

Download datasets

bash scripts/download_planbench.sh

Training

# Stage 1: Reasoning optimization
python scripts/train_stage1.py --config-path configs --config-name train_stage1

# Stage 2: Planning optimization  
python scripts/train_stage2.py --config-path configs --config-name train_stage2

Evaluation

# No-feedback evaluation (final test)
python scripts/evaluate.py --config-path configs --config-name eval/no_feedback

# Interactive inference
python scripts/run_inference.py \
  --domain data/raw/blocksworld/domain.pddl \
  --problem data/raw/blocksworld/p01.pddl \
  --no-feedback

📊 Performance Targets

Expected Results

| Domain | Baseline | Target | Improvement |
| --- | --- | --- | --- |
| Blocksworld | <50% | ≥90% | +40 points or more (absolute) |
| Logistics | ~30% | ≥85% | +55 points or more (absolute) |
| Mystery Blocksworld | ~1% | ≥60% | ~60× (relative) |

Validation Pipeline

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
sequenceDiagram
    participant M as Model
    participant V as VAL Validator
    participant F as Feedback Parser
    participant R as Refinement

    M->>V: Generated Plan
    V->>F: Validation Output
    F->>F: Parse Errors
    alt Plan Valid
        F->>M: Success Signal
    else Plan Invalid
        F->>R: Structured Feedback
        R->>M: Refinement Prompt
        M->>V: Revised Plan
    end
    
    rect rgb(31, 41, 55)
        note over M,R: Training Mode Only
    end
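
The first arrow in the sequence (model to VAL) amounts to running the validator on three files. A minimal sketch, under two assumptions: the VAL binary is on `PATH` under its usual name `Validate`, and a successful run prints the phrase "Plan valid" (the wording may differ between VAL builds). `build_val_command` and `run_val` are hypothetical helpers, not part of this repository.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_val_command(domain: Path, problem: Path, plan: Path,
                      binary: str = "Validate") -> List[str]:
    # -v asks VAL for verbose output, which includes failure details.
    return [binary, "-v", str(domain), str(problem), str(plan)]


def run_val(domain: Path, problem: Path, plan: Path,
            binary: str = "Validate", timeout: float = 30.0) -> Tuple[bool, str]:
    """Run VAL and return (is_valid, raw_output) for the feedback parser."""
    proc = subprocess.run(
        build_val_command(domain, problem, plan, binary),
        capture_output=True, text=True, timeout=timeout,
    )
    output = proc.stdout + proc.stderr
    # Assumption: VAL reports success with the phrase "Plan valid".
    return "Plan valid" in output, output
```

Returning the raw output alongside the boolean lets a downstream parser extract the detailed errors instead of discarding them.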

🏗️ Project Structure

pddl-instruct-lcot/
├── 📁 src/pddl_instruct_lcot/     # Core library code
│   ├── 📁 data/                   # Data loading and processing
│   ├── 📁 modeling/               # Model and tokenizer utilities
│   ├── 📁 planning/               # PDDL parsing and validation
│   ├── 📁 training/               # Training pipelines
│   ├── 📁 eval/                   # Evaluation metrics
│   └── 📁 prompts/                # Prompt templates
├── 📁 scripts/                    # Executable scripts
├── 📁 configs/                    # Hydra configuration files
├── 📁 data/                       # Dataset storage
│   ├── 📁 raw/                    # Original PDDL files
│   └── 📁 processed/              # Instruction-tuning datasets
├── 📁 tests/                      # Test suites
├── 📁 docs/                       # Documentation
└── 📁 docker/                     # Containerization

🔬 Key Innovations

1. Logical Chain-of-Thought Prompting

Traditional approaches rely on end-to-end generation. Our LCoT format instead makes every step of the plan explicit:

Step i:
  Preconditions: [check each required condition]
  Apply Effects: add {...}, delete {...}
  Next State: [canonical fact list]
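
To make the format concrete, here is a small sketch that applies one STRIPS-style action to a state and renders it as an LCoT step. The `Action` class, `apply_action`, and `render_step` are hypothetical names for illustration, and placing the action name on the `Step` line is an assumption. Facts are sorted to give a canonical ordering.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]


def apply_action(state: State, action: Action) -> State:
    """STRIPS update: check preconditions, then delete and add facts."""
    missing = action.preconditions - state
    if missing:
        raise ValueError(f"{action.name}: unsatisfied {sorted(missing)}")
    return (state - action.delete_effects) | action.add_effects


def fmt(facts: Iterable[str]) -> str:
    return ", ".join(sorted(facts))  # sorted for a canonical fact list


def render_step(index: int, state: State, action: Action) -> str:
    next_state = apply_action(state, action)
    return "\n".join([
        f"Step {index}: {action.name}",
        f"  Preconditions: [{fmt(action.preconditions)}]",
        f"  Apply Effects: add {{{fmt(action.add_effects)}}}, "
        f"delete {{{fmt(action.delete_effects)}}}",
        f"  Next State: [{fmt(next_state)}]",
    ])


# Blocksworld example: pick up block a from the table.
pick_up_a = Action(
    name="(pick-up a)",
    preconditions=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
    add_effects=frozenset({"(holding a)"}),
    delete_effects=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
)
initial = frozenset({"(clear a)", "(ontable a)", "(handempty)",
                     "(clear b)", "(ontable b)"})
print(render_step(1, initial, pick_up_a))
```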

2. Detailed Validator Feedback

Instead of binary valid/invalid, we extract:

Unsatisfied preconditions with specific predicates
Incorrect add/delete effects with step indices
Frame violations and invariant breaches
Goal achievement status with missing facts
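
As an illustration of the first item, the sketch below turns an unsatisfied-precondition message into a structured record and a plain-language explanation. The sample text is modeled on VAL's verbose repair advice, but the exact wording is an assumption and may vary by VAL version. `PreconditionError` and `parse_precondition_errors` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PreconditionError:
    step: int
    action: str
    fact: str
    required_value: bool

    def explain(self) -> str:
        state = "true" if self.required_value else "false"
        return (f"Step {self.step}: ({self.action}) needs ({self.fact}) "
                f"to be {state}, but it is not.")


# Assumed message shape: an action line followed by a "(Set ... to ...)" line.
_PATTERN = re.compile(
    r"^\((?P<action>[^()]+)\) has an unsatisfied precondition "
    r"at time (?P<time>\d+)\s*\n"
    r"\(Set \((?P<fact>[^()]+)\) to (?P<value>true|false)\)",
    re.MULTILINE,
)


def parse_precondition_errors(val_output: str) -> List[PreconditionError]:
    return [
        PreconditionError(
            step=int(m.group("time")),
            action=m.group("action"),
            fact=m.group("fact"),
            required_value=m.group("value") == "true",
        )
        for m in _PATTERN.finditer(val_output)
    ]


SAMPLE = """Plan failed because of unsatisfied precondition in:
(pick-up b)

Plan Repair Advice:

(pick-up b) has an unsatisfied precondition at time 2
(Set (clear b) to true)
"""

errors = parse_precondition_errors(SAMPLE)
print(errors[0].explain())
```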

3. Two-Stage Progressive Training

Stage 1: Focus on logical consistency and state transitions
Stage 2: Optimize for end-task planning accuracy with refinement

4. Memory-Efficient Training

Designed for a dual RTX 3080 setup (20GB of VRAM in total):

4-bit quantized base model (~4GB)
LoRA adapters (~500MB)
Gradient checkpointing
Batch accumulation strategies
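
The small adapter size comes from LoRA's low-rank factorization: each adapted weight matrix gains two thin matrices instead of a full update. A back-of-the-envelope sketch for a single square projection, assuming a hidden size of 4096 for illustration (`lora_params` is a hypothetical helper):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) next to the frozen weight.
    return r * (d_in + d_out)


d = 4096  # assumed hidden size, for illustration only
full = d * d                       # parameters in the frozen projection
adapter = lora_params(d, d, r=16)  # trainable parameters added by LoRA

print(adapter, f"{adapter / full:.2%}")
# prints: 131072 0.78%
```

The total adapter size depends on how many layers and projections are adapted, which this sketch does not model.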

📈 Evaluation Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Plan Validity | Percentage of VAL-valid plans | ≥90% |
| Goal Achievement | Plans reaching stated goals | ≥95% |
| Plan Efficiency | Average steps vs optimal | <120% |
| Error Reduction | Decrease in common error types | >75% |
| Domain Transfer | Performance across domains | Consistent |
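
The first and third metrics can be computed from per-problem results along these lines. This is a sketch under assumptions: `PlanResult` is a hypothetical record type, and efficiency is taken as the mean ratio of plan length to optimal length over valid plans whose optimal length is known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanResult:
    valid: bool                           # did VAL accept the plan?
    plan_length: int                      # number of actions in the plan
    optimal_length: Optional[int] = None  # None if no reference is available


def plan_validity(results: List[PlanResult]) -> float:
    """Fraction of problems for which the generated plan is valid."""
    return sum(r.valid for r in results) / len(results)


def plan_efficiency(results: List[PlanResult]) -> float:
    """Mean plan_length / optimal_length over valid plans with a reference."""
    ratios = [
        r.plan_length / r.optimal_length
        for r in results
        if r.valid and r.optimal_length
    ]
    if not ratios:
        raise ValueError("no valid plans with a known optimal length")
    return sum(ratios) / len(ratios)


results = [
    PlanResult(True, 6, 5),
    PlanResult(True, 4, 4),
    PlanResult(False, 3, 3),
    PlanResult(True, 10),
]
print(f"validity={plan_validity(results):.0%}, "
      f"efficiency={plan_efficiency(results):.0%}")
```

An efficiency of 110% here would meet the <120% target; invalid plans are excluded so that short but wrong plans do not improve the score.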

🔧 Configuration System

Model Configuration

# configs/model/llama3_8b.yaml
model_name: meta-llama/Meta-Llama-3-8B-Instruct
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
gradient_checkpointing: true
max_seq_len: 4096
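
One plausible way these fields reach the libraries is to split them into keyword arguments for `transformers.BitsAndBytesConfig` and `peft.LoraConfig`. The sketch below only builds the dictionaries, so it runs without either library installed. `split_model_config` is a hypothetical helper, and `task_type="CAUSAL_LM"` is an assumption that does not appear in the YAML.

```python
from typing import Any, Dict, Tuple

# Mirrors configs/model/llama3_8b.yaml as a plain dict.
MODEL_CFG: Dict[str, Any] = {
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "gradient_checkpointing": True,
    "max_seq_len": 4096,
}


def split_model_config(cfg: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (quantization kwargs, LoRA kwargs) for the two config objects."""
    quant_kwargs = {
        "load_in_4bit": cfg["load_in_4bit"],
        "bnb_4bit_quant_type": cfg["bnb_4bit_quant_type"],
    }
    lora_kwargs = {
        "r": cfg["lora_r"],
        "lora_alpha": cfg["lora_alpha"],
        "lora_dropout": cfg["lora_dropout"],
        "task_type": "CAUSAL_LM",  # assumption: causal language modeling
    }
    return quant_kwargs, lora_kwargs


quant_kwargs, lora_kwargs = split_model_config(MODEL_CFG)
# With the libraries installed, these would be used as
# BitsAndBytesConfig(**quant_kwargs) and LoraConfig(**lora_kwargs).
print(lora_kwargs["lora_alpha"] / lora_kwargs["r"])  # LoRA scaling factor
```

With `lora_alpha: 32` and `lora_r: 16`, the adapter update is scaled by alpha / r = 2.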

Training Configuration

# configs/train_stage1.yaml
epochs: 2
learning_rate: 1e-4
loss_weights:
  state_transition: 0.6
  cot_format: 0.2
  kl_regularization: 0.2
feedback_mode: detailed
refinement_budget: 2
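
The `loss_weights` block suggests the Stage 1 objective is a weighted sum of its three terms. A minimal sketch of that combination, using plain floats in place of tensors; `combine_losses` is a hypothetical helper and the per-term loss values are made up for illustration.

```python
from typing import Mapping

LOSS_WEIGHTS = {
    "state_transition": 0.6,
    "cot_format": 0.2,
    "kl_regularization": 0.2,
}


def combine_losses(losses: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted sum of loss terms; every term must have a weight."""
    if set(losses) != set(weights):
        raise KeyError(f"mismatched terms: {set(losses) ^ set(weights)}")
    return sum(weights[name] * losses[name] for name in weights)


# Illustrative per-term values, not measured results.
step_losses = {"state_transition": 1.0, "cot_format": 0.5, "kl_regularization": 0.1}
total = combine_losses(step_losses, LOSS_WEIGHTS)
print(round(total, 4))
```

Checking that the two key sets match catches a misspelled term in the config before it silently drops out of the objective.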

🧪 Testing Strategy

Test Coverage

Unit Tests: Core functionality validation
Integration Tests: End-to-end pipeline testing
Regression Tests: Performance benchmark tracking
Hardware Tests: Multi-GPU training validation

Continuous Integration

Automated linting (Black, isort, flake8)
Type checking (mypy)
Test execution (pytest)
Docker build validation
Performance benchmarking

📚 Documentation

Project Plan: Detailed development roadmap
API Reference: Code documentation
Tutorials: Step-by-step guides
Contributing: Development guidelines

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

Code style and standards
Testing requirements
Pull request process
Issue reporting

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

PlanBench: Dataset and evaluation framework
VAL: PDDL validation toolkit
HuggingFace: Transformers and model ecosystem
Meta: Llama model family

📞 Support

Issues: Use GitHub Issues for bug reports and feature requests
Discussions: Use GitHub Discussions for questions and ideas
Documentation: Check our comprehensive docs for detailed guides

Building the future of AI-powered automated planning
