PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT)

CI License: Apache-2.0 Python 3.11 Docker

🎯 Project Purpose

This project implements PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT), a novel approach to improve Large Language Model (LLM) performance on automated planning tasks. The system enhances planning capabilities through:

  • Structured Reasoning: Logical chain-of-thought prompting with state→action→state traces
  • External Validation: Integration with VAL (PDDL validator) for plan verification and detailed feedback
  • Two-Stage Training: Progressive optimization from reasoning consistency to plan accuracy
  • Error-Aware Learning: Sophisticated error diagnosis and refinement mechanisms

Why This Matters

Traditional LLMs struggle with complex planning tasks, achieving <50% validity on domains like Blocksworld. This project addresses fundamental limitations:

  1. Logical Consistency: Ensures state transitions follow PDDL semantics
  2. Error Understanding: Maps validator feedback to natural language explanations
  3. Systematic Improvement: Structured training pipeline for consistent gains
  4. Domain Transfer: Robust performance across different planning domains

🏗️ System Architecture

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
flowchart TD
    A[PDDL Domain & Problem] --> B[Prompt Templates]
    B --> C[LLM with LoRA]
    C --> D[Logical CoT Generation]
    D --> E{Plan Generated}
    E --> F[VAL Validator]
    F --> G{Valid Plan?}
    G -->|Yes| H[Success]
    G -->|No| I[Feedback Parser]
    I --> J[Error Analysis]
    J --> K[Refinement Prompt]
    K --> C
    
    style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style H fill:#059669,stroke:#6b7280,color:#f9fafb
    style I fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style J fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style K fill:#1f2937,stroke:#6b7280,color:#f9fafb

🔧 Technical Stack & Rationale

Core Technologies

| Technology | Purpose | Why Chosen |
|---|---|---|
| Python 3.11 | Primary Language | Latest stable features, type hints, performance improvements |
| PyTorch | Deep Learning Framework | Mature ecosystem, CUDA support, research-friendly |
| Transformers/vLLM | LLM Infrastructure | HuggingFace integration, optimized inference |
| PEFT/LoRA | Parameter-Efficient Training | Memory-efficient fine-tuning for limited hardware |
| VAL | PDDL Plan Validator | Industry-standard validator with detailed feedback |
| Hydra | Configuration Management | Hierarchical configs, experiment reproducibility |

Training Pipeline

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
graph LR
    A[Stage 1: Reasoning] --> B[State Transition Loss]
    A --> C[CoT Format Loss]
    A --> D[KL Regularization]
    
    E[Stage 2: Planning] --> F[Plan Validity Loss]
    E --> G[Teacher Forcing]
    E --> H[Refinement Loop]
    
    B --> I[Combined Loss]
    C --> I
    D --> I
    F --> J[Planning Loss]
    G --> J
    H --> J
    
    I --> K[Stage 1 Model]
    J --> L[Stage 2 Model]
    K --> E
    
    style A fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style B fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style C fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style D fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style E fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style F fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style G fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style H fill:#1f2937,stroke:#6b7280,color:#f9fafb
    style I fill:#374151,stroke:#6b7280,color:#f9fafb
    style J fill:#374151,stroke:#6b7280,color:#f9fafb
    style K fill:#059669,stroke:#6b7280,color:#f9fafb
    style L fill:#059669,stroke:#6b7280,color:#f9fafb

Memory Optimization Strategy

| Component | Technique | Memory Savings |
|---|---|---|
| Base Model | 4-bit Quantization (NF4) | ~75% reduction |
| Training | LoRA (r=16, α=32) | ~90% reduction in trainable params |
| Gradients | Gradient Checkpointing | ~50% activation memory |
| Attention | Flash Attention | ~40% attention memory |
| Batching | Gradient Accumulation | Larger effective batch sizes |

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • CUDA-capable GPU (recommended: 2x RTX 3080 or equivalent)
  • Docker (optional but recommended)

Installation

  1. Clone the repository
git clone https://github.com/hkevin01/pddl-instruct-lcot.git
cd pddl-instruct-lcot
  2. Install dependencies
# Using pip
pip install -r requirements.txt
pip install -e ".[dev,test]"

# Or using Docker
docker-compose up --build
  3. Set up the VAL validator
bash scripts/setup_val.sh
  4. Download datasets
bash scripts/download_planbench.sh

Training

# Stage 1: Reasoning optimization
python scripts/train_stage1.py --config-path configs --config-name train_stage1

# Stage 2: Planning optimization  
python scripts/train_stage2.py --config-path configs --config-name train_stage2

Evaluation

# No-feedback evaluation (final test)
python scripts/evaluate.py --config-path configs --config-name eval/no_feedback

# Interactive inference
python scripts/run_inference.py \
  --domain data/raw/blocksworld/domain.pddl \
  --problem data/raw/blocksworld/p01.pddl \
  --no-feedback

📊 Performance Targets

Expected Results

| Domain | Baseline | Target | Improvement |
|---|---|---|---|
| Blocksworld | <50% | ≥90% | +40 points or more (absolute) |
| Logistics | ~30% | ≥85% | +55 points or more (absolute) |
| Mystery Blocksworld | ~1% | ≥60% | ~60× relative |

Validation Pipeline

%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor': '#1f2937', 'primaryTextColor': '#f9fafb', 'primaryBorderColor': '#374151', 'lineColor': '#6b7280', 'secondaryColor': '#374151', 'tertiaryColor': '#111827', 'background': '#111827', 'mainBkg': '#1f2937', 'secondBkg': '#374151', 'tertiaryBkg': '#111827'}}}%%
sequenceDiagram
    participant M as Model
    participant V as VAL Validator
    participant F as Feedback Parser
    participant R as Refinement

    M->>V: Generated Plan
    V->>F: Validation Output
    F->>F: Parse Errors
    alt Plan Valid
        F->>M: Success Signal
    else Plan Invalid
        F->>R: Structured Feedback
        R->>M: Refinement Prompt
        M->>V: Revised Plan
    end
    
    rect rgb(31, 41, 55)
        note over M,R: Training Mode Only
    end
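
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `generate`, `validate`, `Verdict`, and `refine_until_valid` are hypothetical names standing in for the model call and the VAL wrapper, and `budget` is assumed to play the role of a refinement budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical result of one validator call."""
    valid: bool
    feedback: str = ""  # error description when the plan is invalid


def refine_until_valid(
    generate: Callable[[Optional[str]], List[str]],
    validate: Callable[[List[str]], Verdict],
    budget: int = 2,
) -> tuple[List[str], bool, int]:
    """Generate a plan, then retry with validator feedback up to `budget` times.

    Returns (last_plan, is_valid, refinements_used).
    """
    plan = generate(None)  # first attempt: no feedback available yet
    verdict = validate(plan)
    used = 0
    while not verdict.valid and used < budget:
        plan = generate(verdict.feedback)  # refinement prompt carries the feedback
        verdict = validate(plan)
        used += 1
    return plan, verdict.valid, used


# Toy stand-ins: the "model" repairs its plan once it sees any feedback.
def toy_generate(feedback: Optional[str]) -> List[str]:
    if feedback is None:
        return ["(stack a b)"]  # forgets to pick up the block first
    return ["(pick-up a)", "(stack a b)"]


def toy_validate(plan: List[str]) -> Verdict:
    if plan and plan[0] == "(pick-up a)":
        return Verdict(valid=True)
    return Verdict(valid=False, feedback="step 1: (holding a) is not satisfied")


plan, ok, used = refine_until_valid(toy_generate, toy_validate, budget=2)
print(plan, ok, used)
```

Setting `budget=0` reduces the function to a single generate-and-validate pass, which is one way to model a no-feedback evaluation.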

🏗️ Project Structure

pddl-instruct-lcot/
├── 📁 src/pddl_instruct_lcot/     # Core library code
│   ├── 📁 data/                   # Data loading and processing
│   ├── 📁 modeling/               # Model and tokenizer utilities
│   ├── 📁 planning/               # PDDL parsing and validation
│   ├── 📁 training/               # Training pipelines
│   ├── 📁 eval/                   # Evaluation metrics
│   └── 📁 prompts/                # Prompt templates
├── 📁 scripts/                    # Executable scripts
├── 📁 configs/                    # Hydra configuration files
├── 📁 data/                       # Dataset storage
│   ├── 📁 raw/                    # Original PDDL files
│   └── 📁 processed/              # Instruction-tuning datasets
├── 📁 tests/                      # Test suites
├── 📁 docs/                       # Documentation
└── 📁 docker/                     # Containerization

🔬 Key Innovations

1. Logical Chain-of-Thought Prompting

Traditional approaches rely on end-to-end generation. Our LCoT format instead spells out every step of the plan explicitly:

Step i:
  Preconditions: [check each required condition]
  Apply Effects: add {...}, delete {...}
  Next State: [canonical fact list]
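
To make the template concrete, here is a small sketch that produces one such step for a ground STRIPS-style action. The `Action` container and the `lcot_step` helper are hypothetical names introduced for illustration, and the exact wording of each rendered line is an assumption rather than the project's prompt format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (hypothetical container for this sketch)."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]


def lcot_step(index: int, state: FrozenSet[str], action: Action) -> tuple[str, FrozenSet[str]]:
    """Check preconditions, apply effects, and render one LCoT-style step."""
    missing = action.preconditions - state
    if missing:
        raise ValueError(f"step {index}: unsatisfied preconditions {sorted(missing)}")
    # STRIPS semantics: remove the delete effects, then add the add effects.
    next_state = (state - action.delete_effects) | action.add_effects
    text = "\n".join([
        f"Step {index}: {action.name}",
        f"  Preconditions: {sorted(action.preconditions)} all hold",
        f"  Apply Effects: add {sorted(action.add_effects)}, "
        f"delete {sorted(action.delete_effects)}",
        f"  Next State: {sorted(next_state)}",  # sorted gives a canonical order
    ])
    return text, next_state


pick_up_a = Action(
    name="(pick-up a)",
    preconditions=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
    add_effects=frozenset({"(holding a)"}),
    delete_effects=frozenset({"(clear a)", "(ontable a)", "(handempty)"}),
)
state = frozenset({"(clear a)", "(ontable a)", "(handempty)", "(clear b)", "(ontable b)"})
text, state = lcot_step(1, state, pick_up_a)
print(text)
```

Applying `pick_up_a` a second time raises `ValueError`, because `(handempty)` no longer holds; this is the kind of inconsistency the step-by-step format is meant to expose.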

2. Detailed Validator Feedback

Instead of binary valid/invalid, we extract:

  • Unsatisfied preconditions with specific predicates
  • Incorrect add/delete effects with step indices
  • Frame violations and invariant breaches
  • Goal achievement status with missing facts
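
One way to carry this information is a small structured record that is rendered into a natural-language explanation. The sketch below is illustrative only: `ErrorKind`, `PlanError`, and `explain` are hypothetical names, and the messages are not VAL's actual output format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ErrorKind(Enum):
    UNSATISFIED_PRECONDITION = "unsatisfied_precondition"
    INCORRECT_EFFECT = "incorrect_effect"
    FRAME_VIOLATION = "frame_violation"
    GOAL_NOT_REACHED = "goal_not_reached"


@dataclass
class PlanError:
    kind: ErrorKind
    facts: List[str]            # predicates involved in the error
    step: Optional[int] = None  # 1-based step index; None for goal errors


def explain(error: PlanError) -> str:
    """Render one structured error as a sentence for a refinement prompt."""
    facts = ", ".join(error.facts)
    if error.kind is ErrorKind.UNSATISFIED_PRECONDITION:
        return f"Step {error.step} cannot be applied because {facts} does not hold."
    if error.kind is ErrorKind.INCORRECT_EFFECT:
        return f"Step {error.step} lists the wrong add/delete effects for {facts}."
    if error.kind is ErrorKind.FRAME_VIOLATION:
        return f"Step {error.step} changes {facts}, which the action does not affect."
    return f"The plan ends without achieving {facts}."


errors = [
    PlanError(ErrorKind.UNSATISFIED_PRECONDITION, ["(holding a)"], step=1),
    PlanError(ErrorKind.GOAL_NOT_REACHED, ["(on a b)"]),
]
for err in errors:
    print(explain(err))
```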

3. Two-Stage Progressive Training

  • Stage 1: Focus on logical consistency and state transitions
  • Stage 2: Optimize for end-task planning accuracy with refinement

4. Memory-Efficient Training

Designed for a dual RTX 3080 setup (20 GB of VRAM in total):

  • 4-bit quantized base model (~4GB)
  • LoRA adapters (~500MB)
  • Gradient checkpointing
  • Batch accumulation strategies
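
The ~4GB figure follows from simple arithmetic, sketched below. The helper names are hypothetical, the 8-billion parameter count is a round number, and the 4096×4096 matrix is an assumed example size rather than a statement about any specific layer; quantization metadata and activation memory are ignored.

```python
def quantized_weight_gb(n_params: float, bits_per_param: int) -> float:
    """Weight storage in GB (10**9 bytes), ignoring quantization metadata."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a d_out x d_in weight."""
    return r * (d_in + d_out)


print(quantized_weight_gb(8e9, 4))    # 4-bit weights for an 8B-parameter model
print(quantized_weight_gb(8e9, 16))   # the same weights stored in 16-bit
added = lora_params(4096, 4096, r=16)
print(added, added / (4096 * 4096))   # adapter size vs. one 4096x4096 matrix
```

Going from 16-bit to 4-bit weights takes the estimate from 16 GB to 4 GB, and a rank-16 adapter on a 4096×4096 matrix adds 131,072 parameters, under 1% of the matrix it adapts.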

📈 Evaluation Metrics

| Metric | Description | Target |
|---|---|---|
| Plan Validity | Percentage of VAL-valid plans | ≥90% |
| Goal Achievement | Plans reaching stated goals | ≥95% |
| Plan Efficiency | Average steps vs optimal | <120% |
| Error Reduction | Decrease in common error types | >75% |
| Domain Transfer | Performance across domains | Consistent |
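
As a rough illustration of how the first and third metrics might be computed, consider the sketch below. The definitions are assumptions: validity is taken as the share of validator-accepted plans, and efficiency as the mean ratio of plan length to a known optimal length over valid plans only. `EvalRecord` and both functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalRecord:
    valid: bool                            # validator accepted the plan
    plan_length: int                       # number of actions in the plan
    optimal_length: Optional[int] = None   # None when no reference is known


def plan_validity(records: List[EvalRecord]) -> float:
    """Percentage of plans accepted by the validator."""
    if not records:
        raise ValueError("no records to evaluate")
    return 100.0 * sum(r.valid for r in records) / len(records)


def plan_efficiency(records: List[EvalRecord]) -> float:
    """Mean plan length as a percentage of optimal, over valid plans only."""
    ratios = [
        r.plan_length / r.optimal_length
        for r in records
        if r.valid and r.optimal_length
    ]
    if not ratios:
        raise ValueError("no valid plans with a known optimal length")
    return 100.0 * sum(ratios) / len(ratios)


records = [
    EvalRecord(valid=True, plan_length=6, optimal_length=6),
    EvalRecord(valid=True, plan_length=9, optimal_length=6),
    EvalRecord(valid=False, plan_length=4, optimal_length=6),
    EvalRecord(valid=True, plan_length=4, optimal_length=4),
]
print(plan_validity(records))
print(round(plan_efficiency(records), 2))
```

Restricting efficiency to valid plans avoids rewarding short but incorrect plans; other conventions are possible.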

🔧 Configuration System

Model Configuration

# configs/model/llama3_8b.yaml
model_name: meta-llama/Meta-Llama-3-8B-Instruct
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
gradient_checkpointing: true
max_seq_len: 4096
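
Several of these keys share names with arguments of `transformers.BitsAndBytesConfig` (`load_in_4bit`, `bnb_4bit_quant_type`) and `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`). How the repository consumes this file is not shown here, so the sketch below only illustrates one plausible mapping; `split_model_config` and the key tables are hypothetical.

```python
from typing import Any, Dict, Tuple

# Values copied from the YAML above; loading the file itself is out of scope.
MODEL_CFG: Dict[str, Any] = {
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "gradient_checkpointing": True,
    "max_seq_len": 4096,
}

# Keys assumed to map onto quantization and LoRA arguments, respectively.
QUANT_KEYS = ("load_in_4bit", "bnb_4bit_quant_type")
LORA_KEY_MAP = {"lora_r": "r", "lora_alpha": "lora_alpha", "lora_dropout": "lora_dropout"}


def split_model_config(cfg: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat model config into quantization and LoRA keyword arguments."""
    quant = {key: cfg[key] for key in QUANT_KEYS}
    lora = {dst: cfg[src] for src, dst in LORA_KEY_MAP.items()}
    return quant, lora


quant_kwargs, lora_kwargs = split_model_config(MODEL_CFG)
print(quant_kwargs)
print(lora_kwargs)

# With the libraries installed, these could be passed on as:
#   BitsAndBytesConfig(**quant_kwargs)                  # from transformers
#   LoraConfig(task_type="CAUSAL_LM", **lora_kwargs)    # from peft
```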

Training Configuration

# configs/train_stage1.yaml
epochs: 2
learning_rate: 1e-4
loss_weights:
  state_transition: 0.6
  cot_format: 0.2
  kl_regularization: 0.2
feedback_mode: detailed
refinement_budget: 2
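
Assuming the three `loss_weights` are combined as a plain weighted sum (the configuration does not say so explicitly), the Stage 1 objective could look like the sketch below. `combined_loss` is a hypothetical helper, and the loss values are made-up numbers used only to exercise it.

```python
from typing import Dict, Mapping

# Weights copied from configs/train_stage1.yaml above.
STAGE1_WEIGHTS: Dict[str, float] = {
    "state_transition": 0.6,
    "cot_format": 0.2,
    "kl_regularization": 0.2,
}


def combined_loss(losses: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of named loss terms; names must match the weights exactly."""
    if set(losses) != set(weights):
        raise KeyError(f"loss terms {sorted(losses)} do not match weights {sorted(weights)}")
    return sum(weights[name] * losses[name] for name in weights)


# Made-up per-term values for one batch.
example_losses = {"state_transition": 1.5, "cot_format": 0.5, "kl_regularization": 0.1}
total = combined_loss(example_losses, STAGE1_WEIGHTS)
print(total)
```

The same expression works when the values are framework tensors instead of floats, since it only relies on multiplication and addition.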

🧪 Testing Strategy

Test Coverage

  • Unit Tests: Core functionality validation
  • Integration Tests: End-to-end pipeline testing
  • Regression Tests: Performance benchmark tracking
  • Hardware Tests: Multi-GPU training validation

Continuous Integration

  • Automated linting (Black, isort, flake8)
  • Type checking (mypy)
  • Test execution (pytest)
  • Docker build validation
  • Performance benchmarking

📚 Documentation

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

  • Code style and standards
  • Testing requirements
  • Pull request process
  • Issue reporting

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • PlanBench: Dataset and evaluation framework
  • VAL: PDDL validation toolkit
  • HuggingFace: Transformers and model ecosystem
  • Meta: Llama model family

📞 Support

  • Issues: Use GitHub Issues for bug reports and feature requests
  • Discussions: Use GitHub Discussions for questions and ideas
  • Documentation: Check our comprehensive docs for detailed guides

Building the future of AI-powered automated planning
