An AI-powered framework for automatically generating, analyzing, and optimizing GPU kernels from PyTorch code.
Repository (dev_feature_branch_experimental branch): https://github.com/sinarafati-amd/HIPKernelBench/tree/dev_feature_branch_experimental
HIPKernelBench is a comprehensive system designed to automatically translate PyTorch operations into optimized GPU kernels using an agentic AI approach. The system leverages large language models (LLMs) to analyze PyTorch code, generate equivalent GPU implementations, and optimize them for performance using genetic or Bayesian search methods.
The framework follows a two-phase approach:
- Phase 1 - LLM Synthesis: Analyze PyTorch code and generate functionally correct HIP kernels
- Phase 2 - Optimization: Apply Bayesian or genetic optimization techniques to fine-tune kernel parameters
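As a rough illustration of Phase 2, the sketch below runs a small genetic search over kernel launch parameters. The parameter names (block_size, tile, unroll), their value ranges, and fake_runtime_ms are hypothetical stand-ins chosen for this example. In the real pipeline the cost would come from compiling and timing the generated kernel, and optim/genetic.py may be organized quite differently.

```python
import random
from typing import Callable, Dict, List

Config = Dict[str, int]

# Hypothetical tuning space; the parameters searched by optim/genetic.py may differ.
SEARCH_SPACE: Dict[str, List[int]] = {
    "block_size": [64, 128, 256, 512, 1024],
    "tile": [1, 2, 4, 8],
    "unroll": [1, 2, 4],
}


def random_config(rng: random.Random) -> Config:
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a: Config, b: Config, rng: random.Random) -> Config:
    # Uniform crossover: each parameter is inherited from one parent at random.
    return {name: a[name] if rng.random() < 0.5 else b[name] for name in SEARCH_SPACE}


def mutate(cfg: Config, rng: random.Random, rate: float = 0.2) -> Config:
    # Each parameter is resampled from the search space with probability `rate`.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in cfg.items()
    }


def genetic_search(cost: Callable[[Config], float], pop_size: int = 12,
                   generations: int = 20, elite: int = 2, seed: int = 0) -> Config:
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)              # lower cost (e.g. runtime in ms) is better
        parents = population[: pop_size // 2]  # truncation selection
        children = [dict(c) for c in population[:elite]]  # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return min(population, key=cost)


def fake_runtime_ms(cfg: Config) -> float:
    # Stand-in for compiling and timing a kernel; pretends 256/4/2 is optimal.
    return abs(cfg["block_size"] - 256) / 64 + abs(cfg["tile"] - 4) + abs(cfg["unroll"] - 2)


best = genetic_search(fake_runtime_ms)
print(best, fake_runtime_ms(best))
```

Elitism guarantees the best configuration found so far is never lost, which matters when every cost evaluation is an expensive compile-and-benchmark run.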
.
├── kernel-agentic/ # Main codebase
│ ├── config.yml # Configuration settings for the system
│ ├── requirements.txt # Python dependencies
│ ├── Makefile # Build and run automation
│ ├── src/ # Source code
│ │ ├── agents/ # AI agent components
│ │ │ ├── torch_analyser.py # Analyzes PyTorch code
│ │ │ ├── kernel_generator.py # Generates HIP kernels
│ │ │ ├── orchestrator.py # Main workflow coordinator
│ │ │ ├── rag_researcher.py # Retrieval-augmented generation agent
│ │ │ ├── executor.py # Executes and tests kernels
│ │ │ ├── search_agent.py # Handles optimization search
│ │ │ └── feedback_analyzer.py # Analyzes execution feedback
│ │ ├── data/ # Data handling utilities
│ │ ├── eval/ # Evaluation tools
│ │ ├── optim/ # Optimization algorithms
│ │ │ ├── bayes.py # Bayesian optimization
│ │ │ └── genetic.py # Genetic algorithm optimization
│ │ ├── training/ # Training infrastructure
│ │ │ ├── lora_finetune.py # LoRA fine-tuning
│ │ │ └── grpo_finetune.py # GRPO reinforcement learning
│ │ └── utils/ # Utility functions
│ ├── scripts/ # Helper scripts
│ ├── torch_codes/ # PyTorch code examples
│ ├── logs/ # Generation logs and results
│ ├── docs/ # Documentation
│ └── vector_store/ # Vector embeddings for RAG
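The agents/ modules above suggest a generate-and-check loop for Phase 1. The sketch below shows one plausible shape of that loop with the agents passed in as plain callables; the function names, the Attempt/RunResult types, and the retry policy are hypothetical and are not taken from orchestrator.py.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Attempt:
    kernel_src: str
    compiled: bool
    correct: bool
    feedback: str  # compiler errors or mismatch details to feed back to the LLM


@dataclass
class RunResult:
    success: bool = False
    attempts: List[Attempt] = field(default_factory=list)


def orchestrate(
    torch_src: str,
    analyse: Callable[[str], Any],
    retrieve: Callable[[Any], Any],
    generate: Callable[[Any, Any, str], str],
    execute: Callable[[str], Attempt],
    max_attempts: int = 3,
) -> RunResult:
    analysis = analyse(torch_src)  # TorchAnalyser role: what does the PyTorch code compute?
    context = retrieve(analysis)   # RAGResearcher role: relevant docs and examples
    feedback = ""
    result = RunResult()
    for _ in range(max_attempts):
        kernel = generate(analysis, context, feedback)  # KernelGenerator role
        attempt = execute(kernel)                       # Executor role: compile, run, compare
        result.attempts.append(attempt)
        if attempt.compiled and attempt.correct:
            result.success = True
            break
        feedback = attempt.feedback  # feedback step: pass errors to the next generation
    return result


# Stub agents for illustration only.
def fake_generate(analysis: Any, context: Any, feedback: str) -> str:
    return "kernel_fixed" if feedback else "kernel_v1"


def fake_execute(kernel: str) -> Attempt:
    ok = kernel.endswith("fixed")
    return Attempt(kernel, compiled=True, correct=ok,
                   feedback="" if ok else "output mismatch vs PyTorch")


res = orchestrate("y = torch.relu(x)", analyse=lambda s: {"op": "relu"},
                  retrieve=lambda a: ["hip docs"], generate=fake_generate,
                  execute=fake_execute)
print(res.success, len(res.attempts))  # True 2
```

Passing the agents in as callables keeps the loop testable with stubs like these, independent of any LLM or GPU.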
- Orchestrator: Coordinates the overall workflow
- TorchAnalyser: Analyzes PyTorch code to understand its functionality
- RAGResearcher: Retrieves relevant documentation and examples
- KernelGenerator: Generates HIP kernel code
- Executor: Compiles, runs, and profiles kernel performance
- SearchAgent: Coordinates optimization search algorithms
- BayesOpt: Bayesian optimization for kernel parameters
- GeneticOpt: Genetic algorithm for parameter optimization
- LoRA Finetuning: Fine-tuning LLMs with Low-Rank Adaptation
- GRPO Finetuning: Reinforcement-learning fine-tuning with Group Relative Policy Optimization
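GRPO (Group Relative Policy Optimization, introduced with DeepSeekMath) replaces a learned value baseline with a group baseline: several completions are sampled for the same prompt, and each completion's reward is normalized against the group's mean and standard deviation. A minimal sketch of that advantage computation follows; the reward scheme for kernels shown here is a hypothetical example, not the one used by training/grpo_finetune.py.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - mean) / (std + eps) within one prompt's group."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four kernels generated from one PyTorch prompt,
# e.g. 0.0 = failed to compile, 1.0 = correct, 2.0 = correct and faster than PyTorch.
rewards = [0.0, 1.0, 1.0, 2.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [-1.414, 0.0, 0.0, 1.414]
```

Completions that beat their siblings get positive advantages and the rest get negative ones, so no separate critic model has to be trained.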
- Clone the repository:
  git clone <repository-url>
  cd HIPKernelBench/kernel-agentic
- Install uv (the Python package and project manager).
- Install dependencies:
  make dev-setup
Build the vector store for RAG from the documentation. Supported languages are hip, cuda, and triton; the default is hip:
make vector-store
To build it for a different language, pass lang:
make vector-store lang=cuda
Run the system on all PyTorch files in the torch_codes directory:
make run
This will:
- Process each PyTorch file
- Generate equivalent kernels
- Save generation history to logs/
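The per-file processing described above can be sketched as a loop that feeds each PyTorch file to a generator and appends one JSON line per file to a history log. The history.jsonl file name, the record fields, and fake_generate are hypothetical; the actual log format written to logs/ may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def process_directory(torch_dir: Path, log_dir: Path,
                      generate: Callable[[str], Dict]) -> Path:
    """Run `generate` on every .py file in torch_dir and append one JSON line per file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    history_path = log_dir / "history.jsonl"  # hypothetical file name
    with history_path.open("a", encoding="utf-8") as log:
        for torch_file in sorted(torch_dir.glob("*.py")):
            record = dict(generate(torch_file.read_text(encoding="utf-8")))
            record["source_file"] = torch_file.name
            log.write(json.dumps(record) + "\n")
    return history_path


def fake_generate(torch_src: str) -> Dict:
    # Stand-in for the agent pipeline; a real record would hold prompts, kernels, and results.
    return {"kernel": "// generated HIP kernel", "prompt_chars": len(torch_src), "correct": True}


with tempfile.TemporaryDirectory() as tmp:
    torch_dir = Path(tmp) / "torch_codes"
    torch_dir.mkdir()
    (torch_dir / "relu.py").write_text("import torch\ny = torch.relu(x)\n", encoding="utf-8")
    (torch_dir / "add.py").write_text("import torch\ny = a + b\n", encoding="utf-8")
    history = process_directory(torch_dir, Path(tmp) / "logs", fake_generate)
    lines: List[str] = history.read_text(encoding="utf-8").splitlines()

print(len(lines))  # 2
```

One JSON object per line keeps histories easy to append to during a run and easy to concatenate later into a training set.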
Combine the generation histories into a training set:
make combine-history
Train a LoRA adapter on the generation data:
make train-lora
Fine-tune with GRPO (Group Relative Policy Optimization):
make grpo-finetune
Generate baseline timing information for a PyTorch file:
make baseline-time TORCH_FILE=path/to/torch_file.py
Run a compiled HIP binary and check its output for correctness against PyTorch:
make run-check HIP_BIN=path/to/binary TORCH_FILE=path/to/torch_file.py
The system is configured through the config.yml file, which has the following key sections:
- Pipeline: Configuration for the overall workflow
- openai: API settings for LLM access
- gpu_specs: Target GPU specifications (optional, since GPU specifications are also queried from the ROCm API)
- eval: Evaluation parameters
- training: LoRA and GRPO fine-tuning parameters
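A quick sanity check of a loaded config against the sections listed above might look like the sketch below. The lowercase key spellings and the check_config/load_config helpers are assumptions for illustration; the project's own config loading may validate differently. PyYAML is only needed for load_config, so it is imported lazily.

```python
from typing import Any, Dict, List, Tuple

# Assumed section names, taken from the list above; the exact key spelling
# (e.g. "pipeline" vs "Pipeline") in the real config.yml may differ.
REQUIRED_SECTIONS = ("pipeline", "openai", "eval", "training")


def check_config(cfg: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (missing required sections, informational notes)."""
    missing = [name for name in REQUIRED_SECTIONS if name not in cfg]
    notes = []
    if "gpu_specs" not in cfg:
        # gpu_specs is optional: the GPU details can come from the ROCm API instead.
        notes.append("gpu_specs not set; GPU specifications will be queried from ROCm")
    return missing, notes


def load_config(path: str = "config.yml") -> Dict[str, Any]:
    import yaml  # PyYAML, imported lazily so check_config works without it
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


cfg = {"pipeline": {}, "openai": {}, "eval": {}, "training": {}}
missing, notes = check_config(cfg)
print(missing)  # []
print(notes)
```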