HIPKernelBench

An AI-powered framework for automatically generating, analyzing, and optimizing GPU kernels from PyTorch code.

Project Address: https://github.com/sinarafati-amd/HIPKernelBench/tree/dev_feature_branch_experimental

Project Overview

HIPKernelBench automatically translates PyTorch operations into optimized GPU kernels using an agentic AI approach. It uses large language models (LLMs) to analyze PyTorch code, generate equivalent GPU implementations, and tune them for performance with genetic or Bayesian search methods.

The framework follows a two-phase approach:

Phase 1 - LLM Synthesis: Analyze PyTorch code and generate functionally correct HIP kernels
Phase 2 - Optimization: Apply Bayesian or genetic optimization techniques to fine-tune kernel parameters
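
The two phases can be pictured as a small control loop. The sketch below is illustrative only and assumes hypothetical names (`KernelCandidate`, `synthesize`, `optimize`) that are not taken from the repository; the generator, correctness check, and timing function are passed in as callables, and Phase 2 uses a plain exhaustive search as a stand-in for the Bayesian or genetic search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class KernelCandidate:
    # Hypothetical container: generated HIP source plus tunable parameters.
    source: str
    params: Dict[str, int] = field(default_factory=dict)


def synthesize(torch_code: str,
               generate: Callable[[str, str], str],
               is_correct: Callable[[str], bool],
               max_attempts: int = 3) -> Optional[KernelCandidate]:
    """Phase 1: request kernels until one passes the correctness check."""
    feedback = ""
    for attempt in range(max_attempts):
        source = generate(torch_code, feedback)
        if is_correct(source):
            return KernelCandidate(source=source)
        # Feed the failure back into the next generation attempt.
        feedback = f"attempt {attempt + 1} failed the correctness check"
    return None


def optimize(candidate: KernelCandidate,
             search_space: List[Dict[str, int]],
             measure: Callable[[Dict[str, int]], float]) -> KernelCandidate:
    """Phase 2: pick the parameter setting with the lowest measured time.

    Exhaustive search is used here only to keep the sketch short.
    """
    best_params = min(search_space, key=measure)
    return KernelCandidate(source=candidate.source, params=dict(best_params))
```

Phase 2 only runs on a candidate that Phase 1 has already accepted, so the search never spends time tuning an incorrect kernel.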

Repository Structure

.
├── kernel-agentic/              # Main codebase
│   ├── config.yml              # Configuration settings for the system
│   ├── requirements.txt        # Python dependencies
│   ├── Makefile                # Build and run automation
│   ├── src/                    # Source code
│   │   ├── agents/             # AI agent components
│   │   │   ├── torch_analyser.py    # Analyzes PyTorch code
│   │   │   ├── kernel_generator.py  # Generates HIP kernels
│   │   │   ├── orchestrator.py      # Main workflow coordinator
│   │   │   ├── rag_researcher.py    # Retrieval-augmented generation agent
│   │   │   ├── executor.py          # Executes and tests kernels
│   │   │   ├── search_agent.py      # Handles optimization search
│   │   │   └── feedback_analyzer.py # Analyzes execution feedback
│   │   ├── data/               # Data handling utilities
│   │   ├── eval/               # Evaluation tools
│   │   ├── optim/              # Optimization algorithms
│   │   │   ├── bayes.py        # Bayesian optimization
│   │   │   └── genetic.py      # Genetic algorithm optimization
│   │   ├── training/           # Training infrastructure
│   │   │   ├── lora_finetune.py    # LoRA fine-tuning
│   │   │   └── grpo_finetune.py    # GRPO reinforcement learning
│   │   └── utils/              # Utility functions
│   ├── scripts/                # Helper scripts
│   ├── torch_codes/            # PyTorch code examples
│   ├── logs/                   # Generation logs and results
│   ├── docs/                   # Documentation
│   └── vector_store/           # Vector embeddings for RAG

Key Components

Agents

Orchestrator: Coordinates the overall workflow
TorchAnalyser: Analyzes PyTorch code to understand its functionality
RAGResearcher: Retrieves relevant documentation and examples
KernelGenerator: Generates HIP kernel code
Executor: Compiles, runs, and profiles kernels
SearchAgent: Coordinates optimization search algorithms

Optimization

BayesOpt: Bayesian optimization for kernel parameters
GeneticOpt: Genetic algorithm for parameter optimization
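
As a rough illustration of how a genetic search over kernel parameters can work, here is a minimal sketch. It is not the code in `optim/genetic.py`: the search space (`block_size`, `unroll`), the helper names, and the synthetic cost function are all assumptions made for the example. In practice the cost would be a measured kernel runtime.

```python
import random
from typing import Callable, Dict

# Hypothetical tunable parameters for a kernel launch.
SEARCH_SPACE = {"block_size": [64, 128, 256, 512, 1024], "unroll": [1, 2, 4, 8]}


def random_individual(rng: random.Random) -> Dict[str, int]:
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a: Dict[str, int], b: Dict[str, int], rng: random.Random) -> Dict[str, int]:
    # Each parameter is inherited from one of the two parents at random.
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(ind: Dict[str, int], rng: random.Random, rate: float = 0.2) -> Dict[str, int]:
    # With probability `rate`, resample a parameter from its allowed values.
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in ind.items()}


def genetic_search(cost: Callable[[Dict[str, int]], float],
                   generations: int = 20,
                   population_size: int = 12,
                   seed: int = 0) -> Dict[str, int]:
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: population_size // 2]  # keep the cheaper half unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(population_size - len(elite))]
        population = elite + children
    return min(population, key=cost)


def synthetic_cost(params: Dict[str, int]) -> float:
    # Stand-in for a measured runtime; lowest at block_size=256, unroll=4.
    return abs(params["block_size"] - 256) / 256 + abs(params["unroll"] - 4) / 4


if __name__ == "__main__":
    print(genetic_search(synthetic_cost))
```

Because the elite half survives each generation unchanged, the best cost in the population never gets worse from one generation to the next.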

Training

LoRA Finetuning: Fine-tuning LLMs with Low-Rank Adaptation
GRPO Finetuning: Reinforcement learning with Group Relative Policy Optimization
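
GRPO is commonly expanded as Group Relative Policy Optimization. Its central step scores each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows only that normalization step, under the assumption that each generated kernel has already been given a scalar reward (for example 0 for an incorrect kernel and a speedup figure otherwise). The function name and the reward scheme are illustrative and not taken from `grpo_finetune.py`; implementations also differ on details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    better than the group average get a positive value.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four kernels generated from one PyTorch file.
    print(group_relative_advantages([0.0, 0.0, 1.2, 1.8]))
```

When every sample in a group receives the same reward, all advantages are zero and the group contributes no learning signal.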

Installation and Setup

Clone the repository:

git clone <repository-url>
cd HIPKernelBench/kernel-agentic

Install uv.
Install dependencies:
```
make dev-setup
```

Usage Instructions

Building Vector Store

Create the vector store for RAG from documentation. The supported languages are hip, cuda, and triton; the default is hip:

make vector-store

To build the store for a different language, pass lang:

make vector-store lang=cuda

Running the System

Run the system on all PyTorch files in the torch_codes directory:

make run

This will:

Process each PyTorch file
Generate equivalent kernels
Save generation history to logs/
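
The three steps above can be sketched as a simple loop. This is a hedged illustration, not the behaviour of `make run` itself: the function name `run_all`, the one-JSON-file-per-input layout, and the shape of the history record are assumptions, and the kernel generator is passed in as a callable.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def run_all(torch_dir: Path,
            log_dir: Path,
            generate_kernel: Callable[[str], Dict]) -> List[Path]:
    """Process every PyTorch file and save one history file per input."""
    log_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for torch_file in sorted(torch_dir.glob("*.py")):
        # generate_kernel is a placeholder for the agent pipeline.
        history = generate_kernel(torch_file.read_text())
        log_path = log_dir / f"{torch_file.stem}.json"
        log_path.write_text(json.dumps(
            {"source_file": torch_file.name, "history": history}, indent=2))
        written.append(log_path)
    return written
```

Writing one log per input file keeps a failed generation for one PyTorch file from affecting the records of the others.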

Fine-tuning LLMs

Combine generation histories for training:

make combine-history

Train a LoRA adapter on the generation data:

make train-lora

Train with GRPO (Group Relative Policy Optimization):

make grpo-finetune

Performance Analysis

Generate baseline timing information:

make baseline-time TORCH_FILE=path/to/torch_file.py

Run and check correctness against PyTorch:

make run-check HIP_BIN=path/to/binary TORCH_FILE=path/to/torch_file.py
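
For orientation, the sketch below shows the two ideas behind these targets: taking a baseline timing and comparing outputs within a tolerance. It uses only the Python standard library, and the helper names and tolerance values are assumptions rather than what the Makefile targets actually run. For real GPU work the timed function would also need to wait for the device to finish (with PyTorch, by calling `torch.cuda.synchronize()`) before the clock is read.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def outputs_match(reference: Sequence[float],
                  candidate: Sequence[float],
                  rtol: float = 1e-4,
                  atol: float = 1e-6) -> bool:
    """Element-wise comparison of kernel output against the reference output."""
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rtol, abs_tol=atol)
               for r, c in zip(reference, candidate))


def median_time_ms(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> float:
    """Median wall-clock time of fn in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Using the median instead of the mean makes the baseline less sensitive to a single slow run.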

Configuration

The system is configured through the config.yml file with the following key sections:

Pipeline: Configuration for the overall workflow
openai: API settings for LLM access
gpu_specs: Target GPU specifications (optional, because GPU specifications are also extracted through the ROCm API)
eval: Evaluation parameters
training: LoRA and GRPO fine-tuning parameters
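
A quick sanity check of a loaded configuration might look like the sketch below. The section names come from the list above; treating every section except gpu_specs as required is an assumption, as are the helper name and every key and value inside the example dictionary.

```python
from typing import Dict, List

# Section names as listed above; gpu_specs is treated as optional.
REQUIRED_SECTIONS = ("Pipeline", "openai", "eval", "training")
OPTIONAL_SECTIONS = ("gpu_specs",)


def missing_sections(config: Dict) -> List[str]:
    """Return the required top-level sections absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


if __name__ == "__main__":
    # Hypothetical parsed config; the nested keys are placeholders only.
    example = {
        "Pipeline": {"max_attempts": 3},
        "openai": {"model": "placeholder-model"},
        "eval": {"repeats": 10},
    }
    print(missing_sections(example))
```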
