The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Updated Mar 9, 2026 - Python
Custom Linux kernels purpose-built for Apple Mac hardware
Forge: Swarm Agents That Turn Slow PyTorch Into Fast CUDA/Triton Kernels
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Extended TileLang as a unified DSL to enable high-performance kernel development for Near-Memory Computing, Distributed Memory AI Accelerators, and Networked Accelerators.
A collection of high-performance CUDA kernels and experiments for learning and optimizing GPU compute primitives.
The Ultimate Kernel Orchestration Suite for Windows. Optimized for low-latency development and high-priority workloads.
CUDA HPC Kernel Optimization Textbook: the complete optimization path from Naive to Tensor Core, covering GEMM, FlashAttention & Quantization
Optimized Ubuntu Touch for Lenovo Tab M8 HD (TB-8505F) - Kernel improvements, performance tuning, boot experience, and system optimizations for the MediaTek Helio A22 tablet
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
CUDA Kernel Library for LLM Inference: FlashAttention, HGEMM, Tensor Core GEMM with pybind11 Python Bindings