kernel-optimization

Here are 11 public repositories matching this topic:

WecoAI / weco-cli

The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.

machine-learning code-generation code-optimization prompt-engineering kernel-optimization

Updated Mar 9, 2026
Python

wolffcatskyy / linux-mac

Custom Linux kernels purpose-built for Apple Mac hardware

performance linux-kernel arch-linux kvm qemu pvg amdgpu macpro kernel-optimization mac-hardware macos-virtualization

Updated Feb 24, 2026
Shell

RightNow-AI / forge-mcp-server

Forge: Swarm Agents That Turn Slow PyTorch Into Fast CUDA/Triton Kernels

cuda pytorch triton swarm-agents kernel-optimization

Updated Jan 30, 2026
TypeScript

RightNow-AI / autokernel

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

gpu cuda pytorch triton kernel-optimization autoresearch

Updated Mar 11, 2026
Python

SUNMMIO / Tilelang

An extended TileLang that serves as a unified DSL for high-performance kernel development on Near-Memory Computing, Distributed Memory AI Accelerators, and Networked Accelerators.

scale-up scale-out near-memory-compute kernel-optimization distributed-memory-accelator

Updated Mar 6, 2026
Python

tianyuxbear / cuda-kernels

A collection of high-performance CUDA kernels and experiments for learning and optimizing GPU compute primitives.

gpu cuda high-performance-computing cuda-kernels kernel-optimization

Updated Jan 21, 2026
Cuda

theyonecodes / Windows-Super-Smooth

The Ultimate Kernel Orchestration Suite for Windows. Optimized for low-latency development and high-priority workloads.

performance windows-10 low-latency dev-tools windows-11 kernel-optimization state-ideation

Updated Mar 10, 2026
Batchfile

LessUp / hpc-ai-optimization-lab

CUDA HPC Kernel Optimization Textbook: the complete optimization path from Naive to Tensor Core, covering GEMM, FlashAttention, and Quantization

deep-learning cpp hpc optimization cuda gpu-computing kernel-optimization

Updated Mar 10, 2026
Cuda

thephfox / ubuntu-touch-lenovo-tab-m8

Optimized Ubuntu Touch for Lenovo Tab M8 HD (TB-8505F) - Kernel improvements, performance tuning, boot experience, and system optimizations for the MediaTek Helio A22 tablet

linux mediatek performance-tuning arm64 ubuntu-touch ubports kernel-optimization lenovo-tab-m8 helio-a22

Updated Feb 8, 2026
Shell

ssmall256 / mps-kernels-skill

Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).

python machine-learning deep-learning metal gpu pytorch mps apple-silicon kernel-optimization metal-shading-language pytorch-mps

Updated Feb 16, 2026
Python

LessUp / llm-speed

CUDA Kernel Library for LLM Inference Acceleration: FlashAttention, HGEMM, and Tensor Core GEMM, with pybind11 Python Bindings

deep-learning cuda attention gpu-computing gemm pybind11 llm kernel-optimization

Updated Mar 10, 2026
Python
