GPU Experiments - Metal Compute

Progressive GPU programming exercises using Metal on Apple Silicon.

Quick Start

# See all available commands
make help

# Build and run any problem using its prefix
make go-00          # Mandelbrot
make go-01          # Parallel scan

Structure

Each problem follows the pattern:

Sequential baseline - CPU reference implementation
v1-naive - Direct GPU translation that exposes the bottlenecks
v2-optimized - Addresses memory and compute inefficiencies
v3-advanced - Architecture-specific optimizations
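The following C++ sketch illustrates what a sequential baseline might look like for the Mandelbrot warm-up. It computes one escape-time iteration count per pixel. The function name `mandelbrot_cpu`, the default viewport, and the row-major output layout are assumptions for illustration, not the repository's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: escape-time Mandelbrot, one count per pixel.
// The image maps onto the region [x0, x1] x [y0, y1] of the complex plane,
// stored row-major (index = py * width + px).
std::vector<std::uint32_t> mandelbrot_cpu(int width, int height, std::uint32_t max_iter,
                                          float x0 = -2.0f, float x1 = 1.0f,
                                          float y0 = -1.5f, float y1 = 1.5f) {
    assert(width > 0 && height > 0);
    std::vector<std::uint32_t> iters(static_cast<std::size_t>(width) * height);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            // Pixel center -> point c = cr + ci*i in the complex plane.
            float cr = x0 + (px + 0.5f) * (x1 - x0) / width;
            float ci = y0 + (py + 0.5f) * (y1 - y0) / height;
            float zr = 0.0f, zi = 0.0f;
            std::uint32_t n = 0;
            // Iterate z <- z^2 + c until |z| > 2 or the iteration budget runs out.
            while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
                float t = zr * zr - zi * zi + cr;
                zi = 2.0f * zr * zi + ci;
                zr = t;
                ++n;
            }
            iters[static_cast<std::size_t>(py) * width + px] = n;
        }
    }
    return iters;
}
```

Every pixel is independent, which is what makes this problem embarrassingly parallel: a v1 kernel would typically assign one GPU thread per pixel and compare its output against this array.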

The framework provides (in lib/):

MetalContext: Zero-boilerplate Metal setup, device management
Timer: GPU-aware performance measurement with bandwidth/FLOPS tracking
Visualizer: Debug arrays as heatmaps, correctness verification, access pattern analysis
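Correctness verification usually means comparing GPU output to the CPU reference with a tolerance rather than exact equality. The C++ sketch below shows one way to do this; `verify_close`, `VerifyResult`, and the default tolerances are hypothetical and are not the Visualizer's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a GPU-vs-CPU comparison (hypothetical type).
struct VerifyResult {
    bool ok;
    std::size_t mismatches;
    std::size_t first_bad;  // index of the first mismatch; meaningful only if !ok
    double max_abs_err;     // largest absolute error seen (NaN errors are not recorded here)
};

// Element-wise check |expected - actual| <= atol + rtol * |expected|.
// Exact equality is too strict: GPU code may reorder floating-point
// additions, so results can differ in the last bits.
VerifyResult verify_close(const std::vector<float>& expected,
                          const std::vector<float>& actual,
                          double atol = 1e-5, double rtol = 1e-4) {
    assert(expected.size() == actual.size());
    VerifyResult r{true, 0, 0, 0.0};
    for (std::size_t i = 0; i < expected.size(); ++i) {
        double err = std::fabs(double(expected[i]) - double(actual[i]));
        if (err > r.max_abs_err) r.max_abs_err = err;
        // Written as !(err <= tol) so that a NaN error counts as a mismatch.
        if (!(err <= atol + rtol * std::fabs(double(expected[i])))) {
            if (r.ok) r.first_bad = i;
            r.ok = false;
            ++r.mismatches;
        }
    }
    return r;
}
```

Reporting the first bad index, not just a pass/fail flag, makes it easier to map an error back to the thread or threadgroup that produced it.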

Problem Progression

Mandelbrot - Embarrassingly parallel warm-up, visual debugging
Parallel Scan - Foundation for everything else
Bitonic Sort - Fixed comparison network, visualizable parallelism
Matrix Transpose - Bank conflicts, memory coalescing
Reduction - SIMD-group (warp) divergence, atomic operations
Histogram - Atomic contention, privatization
Sparse Matrix - Irregular workloads, load balancing
Convolution - Constant memory, texture cache
Einstein Summation - Tensor contractions, index arithmetic
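Because parallel scan underpins several later problems, a CPU sketch of the classic work-efficient (Blelloch) exclusive scan may help. Each inner loop corresponds to one parallel step on the GPU, with its iterations independent of each other. This is a standard textbook formulation chosen for illustration, not necessarily the variant used by the repository's v1, v2, or v3 implementations; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential exclusive prefix sum: out[i] = in[0] + ... + in[i-1].
std::vector<int> exclusive_scan_seq(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Blelloch exclusive scan, written as the sequence of tree levels a GPU
// would execute. Each inner loop is one parallel step (one thread per i).
// Assumes a power-of-two length; pad with zeros otherwise.
std::vector<int> exclusive_scan_blelloch(std::vector<int> a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    // Up-sweep (reduce): accumulate partial sums up the implicit tree.
    for (std::size_t stride = 1; stride < n; stride *= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride)
            a[i] += a[i - stride];
    // Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            int left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
    return a;
}
```

Both sweeps do O(n) total additions over O(log n) steps, versus O(n log n) additions for the simpler Hillis-Steele scan, which is why this form is called work-efficient.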

Implementation Notes

The framework calls the Metal API directly from Objective-C++ rather than through metal-cpp, so it has no external dependencies. Key abstractions:

MetalContext: Device setup, shader loading, pipeline creation
Timer: GPU-aware performance measurement with bandwidth tracking
Visualizer: Array heatmaps, correctness verification, access pattern analysis
ScopedBuffer: RAII buffer management with automatic cleanup
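The RAII idea behind ScopedBuffer can be illustrated with a portable C++ sketch: ownership is tied to object lifetime, copies are forbidden, and moves transfer ownership. `ScopedHandle` and its release callback are hypothetical; the actual ScopedBuffer wraps Metal objects and its interface may differ.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical move-only RAII owner: `release` runs exactly once, when the
// owner is destroyed or reset. In the real framework the handle would be a
// Metal buffer; a generic Handle keeps this sketch portable.
template <typename Handle>
class ScopedHandle {
public:
    ScopedHandle(Handle h, std::function<void(Handle)> release)
        : h_(h), release_(std::move(release)) {
        assert(release_);
    }
    ~ScopedHandle() { reset(); }

    // Copying would release the same handle twice, so forbid it.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    // Moving transfers ownership; the source no longer releases anything.
    ScopedHandle(ScopedHandle&& other) noexcept
        : h_(other.h_), release_(std::move(other.release_)) {
        other.release_ = nullptr;
    }
    ScopedHandle& operator=(ScopedHandle&& other) noexcept {
        if (this != &other) {
            reset();
            h_ = other.h_;
            release_ = std::move(other.release_);
            other.release_ = nullptr;
        }
        return *this;
    }

    Handle get() const { return h_; }

    // Release early; safe to call more than once.
    void reset() {
        if (release_) {
            release_(h_);
            release_ = nullptr;
        }
    }

private:
    Handle h_;
    std::function<void(Handle)> release_;
};
```

Clearing the moved-from callback explicitly matters: a moved-from std::function is only guaranteed to be in a valid but unspecified state, so relying on it being empty could cause a double release.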

Key Concepts to Track

Occupancy: Active threads per GPU core vs the hardware maximum (a single threadgroup is capped at 1024 threads on M2; a pipeline's maxTotalThreadsPerThreadgroup can be lower)
Memory Bandwidth: Achieved vs theoretical (400 GB/s on M2 Max)
Bank Conflicts: Threadgroup memory contention patterns
Divergence: SIMD efficiency within simdgroups (32 threads on Apple Silicon)
Coalescing: Sequential vs strided memory access

Profiling

# Quick benchmark
make benchmark

# Detailed profiling
xcrun xctrace record --template 'Metal System Trace' --launch -- ./build/problems/01-parallel-scan/benchmark
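For CPU baselines, a common way to get stable timings is a few untimed warm-up runs followed by the median of several timed runs. The C++ sketch below uses std::chrono::steady_clock; `median_ms` is a hypothetical helper, not part of the framework. On the GPU side, Metal exposes GPUStartTime and GPUEndTime on MTLCommandBuffer, which exclude CPU encoding overhead; whether the framework's Timer uses them is an assumption not confirmed here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median wall-clock time in milliseconds over `runs` timed calls, after
// `warmup` untimed calls that warm caches and any lazily initialized state.
// The median is less sensitive to scheduler noise than the mean.
double median_ms(const std::function<void()>& fn, int runs = 10, int warmup = 2) {
    assert(runs > 0 && warmup >= 0);
    for (int i = 0; i < warmup; ++i) fn();
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

steady_clock is used instead of system_clock because it is monotonic and never jumps when the wall clock is adjusted.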

Philosophy

Every optimization should be visible: visualize access patterns, measure everything, and understand why performance changes. The goal isn't just making things fast, but understanding exactly why they're fast.

Repository: namingbe/metals