Senior Member of Technical Staff @ AMD Research
I design and lead system-level software for next-generation GPU and AI systems.
I focus on programming models, runtimes, and tooling for large-scale AI/LLM workloads.
Focus areas: distributed systems × GPU programming × AI workloads × performance engineering
🔁 Iris: Triton-based multi-GPU programming framework for scalable distributed execution, enabling kernel fusion and communication–computation overlap for high-performance AI workloads.
🧠 IntelliKit: LLM-native GPU profiling and analysis toolkit that transforms low-level hardware counters, traces, and assembly into actionable insights.
⚡ IntelliPerf: Autonomous GPU performance engineering system for end-to-end profiling, diagnosis, and optimization using AI-driven workflows.
⚙️ IRON (MLIR-AIE): Contributor to a low-level development stack for AMD Ryzen™ AI NPUs, including Python APIs and MLIR-based compilation flows.
End-to-end stack for AI-driven GPU performance engineering:
Iris → programming model · IntelliKit → observability · IntelliPerf → optimization
Bridging low-level GPU execution with high-level AI systems.
Ph.D., UC Davis (advised by John Owens).
GPU-native concurrent data structures: B-trees, dynamic graphs, multiversioning, memory reclamation, and high-performance hashing.
