Organizations

@AMDResearch

Muhammad Awad

Senior Member of Technical Staff @ AMD Research

I design and lead system-level software for next-generation GPU and AI systems.
Focused on programming models, runtimes, and tooling for large-scale AI/LLM workloads.

Focus areas: distributed systems × GPU programming × AI workloads × performance engineering

Key Projects

🔁 Iris: Triton-based multi-GPU programming framework for scalable distributed execution, enabling fusion-driven execution and communication–computation overlap for high-performance AI workloads.

🧠 IntelliKit: LLM-native GPU profiling and analysis toolkit that transforms low-level hardware counters, traces, and assembly into actionable insights.

IntelliPerf: Autonomous GPU performance engineering system for end-to-end profiling, diagnosis, and optimization using AI-driven workflows.

⚙️ IRON (MLIR-AIE): Contributor to a low-level development stack for AMD Ryzen™ AI NPUs, including Python APIs and MLIR-based compilation flows.

System Perspective

End-to-end stack for AI-driven GPU performance engineering:
Iris → programming model · IntelliKit → observability · IntelliPerf → optimization

Bridging low-level GPU execution with high-level AI systems.

Research Background

Ph.D., UC Davis (advised by John Owens).
GPU-native concurrent data structures: B-trees, dynamic graphs, multiversioning, memory reclamation, and high-performance hashing.

Links

🌐 https://maawad.github.io · 📄 https://github.com/maawad

Pinned

  1. ROCm/iris (Public)

    AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

    Python · 183 stars · 37 forks

  2. AMDResearch/intelliperf (Public)

    Automated bottleneck detection and solution orchestration

    Python · 19 stars · 4 forks

  3. Xilinx/mlir-aie (Public)

    An MLIR-based toolchain for AMD AI Engine-enabled devices.

    C · 600 stars · 174 forks

  4. AMDResearch/intellikit (Public)

    IntelliKit is a collection of intelligent tools designed to make GPU kernel development, profiling, and validation accessible to LLMs and human developers alike. Built for AMD ROCm, these tools pro…

    Python · 7 stars