
[amdgpu] LLVM 20 updates for AMD MI3xx GPUs#8793

Open
tmm77 wants to merge 53 commits into taichi-dev:master from ROCm:amd-integration
Conversation


tmm77 commented Apr 15, 2026

Issue: #

Brief Summary

These code changes update LLVM to version 20 for AMD GPU code generation to enable Taichi on MI300X, MI325X, and MI355X.


Note

High Risk
High risk because it changes LLVM integration across AMDGPU/CUDA/CPU/DX12 backends (pass pipelines, pointer types, intrinsics), which can affect code generation correctness and runtime stability across platforms.

Overview
Updates build and CI tooling to prefer Clang/LLVM 20 (including Linux compiler discovery) and adjusts the build scripts to use system-provided LLVM/CUDA paths rather than always downloading prebuilts.

Modernizes multiple backends for LLVM 16–20 compatibility: switches CPU/CUDA/AMDGPU/DX12 codegen and JIT paths to the New Pass Manager, adapts to removed/renamed LLVM headers/APIs, replaces CUDA nvvm_ldg intrinsics with an address-space load + !invariant.load metadata, and updates various pointer casts toward opaque pointers.

Adds new math ops erf/erfc end-to-end (IR builder, expression ops, LLVM/CUDA codegen, Python API exports), introduces a ROCm multi-stage Dockerfile.rocm plus ReadTheDocs/Sphinx docs for ROCm-Simulation packaging, and tweaks microbenchmarks to support amdgpu and CLI-selected plans.
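As a rough sketch of what CLI-selected benchmark plans could look like (the flag names below are hypothetical and not necessarily those used in the PR):

```python
import argparse

def parse_args(argv):
    """Parse microbenchmark options: backend arch and repeatable plan selection."""
    parser = argparse.ArgumentParser(description="Run Taichi microbenchmarks")
    parser.add_argument("--arch", choices=["cpu", "cuda", "amdgpu"], default="cpu",
                        help="backend to benchmark")
    parser.add_argument("--plan", action="append", default=None,
                        help="benchmark plan(s) to run; may be given multiple times")
    return parser.parse_args(argv)

args = parse_args(["--arch", "amdgpu", "--plan", "fill", "--plan", "saxpy"])
print(args.arch)   # amdgpu
print(args.plan)   # ['fill', 'saxpy']
```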

Reviewed by Cursor Bugbot for commit 440fcc2. Bugbot is set up for automated code reviews on this repo.

tmm77 and others added 30 commits April 29, 2025 17:14
Parameterize microbenchmarks and vulkan sdk update
fix: Patch to avoid the need to fetch source to build Taichi wheel
Taichi Dockerfile
Co-authored-by: Bhavesh Lad <Bhavesh.Lad@amd.com>
Co-authored-by: Tiffany Mintz <tiffany.mintz@amd.com>
…TX handling, and implement new pass manager setup
 from johnnynunez/taichi master branch; some of the changes from these were captured in the previous commit to rocm/taichi
// but to insert passes in the middle, we construct it manually. A simpler way is to
// use `parsePassPipeline`. For now, we build the default pipeline first.
if (config.opt_level > 0) {
MPM = PB.buildPerModuleDefaultPipeline(opt_level);
DX12 intrinsic lowering pass lost on reassignment

High Severity

When config.opt_level > 0, MPM is reassigned via MPM = PB.buildPerModuleDefaultPipeline(opt_level), which completely discards the previously added createTaichiIntrinsicLowerPass. The original code added this pass first, then populated optimization passes on the same manager. Now the intrinsic lowering pass never runs for DX12 when optimizations are enabled.
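The failure mode generalizes: buildPerModuleDefaultPipeline returns a fresh pass manager, so assigning its result over a manager that already holds passes silently drops them. A small Python toy (hypothetical names, not the real LLVM API) makes the ordering concrete:

```python
class PassManager:
    """Toy stand-in for llvm::ModulePassManager: an ordered list of passes."""
    def __init__(self):
        self.passes = []
    def add(self, name):
        self.passes.append(name)

def build_default_pipeline():
    # Like PB.buildPerModuleDefaultPipeline: returns a *new* manager,
    # carrying nothing over from any previously populated one.
    pm = PassManager()
    pm.add("default-opts")
    return pm

# Buggy order: the custom pass is added, then the reassignment discards it.
mpm = PassManager()
mpm.add("taichi-intrinsic-lower")
mpm = build_default_pipeline()
print(mpm.passes)  # ['default-opts']  (the lowering pass was discarded)

# One fix: build the default pipeline first, then add the custom pass to it.
mpm = build_default_pipeline()
mpm.add("taichi-intrinsic-lower")
print(mpm.passes)  # ['default-opts', 'taichi-intrinsic-lower']
```

Note that this variant runs the custom pass last; if the lowering must run before the default optimizations, as the original ordering suggests, the New Pass Manager's extension points (for example PassBuilder::registerPipelineStartEPCallback) can insert it at the front of the built pipeline instead.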


Reviewed by Cursor Bugbot for commit f47d1b8.


machine_gen_gcn->registerPassBuilderCallbacks(module_gen_gcn_pass_manager);

builder.run(*module_clone, MAM);
AMDGPU GCN output empty for LLVM 17+

Medium Severity

In the print_kernel_amdgcn path for LLVM_VERSION_MAJOR >= 17, the code sets up a new pass manager and runs optimization passes on the cloned module, but never calls addPassesToEmitFile to write assembly to llvm_stream_gcn. The gcnstr buffer remains empty, so the written GCN file will contain no content. The legacy path correctly emits assembly via addPassesToEmitFile.
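Put differently, the pass manager only transforms IR in memory; emitting assembly into the output stream is a separate codegen step. A schematic Python toy (not the LLVM API) of why the buffer stays empty:

```python
import io

class Module:
    """Toy in-memory module; stands in for the cloned llvm::Module."""
    def __init__(self):
        self.optimized = False

def run_optimizations(module):
    # Analogous to running the NPM pipeline: mutates IR in memory only.
    module.optimized = True

def emit_assembly(module, stream):
    # Analogous to addPassesToEmitFile: the step the LLVM 17+ path skips.
    stream.write("; amdgcn assembly for module\n")

mod, out = Module(), io.StringIO()
run_optimizations(mod)
print(repr(out.getvalue()))  # ''  (optimizing alone writes nothing)
emit_assembly(mod, out)
```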


if ((u.system, u.machine) not in (("Linux", "arm64"), ("Linux", "aarch64"))) and not (cmake_args.get_effective("TI_WITH_AMDGPU")):
os.environ["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
os.environ["CUDA_HOME"] = "/usr/local/cuda"
os.environ["CPATH"] = "/usr/local/cuda/include"
LLVM_DIR hardcoded to Linux path for all platforms

Medium Severity

The final LLVM_DIR assignment unconditionally sets it to /usr/lib/llvm-20/cmake for all non-ARM-Linux, non-AMDGPU platforms, including macOS and Windows. The original code used str(out) which pointed to the platform-specific downloaded LLVM path. This overwrites the correct out-based paths for Darwin and Windows, breaking LLVM discovery on those platforms. Similarly, CUDA_HOME and CPATH are set to Linux-specific paths.
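A minimal sketch of one possible fix, assuming the build script keeps the downloaded-toolchain path in a variable like out: gate the apt-style Linux paths on the platform and fall back to the platform-specific prebuilt elsewhere. Function and parameter names here are illustrative, not the script's actual ones:

```python
def configure_toolchain_env(system: str, out_dir: str) -> dict:
    """Return env vars for LLVM/CUDA discovery.

    system: result of platform.system() ("Linux", "Darwin", "Windows", ...)
    out_dir: path of the downloaded prebuilt LLVM (the script's `out`)
    """
    env = {}
    if system == "Linux":
        # System-provided toolchain from the llvm-20 / cuda packages.
        env["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
        env["CUDA_HOME"] = "/usr/local/cuda"
        env["CPATH"] = "/usr/local/cuda/include"
    else:
        # macOS / Windows: keep pointing at the downloaded prebuilt LLVM.
        env["LLVM_DIR"] = out_dir
    return env

print(configure_toolchain_env("Darwin", "/tmp/taichi-llvm")["LLVM_DIR"])  # /tmp/taichi-llvm
```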


Comment thread: docs/conf.py
f.read())
if not match:
raise ValueError("VERSION not found!")
version_number = match[1]
Docs conf.py searches for nonexistent CMake function

Medium Severity

The docs/conf.py searches for rocm_setup_version(VERSION ...) in CMakeLists.txt, but the project's CMakeLists.txt does not contain this function call. This causes a ValueError("VERSION not found!") to be raised every time the documentation is built, completely breaking the docs build pipeline.
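One possible hardening, sketched under the assumption that the version should come from a standard project(... VERSION x.y.z) call; Taichi's actual CMakeLists.txt may record its version differently, in which case the pattern needs adjusting:

```python
import re

def read_cmake_version(text: str) -> str:
    """Extract a version string from CMake source, with a descriptive error."""
    match = re.search(
        r"project\([^)]*VERSION\s+([0-9]+(?:\.[0-9]+)*)",
        text,
        re.IGNORECASE,
    )
    if not match:
        raise ValueError(
            "VERSION not found in CMakeLists.txt; "
            "expected a project(... VERSION x.y.z) call"
        )
    return match.group(1)

print(read_cmake_version("project(taichi VERSION 1.8.0 LANGUAGES CXX)"))  # 1.8.0
```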


tmm77 changed the title from "LLVM 20 updates for AMD MI3xx GPUs" to "[amdgpu] LLVM 20 updates for AMD MI3xx GPUs" on Apr 16, 2026
This is to address AMD security concerns

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 5 total unresolved issues (including 4 from previous reviews).



llvm::ModulePassManager builder =
module_pass_manager.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O3);

machine->registerPassBuilderCallbacks(module_pass_manager);

AMDGPU target callbacks registered after pipeline is built

High Severity

machine->registerPassBuilderCallbacks() is called after buildPerModuleDefaultPipeline(), so AMDGPU target-specific passes are never included in the optimization pipeline. Both the CPU (codegen_cpu.cpp:311) and CUDA (jit_cuda.cpp:201) implementations correctly call registerPassBuilderCallbacks before building the pipeline. This same ordering mistake occurs twice — in the GCN printing path and the main optimization path.
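The dependency is easy to model: pass-builder callbacks are consulted only at pipeline-build time, so registering them afterwards has no effect on an already-built pipeline. A Python toy (hypothetical names, not the real LLVM API):

```python
class PassBuilder:
    """Toy stand-in for llvm::PassBuilder; callbacks run only at build time."""
    def __init__(self):
        self._callbacks = []

    def register_callbacks(self, cb):
        # Like TargetMachine::registerPassBuilderCallbacks: records a hook.
        self._callbacks.append(cb)

    def build_pipeline(self):
        pipeline = ["default-opts"]
        for cb in self._callbacks:
            cb(pipeline)          # hooks fire here, and only here
        return pipeline

def amdgpu_callbacks(pipeline):
    pipeline.append("amdgpu-target-passes")

# Buggy order: the pipeline is built before the target registers its hooks.
pb = PassBuilder()
pipeline = pb.build_pipeline()
pb.register_callbacks(amdgpu_callbacks)  # too late: pipeline already fixed
print(pipeline)  # ['default-opts']

# Fixed order (as in codegen_cpu.cpp / jit_cuda.cpp): register, then build.
pb = PassBuilder()
pb.register_callbacks(amdgpu_callbacks)
print(pb.build_pipeline())  # ['default-opts', 'amdgpu-target-passes']
```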




6 participants