spirv: add LLVM SPIR-V backend crash fuzzer #1

Closed
aobolensk wants to merge 2 commits into SemiAnalysisAI:master from aobolensk:spirv-fuzzer

@aobolensk aobolensk commented May 27, 2026

Note

Low Risk
Adds a self-contained fuzzing subdirectory and demo tooling; no changes to existing fuzzers or production compiler paths in-repo.

Overview
Adds a new spirv/ tree to FuzzX: a crash-only, in-process libFuzzer harness for LLVM’s SPIR-V backend (spirv64), documented as an AMDGPU-style layout without differential GPU execution.

The harness llvm_spirv_crash_fuzzer parses mutated LLVM bitcode (with a spir_kernel skeleton fallback), pins spirv64-unknown-unknown, runs O0 and O2 optimize+codegen under CrashRecoveryContext, routes LLVM fatals to abort, and writes .bc/.ll findings to FUZZX_FINDINGS_DIR. Supporting pieces mirror AMDGPU: CMake (SPIR-V codegen libs, libFuzzer sanitizers), build_instrumented_llvm.sh (SPIRV;X86, assertions, sancov), build_directed_fuzzer.sh, run_directed_fuzzer.sh, and seed_ir_corpus.sh. The root README gains a spirv/ entry; spirv/.gitignore ignores build/runtime/corpus/findings.

The SPIR-V README notes a smoke-run MRI reserved-regs assertion that may not repro under standalone llc (possible in-process/harness interaction).

Reviewed by Cursor Bugbot for commit 2e17d2e. Bugbot is set up for automated code reviews on this repo.

@aobolensk
Author

@jlebar please, clarify if that aligns with your project goals as well

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8e915e129

Comment on lines +326 to +327
if (!validateIRCorpusModule(*M))
  return 0;

P2: Verify parsed IR before running LLVM passes

When a byte-mutated bitcode file still parses but violates verifier invariants, validateIRCorpusModule() returns true as long as the module is non-empty and the function calling conventions are allowed. The next step runs the O0/O2 optimization and codegen pipelines on that invalid IR, and LLVM passes are allowed to assert or abort on malformed input, so the fuzzer can report false compiler-crash findings that are not SPIR-V backend bugs. Please run the LLVM verifier on the parsed module before optimization and discard verifier failures.
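
The ordering this suggestion asks for can be sketched with stand-in types, so that the example compiles without LLVM. `ModuleStub`, `parseInput`, `verifierRejects`, and `fuzzOne` are hypothetical names for illustration, not code from the PR. In the real harness the check would presumably be `llvm::verifyModule` from `llvm/IR/Verifier.h`, which returns true when the module is broken; the stub below follows the same convention.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for llvm::Module; it only records whether the IR is well-formed.
struct ModuleStub {
  bool WellFormed = true;
};

enum class Outcome { Discarded, Compiled };

// Hypothetical parser: empty input fails to parse. A first byte of 0xFF
// models bitcode that parses but violates verifier invariants.
bool parseInput(const uint8_t *Data, size_t Size, ModuleStub &Out) {
  if (Size == 0)
    return false;
  Out.WellFormed = (Data[0] != 0xFF);
  return true;
}

// Same convention as llvm::verifyModule: true means the module is broken.
bool verifierRejects(const ModuleStub &M) { return !M.WellFormed; }

// Parse, verify, and only then run the pipelines. PipelineRuns counts how
// often the (assumed) O0/O2 optimize+codegen stage would be reached.
Outcome fuzzOne(const uint8_t *Data, size_t Size, int &PipelineRuns) {
  ModuleStub M;
  if (!parseInput(Data, Size, M))
    return Outcome::Discarded;
  if (verifierRejects(M))
    return Outcome::Discarded; // invalid IR never reaches the passes
  ++PipelineRuns;
  return Outcome::Compiled;
}
```

With this gate in place, a crash in the pipeline stage can only come from IR that passed verification, which is what makes it worth reporting as a backend bug.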

return createIRSkeletonModule(Ctx, CPU);
// Force the triple so corpus mutation of target metadata cannot send us to
// a different backend.
Parsed->setTargetTriple(Triple(DefaultTriple));

P2: Reset mutated data layouts before optimization

For parsed corpus inputs, this forces the triple back to SPIR-V but leaves whatever target datalayout bytes survived mutation in the module until emitObject() resets it after the optimization pipeline has already run. If a valid bitcode input carries a stale or corrupted data layout, target-dependent LLVM passes optimize using pointer sizes/address-space rules that do not match the SPIR-V target, which can produce harness-only crashes or misleading findings. Reset the module data layout from the SPIR-V TargetMachine before running the pass pipeline.
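
A minimal sketch of the suggested fix, again with stand-in types so it compiles on its own. `TargetStub`, `ModuleStub`, `pinModuleToTarget`, and `layoutSeenByPasses` are hypothetical names, and the layout strings in the usage are placeholders rather than the real SPIR-V data layout. In LLVM itself the equivalent step would presumably be `M.setDataLayout(TM.createDataLayout())`, done next to the existing `setTargetTriple` call.

```cpp
#include <string>

// Stand-in for llvm::TargetMachine: the source of truth for triple and layout.
struct TargetStub {
  std::string Triple;
  std::string DataLayout;
};

// Stand-in for llvm::Module: both fields may hold mutated corpus bytes.
struct ModuleStub {
  std::string Triple;
  std::string DataLayout;
};

// Overwrite both target-describing fields from the target machine, so no
// pass sees a triple/layout pair that disagrees with the backend.
void pinModuleToTarget(ModuleStub &M, const TargetStub &TM) {
  M.Triple = TM.Triple;
  M.DataLayout = TM.DataLayout;
}

// Hypothetical pipeline entry: pin first, then optimize. Returns the data
// layout that the optimization passes would observe.
std::string layoutSeenByPasses(ModuleStub M, const TargetStub &TM) {
  pinModuleToTarget(M, TM);
  return M.DataLayout;
}
```

Resetting both fields in one helper keeps the two from drifting apart, which is the failure mode described above: the triple is pinned early while the layout is only corrected at object emission.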

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit f8e915e.

Comment thread spirv/fuzzer/llvm_spirv_crash_fuzzer.cpp Outdated
Comment thread spirv/fuzzer/llvm_spirv_crash_fuzzer.cpp
@jlebar
Collaborator

jlebar commented May 27, 2026

Hi, thanks for the PR!

> @jlebar please, clarify if that aligns with your project goals as well

I think my inclination is to say that if you want to build your own fuzzer, I'll happily link to it from the main README? Right now e.g. I have some of my personal capital on the line when I tell AMD and nvidia that I've found real bugs in their compiler, and for this reason I hesitate at the idea of letting strangers contribute to the list of bugs (without verifying them as real myself, which I don't have resources to do rn).

Also, the fact that this is a crash-only fuzzer makes it much less interesting to me, personally.

@aobolensk
Author

@jlebar thank you for the feedback!

As for contributing to the list of bugs, that's completely understandable. It is difficult to keep track of them this way, so I didn't add any new bugs here.

As for the fact that it is crash-only, it is quite easily extendable. There is a tool called spirv-val that checks the IR for compliance with the spec, and it can be added. That said, I have already found some bugs with the fuzzer in its current state. As I said before, I didn't commit them here; instead, I am submitting fixes directly to LLVM.

If you don't want that functionality here, that is OK. I think we will use it internally as a fuzzer for the SPIR-V backend and maintain it in the fork.

@jlebar
Collaborator

jlebar commented May 27, 2026

I think yeah, why don't you make a separate project (doesn't even have to be a "fork", you're not sharing anything with this project) and I'll link to it from the main README if you like? If we were sharing resources that'd be one reason to keep things in the same repo, but everything is totally independent.

@aobolensk
Author

> I think yeah, why don't you make a separate project (doesn't even have to be a "fork", you're not sharing anything with this project) and I'll link to it from the main README if you like? If we were sharing resources that'd be one reason to keep things in the same repo, but everything is totally independent.

Thanks for the suggestion! It makes sense. I created one: https://github.com/aobolensk/FuzzX-spirv

Feel free to credit it if you want.

@jlebar
Collaborator

jlebar commented May 27, 2026

(You probably shouldn't call it FuzzX because SemiAnalysis could reasonably claim trademark on the name?)

@aobolensk
Author

> (You probably shouldn't call it FuzzX because SemiAnalysis could reasonably claim trademark on the name?)

That'd be strange, but ok, renamed

@jlebar
Collaborator

jlebar commented May 27, 2026

Added a link to your repo in 41c1891. Good luck!

@jlebar jlebar closed this May 27, 2026