[1/4]: FOO by slinder1 · Pull Request #53 · slinder1/llvm-project

slinder1 · 2026-02-27T15:25:14Z

Change-Id: I444f175dad034cb4350dc590dce158fc404b654b

Stack:

_(Note: Closed and merged PRs may not be reflected here and PR numbering is not stable.)_

…lvm#181468) Reland llvm#174607 llvm#174607 broke libc++ because the LIBC_CONF_WCTYPE_MODE macro wasn't defined when called from libc++. Defaulted LIBC_CONF_WCTYPE_MODE to LIBC_WCTYPE_MODE_ASCII when not configured (llvm@ffd355b)
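The defaulting described above can be sketched with an `#ifndef` guard. This assumes the mode constants are plain integer macros. The numeric value given to `LIBC_WCTYPE_MODE_ASCII` here is made up, and `wctype_is_ascii_only` is a hypothetical helper that is not part of LLVM libc.

```cpp
#include <cassert>

// Hypothetical value: stands in for the real constant from libc's config headers.
#define LIBC_WCTYPE_MODE_ASCII 1

// If the build did not configure a mode (for example, when the header is
// included from libc++ rather than from a configured libc build), fall back
// to ASCII instead of leaving the macro undefined.
#ifndef LIBC_CONF_WCTYPE_MODE
#define LIBC_CONF_WCTYPE_MODE LIBC_WCTYPE_MODE_ASCII
#endif

// Hypothetical helper that branches on the configured mode.
constexpr bool wctype_is_ascii_only() {
  return LIBC_CONF_WCTYPE_MODE == LIBC_WCTYPE_MODE_ASCII;
}
```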

…vm#181467) closes: llvm#181466

The case where fir.select only has a "unit" block target (i.e., it is a switch with only the default case) was not handled correctly in codegen.

…lvm#178937) When a target region is placed inside a constant false condition (e.g., `if (.false.)`), the dead code gets eliminated on the host side, removing the `omp.target` operation entirely. However, the device-side compilation pipeline is unaware of this elimination and attempts to generate kernel code. Since the host never created offload metadata for the eliminated target, the device-side kernel function lacks the "kernel" attribute, causing `OpenMPOpt` to fail with an assertion when it expects all outlined kernels to have this attribute.

The problem can be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass`, which identifies `omp.target` operations in unreachable code blocks and removes them.
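The core idea of the new pass (find blocks unreachable from the entry, then drop `omp.target` ops in them) can be modelled in plain C++. This is a sketch over a hypothetical, heavily simplified CFG (`Block`, `reachableBlocks`, and `deleteUnreachableTargets` are illustrative names), not the MLIR pass itself, which works on real regions and operations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified CFG: each block lists its successor indices and the
// names of the operations it contains. Block 0 is the entry block.
struct Block {
  std::vector<std::size_t> succs;
  std::vector<std::string> ops;
};

// Marks every block reachable from the entry with an iterative DFS.
std::vector<bool> reachableBlocks(const std::vector<Block> &cfg) {
  std::vector<bool> seen(cfg.size(), false);
  if (cfg.empty()) return seen;
  std::vector<std::size_t> work{0};
  seen[0] = true;
  while (!work.empty()) {
    std::size_t b = work.back();
    work.pop_back();
    for (std::size_t s : cfg[b].succs)
      if (!seen[s]) {
        seen[s] = true;
        work.push_back(s);
      }
  }
  return seen;
}

// Erases "omp.target" ops from unreachable blocks; returns how many were removed.
std::size_t deleteUnreachableTargets(std::vector<Block> &cfg) {
  std::vector<bool> live = reachableBlocks(cfg);
  std::size_t removed = 0;
  for (std::size_t b = 0; b < cfg.size(); ++b) {
    if (live[b]) continue;
    auto &ops = cfg[b].ops;
    for (auto it = ops.begin(); it != ops.end();) {
      if (*it == "omp.target") {
        it = ops.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
  }
  return removed;
}
```

In the `if (.false.)` example, once the constant branch is folded the block holding the target region has no predecessors, so it is the one that would be cleaned up.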

Closes llvm#176476 Part of llvm#147386

Directly unroll VectorEndPointerRecipe following 0636225 ([VPlan] Directly unroll VectorPointerRecipe, llvm#168886). It allows us to leverage existing VPlan simplifications to optimize. Co-authored-by: Luke Lau <luke@igalia.com> Co-authored-by: Florian Hahn <flo@fhahn.com>

`opt -passes=polly-custom<detect>`, or `stopafter=detect` would still run the ScopInfo analysis even though it should only run when explicitly enabled or required by another phase.

…181621) The test fails on targets that have a different LLVM IR lowering (e.g. RISC-V which produces `signext i32` for the return type). Rather than complicate the test with more complex patterns, just set the triple explicitly to x86-64 (as various other generic clang/test/CodeGen* tests do). Test was introduced by llvm#163666. This fixes RISC-V CI.

Add a ConstantExpr::getPtrAdd() API that creates a getelementptr i8 constant expression, similar to IRBuilder::CreatePtrAdd(). In the future this will create a ptradd expression.

std::equal(std::byte) currently has sub-optimal codegen due to enum types not being recognized as trivially equality comparable. To fix this, we make them trivially equality comparable. In the process, I factored this logic out into a standalone function, EqualityComparisonIsDefaulted, and refactored the test cases. Enum types cannot have a hidden-friend operator==. Fixes llvm#132672
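Why the trait matters: for a trivially equality comparable type, element-wise `operator==` agrees with comparing the raw bytes, so a library may lower `std::equal` over contiguous ranges to `memcmp`. The two hypothetical helpers below just show that both formulations agree for `std::byte`; they do not reproduce libc++'s internal trait or dispatch logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Element-wise comparison, as written by the user.
bool equalElementwise(const std::byte *a, const std::byte *b, std::size_t n) {
  return std::equal(a, a + n, b);
}

// Byte-wise comparison, the form a library can use once std::byte is known
// to be trivially equality comparable. The n == 0 guard avoids passing
// possibly-null pointers to memcmp.
bool equalBytewise(const std::byte *a, const std::byte *b, std::size_t n) {
  return n == 0 || std::memcmp(a, b, n) == 0;
}
```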

…when floating point types are involved (llvm#181208) The backend was adding fp-rounding mode flags to `uchar convert_uchar_rte(uint)`. These builtins are equivalent to `uchar convert_uchar(uint)` which simply truncates its input, since there is no floating-point value involved. Related to llvm#180936 This is consistent with what was implemented in the translator in KhronosGroup/SPIRV-LLVM-Translator#3120 and KhronosGroup/SPIRV-LLVM-Translator#3128

@dvyukov

This commit introduces an "adaptive delay" feature to the ThreadSanitizer runtime to improve race detection by perturbing thread schedules. At various synchronization points (atomic operations, mutexes, and thread lifecycle events), the runtime may inject small delays (spin loops, yields, or sleeps) to explore different thread interleavings and expose data races that would otherwise occur only in rare execution orders.

This change is inspired by prior work, which is discussed in more detail at https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969. In short, https://reviews.llvm.org/D65383 was an earlier unmerged attempt at adding random delays. Feedback on the RFC led to the version in this commit, which aims to limit the amount of delay.

The adaptive delay feature uses a configurable time budget and a tiered sampling strategy to balance race exposure against performance impact. It prioritizes high-value synchronization points with clear happens-before relationships: relaxed atomics receive lightweight spin delays with low sampling, synchronizing atomics (acquire / release / seq_cst) receive moderate delays with higher sampling, and mutex and thread lifecycle operations receive the longest delays with the highest sampling.

The feature is disabled by default and incurs minimal overhead when not enabled. Nearly all checks are guarded by an inline check on a global variable that is only set when enable_adaptive_delay=1. Microbenchmarks with tight loops of atomic operations showed no meaningful performance difference between an unmodified TSAN runtime and this version when running with empty TSAN_OPTIONS.

An LLM assisted in writing portions of the adaptive delay logic, including the TimeBudget class, tiering concept, address sampler, and per-thread quota system. I reviewed the output and made amendments to reduce duplication and simplify the behavior. I also replaced the LLM's original double-based calculation logic with the integer-based Percent class. The LLM also helped write unit test cases for Percent.

cc @dvyukov

## Examples

I used the delay scheduler to find novel bugs that rarely or never occurred with the unmodified TSAN runtime. Some of the bugs below were found with earlier versions of the delay scheduler that I iterated on, but with the implementation in this PR I can still find them far more reliably than with the standard TSAN runtime.

- A use-after-free in the [BlazingMQ](https://github.com/bloomberg/blazingmq) broker during ungraceful producer disconnect.
- Race in stdexec: NVIDIA/stdexec#1395
- Race in stdexec's MPSC queue: NVIDIA/stdexec#1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled data structures/algorithms.
- The "Data race on variable a" test from https://ceur-ws.org/Vol-2344/paper9.pdf is more reliably reproduced with more aggressive adaptive scheduler options.

## Outstanding work

- The [RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969) suggests moving the scheduler to sanitizer_common so that ASAN can leverage it. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch.
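To make the tiered sampling and time budget described above concrete, here is a small sketch. It is not the runtime's code: the `Percent` interface, the tier table values, and the "delay may use at most N% of elapsed time" budget rule are all assumptions chosen for illustration. Only the ordering of the tiers (relaxed < synchronizing atomics < mutex/thread events) follows the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer percentage (0..100), avoiding floating point.
class Percent {
 public:
  static constexpr Percent FromInt(uint32_t p) { return Percent(p > 100 ? 100 : p); }
  // True for roughly value_% of uniformly distributed `random` inputs.
  constexpr bool Sample(uint64_t random) const { return random % 100 < value_; }
  // value_% of x, computed without overflowing x * value_.
  constexpr uint64_t Of(uint64_t x) const {
    return x / 100 * value_ + x % 100 * value_ / 100;
  }

 private:
  constexpr explicit Percent(uint32_t p) : value_(p) {}
  uint32_t value_;
};

enum class SyncKind { RelaxedAtomic, SyncAtomic, MutexOrThread };
enum class DelayKind { None, Spin, Yield, Sleep };

struct Tier {
  Percent sample_rate;
  DelayKind kind;
  uint64_t delay_ns;
};

// Made-up numbers; only the ordering between tiers mirrors the description.
inline Tier TierFor(SyncKind k) {
  switch (k) {
    case SyncKind::RelaxedAtomic: return {Percent::FromInt(1), DelayKind::Spin, 100};
    case SyncKind::SyncAtomic: return {Percent::FromInt(10), DelayKind::Yield, 1000};
    case SyncKind::MutexOrThread: return {Percent::FromInt(50), DelayKind::Sleep, 10000};
  }
  return {Percent::FromInt(0), DelayKind::None, 0};
}

// Caps total injected delay at a fixed share of elapsed run time.
class TimeBudget {
 public:
  explicit TimeBudget(Percent share) : share_(share) {}
  bool TrySpend(uint64_t elapsed_ns, uint64_t request_ns) {
    if (spent_ns_ + request_ns > share_.Of(elapsed_ns)) return false;
    spent_ns_ += request_ns;
    return true;
  }

 private:
  Percent share_;
  uint64_t spent_ns_ = 0;
};

// Decides which delay, if any, to inject at one synchronization point.
inline DelayKind ChooseDelay(SyncKind k, uint64_t random, uint64_t elapsed_ns,
                             TimeBudget &budget) {
  Tier t = TierFor(k);
  if (!t.sample_rate.Sample(random)) return DelayKind::None;
  if (!budget.TrySpend(elapsed_ns, t.delay_ns)) return DelayKind::None;
  return t.kind;
}
```

Doing all of the arithmetic in integers keeps the hot path free of floating point, which is in the spirit of replacing double-based calculations with a `Percent` class.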

…lvm#181516) Currently it errors out due to FPRoundingMode misplacement.

DWARF to YAML optimizations: Add a lot of vector reserves & moves. See also WebAssembly/binaryen#8257 --------- Co-authored-by: stevenwdv <stevenwdv@users.noreply.github.com>

…egnerate (llvm#181631) The file had missing checks due to collisions and a lot of redundancy between x86/x64 and isa levels

These are no-ops.

This is a no-op.

…ode. NFC. (llvm#181637) isKnownToBeAPowerOfTwo has gotten to a size now that we should use a general switch like other value tracking helpers.

…n in tests (llvm#181638) This will fold to AND(NOT(X),1) in an upcoming fold, defeating the purpose of the repeated constant tests
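The fold mentioned above relies on a simple bit identity: adding 1 always flips bit 0 (a carry only affects higher bits), and so does bitwise NOT. Therefore `(X + 1) & 1 == ~X & 1` for every `X`. The check below is an illustrative sketch; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t andAddOne(uint32_t x) { return (x + 1u) & 1u; }  // AND(ADD(X,1),1)
constexpr uint32_t andNot(uint32_t x) { return ~x & 1u; }           // AND(NOT(X),1)

// Bit 0 of both expressions depends only on bit 0 of x, so checking all
// 8-bit values already covers every case.
constexpr bool identityHoldsFor8Bit() {
  for (uint32_t x = 0; x < 256; ++x)
    if (andAddOne(x) != andNot(x)) return false;
  return true;
}
static_assert(identityHoldsFor8Bit(), "AND(ADD(X,1),1) == AND(NOT(X),1)");
```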

As a followup to llvm#181365, this adds the `getInBoundsPtrAdd()` variant and updates code to use it.

…vm#180981) In OpenMP a canonical loop nest may be enclosed in a BLOCK construct. Specifically, the two loops below are considered to form a valid loop sequence:

```f90
do i = 1, n
end do
block
  do j = 1, m
  end do
end block
```

Implement an extension to parser::Block::iterator that will treat the example above as

```f90
do i = 1, n
end do
do j = 1, m
end do
```

that is, as if the BLOCK/ENDBLOCK statement were deleted. This will make the analysis of loop nests easier, since any such code will not have to deal with BLOCK constructs itself.
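The "as if BLOCK/END BLOCK were deleted" view can be sketched as a recursive flattening over a hypothetical statement tree. `Stmt` and `flattenBlocks` are illustrative names and do not mirror flang's parse tree. The PR implements this lazily inside `parser::Block::iterator`, whereas this sketch builds the flattened list eagerly for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical statement node: a "block" holds nested statements; anything
// else (e.g. a "do" loop) is treated as a leaf here.
struct Stmt {
  std::string kind;
  std::vector<Stmt> body;  // only used when kind == "block"
};

// Appends the statements of `seq` to `out`, splicing in the contents of every
// BLOCK construct in place of the construct itself.
void flattenBlocks(const std::vector<Stmt> &seq, std::vector<const Stmt *> &out) {
  for (const Stmt &s : seq) {
    if (s.kind == "block")
      flattenBlocks(s.body, out);
    else
      out.push_back(&s);
  }
}
```

For the example above, the top level `do`, `block { do }` becomes the two-element sequence `do`, `do`.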

…vm#179122) `DenseElementsAttr` supports only a hard-coded list of element types: `int`, `index`, `float`, `complex`. This commit generalizes the `DenseElementsAttr` infrastructure: it now supports arbitrary element types, as long as they implement the new `DenseElementTypeInterface`.

The `DenseElementTypeInterface` has the following helper functions:

- `getDenseElementBitSize`: Query the size of an element in bits. (When storing an element in memory, each element is padded to a full byte. This is an existing limitation of the `DenseElementsAttr`, with an exception for `i1`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes into an MLIR attribute. The attribute provides the assembly format / printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into bytes.

Note: `convertToAttribute` / `convertFromAttribute` are mainly for writing test cases. For performance reasons, `DenseElementsAttr` users should work with raw bytes / elements and avoid any API that materializes MLIR attributes. However, MLIR attributes typically have human-readable parsers/printers, making them suitable for lit tests and debugging.

This PR introduces an additional assembly format for `DenseElementsAttrs`. There are now two formats. (The existing one is kept for compatibility reasons.)

- Literal-first (existing): `dense<[1, 2, 3]> : tensor<3xi32>`
- Type-first (new): `dense<tensor<3xi32> : [1 : i32, 2 : i32, 3 : i32]>`

The new syntax is needed to disambiguate between "literal" (e.g., `1`) and attribute (e.g., `1 : i32`) when parsing the first token. In the literal-first syntax, we only parse literals. In the type-first syntax, we only parse attributes.

The existing `int`, `index`, `float`, `complex` types also implement the `DenseElementTypeInterface`. This allows us to implement `DenseElementsAttr::get` and `AttributeElementIterator::operator*` in a generic way.
RFC: https://discourse.llvm.org/t/rfc-allow-custom-element-types-in-denseelementattr/89656
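A rough sketch of how a bit size, as reported by `getDenseElementBitSize`, translates into storage under the padding rule above. The helper is hypothetical. Treating `i1` as packed eight elements per byte is an assumption about what "an exception for `i1`" means; the description does not spell it out.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store `count` elements of `bitSize` bits each, assuming
// every element is padded to a whole number of bytes, except 1-bit elements,
// which this sketch assumes are packed eight per byte.
constexpr std::size_t denseStorageBytes(std::size_t bitSize, std::size_t count) {
  if (bitSize == 1)
    return (count + 7) / 8;
  return count * ((bitSize + 7) / 8);
}
```

Under these assumptions, three `i32` elements need 12 bytes and three hypothetical 4-bit elements still need 3 bytes, one per element.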

…lvm#181609) MemCopyOptimizer was merging a store instruction into a memset of an `undef` value and emitted a large memset for the memset and store combined. This PR prevents MemCopyOptimizer from merging a store instruction into a memset of an `undef` value since it can be removed by subsequent cleanup passes. Helps rust-lang/rust#152541

llvm#181655) The instrument-ind-call test checks the correctness of the instrumented snippet by the set of registers used; the call id value is meaningless (platform dependent) and should be excluded from the test.

Suppress ADL on Blocks runtime calls in std::function.
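A general illustration of why an unqualified call can be hijacked by argument-dependent lookup, and how qualifying the name suppresses it. The description does not say how the libc++ change spells the suppression; `::`-qualification is one common way. The function `block_copy` and namespace `user` are hypothetical stand-ins for `_Block_copy` and user code.

```cpp
#include <cassert>
#include <string>

// Hypothetical global entry point standing in for a runtime function such as
// _Block_copy; it accepts any object pointer.
inline std::string block_copy(const void *) { return "::block_copy"; }

namespace user {
struct Callable {};
// A same-named function declared next to a user type. ADL makes it visible to
// unqualified calls whose arguments involve user::Callable.
inline std::string block_copy(const Callable *) { return "user::block_copy"; }
}  // namespace user

// Unqualified: ordinary lookup finds ::block_copy, ADL adds user::block_copy,
// and the exact match on const Callable * wins overload resolution.
inline std::string callUnqualified(const user::Callable &c) { return block_copy(&c); }

// Qualified: naming ::block_copy suppresses ADL, so only the runtime function
// is a candidate.
inline std::string callQualified(const user::Callable &c) { return ::block_copy(&c); }
```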

This is a reland of llvm#167550. Instead of relying on libcpp for testing, we emulate our own hidden frames. This was originally causing test failures on Windows.

Replace them with `SPIRVTypeInst`.

…ge to spirv-link (llvm#181870) Without this flag a lot of tests error in the linker. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>

This PR adds WavePrefixProduct intrinsic support in HLSL with codegen for both DirectX and SPIRV backends. Resolves llvm#99173.

- [x] Implement `WavePrefixProduct` clang builtin
- [x] Link `WavePrefixProduct` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WavePrefixProduct` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WavePrefixProduct` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WavePrefixProduct.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WavePrefixProduct-errors.hlsl`
- [x] Create the `int_dx_WavePrefixProduct` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WavePrefixProduct` to `121` in `DXIL.td`
- [x] Create the `WavePrefixProduct.ll` and `WavePrefixProduct_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WavePrefixProduct` intrinsic in `IntrinsicsSPIRV.td`
- [x] In `SPIRVInstructionSelector.cpp` create the `WavePrefixProduct` lowering and map it to `int_spv_WavePrefixProduct` in `SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WavePrefixProduct.ll`
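As a reference for what the intrinsic computes: `WavePrefixProduct` gives each lane the product of the values in active lanes with a lower index, i.e. an exclusive multiplicative scan, so the first lane receives 1. The sketch below models a single wave with every lane active. The helper name is hypothetical, and inactive lanes are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Lane i receives the product of lanes 0 .. i-1 (exclusive scan, identity 1).
std::vector<int> wavePrefixProduct(const std::vector<int> &laneValues) {
  std::vector<int> result(laneValues.size());
  std::exclusive_scan(laneValues.begin(), laneValues.end(), result.begin(), 1,
                      std::multiplies<>{});
  return result;
}
```

For lane values `{2, 3, 4, 5}`, this yields `{1, 2, 6, 24}`.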

Fixes llvm#181626

…181927)

…#181926) Reverts llvm#181367 This is causing crashes in tests on a Windows bot https://lab.llvm.org/buildbot/#/builders/46/builds/30854.

Reverts llvm#181261 Breaking builds on linux, reverting while I investigate. See https://lab.llvm.org/buildbot/#/builders/181/builds/37346

…baremetal" (llvm#181931) Reverts llvm#175530 This PR breaks libc header generation on Windows stage2 builds for a yet-to-be-determined reason.

…llvm#181731 (llvm#181914) This test creates an invalid vector cost, but llvm#181731 allows transforming `lshr 0, 0` into `add 0, 0`, which in turn allows costing of the following TreeEntry, since it will be considered as 4 `ADD` operations.

```
%lshr.1 = lshr i96 0, 0
%lshr.2 = lshr i96 0, 0
%add.0 = add i96 0, 0
%add.1 = add i96 0, 0
```

This commit adjusts the operands to ensure an invalid cost is still generated after llvm#181731. This test was originally added in 4652ec0.

…lvm#181925) Tests for llvm#181731.

* Add job responsible for bisecting commits on the CI environment

When creating loops to lower some AMX intrinsics, it is often the case we have enough information to synthesize profile metadata for the latch. This patch makes it so that we either set branch weights if everything is a known constant, or set unknown weights if we do not have constants. Reviewers: jdenny-ornl, mtrofin, phoebewang, KanRobert, RKSimon Pull Request: llvm#181578

…#181869) I noticed this was lacking while reviewing llvm#175800

… C++20 (llvm#181928) After 5d5301d, lldb fails to compile under C++20 with:

```
llvm-project/lldb/source/Target/StackFrameList.cpp:975:12: error: no viable conversion from returned value of type 'const char8_t *' to function return type 'std::string' (aka 'basic_string<char>')
  975 |   return show_unicode_marker ? u8" * " : u8"* ";
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This PR uses `reinterpret_cast` to convert the `u8` string literals returned by `GetFrameMarker` to `const char*`.
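The fix can be illustrated in isolation. In C++20, `u8"..."` literals have type `const char8_t[N]`, which no longer converts to `std::string`. Before C++20 they were plain `const char[N]`. Casting the decayed pointer to `const char *` compiles in both modes. `frameMarker` is a hypothetical stand-alone function, not lldb's `GetFrameMarker`.

```cpp
#include <cassert>
#include <string>

// Both branches decay to a pointer (const char8_t * in C++20, const char *
// before), and reinterpret_cast turns either into const char *, which
// std::string accepts.
std::string frameMarker(bool showUnicodeMarker) {
  return reinterpret_cast<const char *>(showUnicodeMarker ? u8" * " : u8"* ");
}
```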

…1547) This is an NFC change to make room for a more generalized "prepare" pass for inline assembly beyond CallBrInsts, in particular to change how we generate code for inline assembly with "rm" constraints.

…ted_forall_to_threads` (llvm#170282) Harden `transform.gpu.map_nested_forall_to_threads` to reject non-positive block/grid sizes and handle zero-iteration dimensions gracefully, preventing assertion failures in `computeProduct`. Fix `getConstantIntValues` to return `std::nullopt` if any element is non-constant, avoiding invalid zero placeholders. Fixes: llvm#73562
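The two hardening rules can be sketched in isolation: extraction is all-or-nothing (no zero placeholder for a non-constant element), and sizes must be strictly positive before their product is computed. `MaybeConst`, `getConstantValues`, and `allPositive` are hypothetical simplifications; the real `getConstantIntValues` operates on MLIR values, and its exact signature is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a value that may or may not be a known constant.
using MaybeConst = std::optional<int64_t>;

// Returns every constant, or std::nullopt as soon as one element is not
// constant, instead of substituting a placeholder 0.
std::optional<std::vector<int64_t>> getConstantValues(const std::vector<MaybeConst> &vals) {
  std::vector<int64_t> out;
  out.reserve(vals.size());
  for (const MaybeConst &v : vals) {
    if (!v) return std::nullopt;
    out.push_back(*v);
  }
  return out;
}

// Block/grid sizes must all be strictly positive before taking their product.
bool allPositive(const std::vector<int64_t> &sizes) {
  for (int64_t s : sizes)
    if (s <= 0) return false;
  return true;
}
```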

…ace(2/3/5)` in AMDGPU tests (llvm#181710)

…81892)

- Rename `getTiedOperandIdx` to `GetTiedOperandIdx` per LLVM CS.
- Do not compute tied operand index for defs, since tied operands are printed only on uses.
- Restructure the `if` in the later operand printing loop to not compute tied operand index/type for subreg-index imm operands.

To facilitate codegen decisions, we need to create an operation that can abstract the final update of the original and partial sum from a reduction. This is represented within the combiner recipes. Having an operator allows future lowering to clearly identify how to handle the final accumulation. This is currently an NFC.

The format of this operation is:

```
acc.reduction_combine %srcMemref into %destMemref <reductionOperator> : type
```

…m#179121) This adds support for FP environment descriptions and RAII options for FP operations (i.e., `CIRGenFPOptionsRAII`).

This adds a verifier to enforce the requirement that every catch handler in a cir.try operation must begin with a cir.catch_param operation.

After 8d971c0, there is a linked list container object called MatcherList. We no longer hold a pointer directly to the first Matcher in the list. Rename the variables to make this clearer.

Check that the error location points to the destination operand. I'm planning to rewrite the code that generates that error, and I want to make sure I get the location right.

This reverts llvm#181334 and its follow-up PRs (including llvm#181488, llvm#181492, llvm#181493, llvm#181494 and llvm#181498) as well as Ismail's documentation changes (llvm#181594, llvm#181717). The original commit causes a test failure in CI (llvm#181938) but the more I look at the patch, the more I'm convinced it was not ready to land. It will be easier to iterate on the feedback by re-landing this than by using post-commit review.

Change-Id: I444f175dad034cb4350dc590dce158fc404b654b

mleleszi and others added 30 commits February 16, 2026 10:16

[ARM] Move MVE test into the correct place. NFC

bde3ef4

[libc][math] Refactor canonicalize function family to header-only (ll…

e6fdcf3

…vm#181467) closes: llvm#181466

[flang] fix codegen of fir.select with only default case (llvm#181373)

3f0f834

The case where fir.select only has a "unit" block target (i.e., it is a switch with only the default case) was not handled correctly in codegen.

[libc][math] Refactor sinpif16 to header only. (llvm#178503)

054021d

Closes llvm#176476 Part of llvm#147386

[Polly] Honor 'scops' phase being disabled (llvm#180380)

3272ba7

`opt -passes=polly-custom<detect>`, or `stopafter=detect` would still run the ScopInfo analysis even though it should only run when explicitly enabled or required by another phase.

[IR] Add ConstantExpr::getPtrAdd() (llvm#181365)

b205396

Add a ConstantExpr::getPtrAdd() API that creates a getelementptr i8 constant expression, similar to IRBuilder::CreatePtrAdd(). In the future this will create a ptradd expression.

[NFC][SPIRV] Disable spirv-val in tests for constrained intrinsics (l…

339e200

…lvm#181516) Currently it errors out due to FPRoundingMode misplacement.

dwarf2yaml.cpp optimizations (llvm#179048)

ef86449

DWARF to YAML optimizations: Add a lot of vector reserves & moves. See also WebAssembly/binaryen#8257 --------- Co-authored-by: stevenwdv <stevenwdv@users.noreply.github.com>

[X86] broadcast-elm-cross-splat-vec.ll - cleanup check prefixes and r…

2c194a1

…egnerate (llvm#181631) The file had missing checks due to collisions and a lot of redundancy between x86/x64 and isa levels

[GCOVProfiling] Remove unnecessary zero-index GEPs

37adb1d

These are no-ops.

[OffloadWrapper] Remove unnecessary zero-index GEPs (llvm#181632)

dc34267

These are no-ops.

[OpenMPIRBuilderTest] Remove unnecessary zero-index GEP

e216dc2

This is a no-op.

[DAG] isKnownToBeAPowerOfTwo - use switch() to match against each opc…

20aff20

…ode. NFC. (llvm#181637) isKnownToBeAPowerOfTwo has gotten to a size now that we should use a general switch like other value tracking helpers.

[X86] broadcast-elm-cross-splat-vec.ll - avoid AND(ADD(X,1),1) patter…

a250eda

…n in tests (llvm#181638) This will fold to AND(NOT(X),1) in an upcoming fold, defeating the purpose of the repeated constant tests

[IR] Add ConstantExpr::getInBoundsPtrAdd() (llvm#181639)

aef9959

As a followup to llvm#181365, this adds the `getInBoundsPtrAdd()` variant and updates code to use it.

[AMDGPU] Regenerate test checks (NFC)

3765b09

[bolt][nfc] Exclude Call id verification from instrument-ind-call test (

db19a57

llvm#181655) The instrument-ind-call test checks the correctness of the instrumented snippet by the set of registers used; the call id value is meaningless (platform dependent) and should be excluded from the test.

[libc++] Prevent ADL on _Block_copy/_Block_release (llvm#179614)

2bee460

Suppress ADL on Blocks runtime calls in std::function.

[lldb] add a marker around hidden frames (llvm#181143)

5d5301d

This is a reland of llvm#167550. Instead of relying on libcpp for testing, we emulate our own hidden frames. This was originally causing test failures on Windows.

[NFC][SPIRV] Remove uses of SPIRVType in SPIRVUtils (llvm#181663)

6288248

Replace them with `SPIRVTypeInst`.

sarnex and others added 25 commits February 17, 2026 22:18

[clang][Driver][SPIRV][ClangLinkerWrapper] Pass --allow-partial-linka…

1870f3f

…ge to spirv-link (llvm#181870) Without this flag a lot of tests error in the linker. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>

[libc][math] Refactored bf16fmaf to Header Only (llvm#181919)

61616f2

Fixes llvm#181626

[flang][cuda][NFC] Move set/get default stream to its own file (llvm#…

786b3b4

…181927)

Revert "[clang] Fix some static initialization race-conditions" (llvm…

d4b5742

…#181926) Reverts llvm#181367 This is causing crashes in tests on a Windows bot https://lab.llvm.org/buildbot/#/builders/46/builds/30854.

Revert "[lldb-dap] Validate utf8 protocol messages." (llvm#181930)

977d910

Reverts llvm#181261 Breaking builds on linux, reverting while I investigate. See https://lab.llvm.org/buildbot/#/builders/181/builds/37346

Revert "[libc] Add getc, ungetc, fflush to enable libc++ iostream on …

9ff444c

…baremetal" (llvm#181931) Reverts llvm#175530 This PR breaks libc header generation on Windows stage2 builds for a yet-to-be-determined reason.

[SLP][NFC] Precommit tests for LShr-UDiv power of 2 transformations (l…

62f4f2f

…lvm#181925) Tests for llvm#181731.

[green dragon] add bisection job (llvm#181883)

e64eed2

* Add job responsible for bisecting commits on the CI environment

[lld][WebAssembly] Add comment regarding DataCount section. NFC (llvm…

8240831

…#181869) I noticed this was lacking while reviewing llvm#175800

[RISCV] Add combines to form WSUBAU on RV32 with P. (llvm#181604)

2cb342c

[NFC][CodeGen] Rename CallBrPrepare pass to InlineAsmPrepare (llvm#18…

9a0d65c

…1547) This is an NFC change to make room for a more generalized "prepare" pass for inline assembly beyond CallBrInsts, in particular to change how we generate code for inline assembly with "rm" constraints.

[NFC][AMDGPU] Use zeroinitializer instead of null for `ptr addrsp…

90d1a55

…ace(2/3/5)` in AMDGPU tests (llvm#181710)

[CIR][NFC] Upstream support for FP environments and RAII options (llv…

260f6fe

…m#179121) This adds support for FP environment descriptions and RAII options for FP operations (i.e., `CIRGenFPOptionsRAII`).

[CIR] Add verifier for CIR try op (llvm#181419)

0fc9b74

This adds a verifier to enforce the requirement that every catch handler in a cir.try operation must begin with a cir.catch_param operation.

[TableGen] Rename TheMatcher->TheMatcherList. NFC (llvm#181942)

54f3b39

After 8d971c0, there is a linked list container object called MatcherList. We no longer hold a pointer directly to the first Matcher in the list. Rename the variables to make this clearer.

[RISCV] Check the error location in xsfvcp-invalid.s. NFC (llvm#181929)

3fc48b7

Check that the error location points to the destination operand. I'm planning to rewrite the code that generates that error, and I want to make sure I get the location right.

FOO

c1e5ba5

Change-Id: I444f175dad034cb4350dc590dce158fc404b654b

slinder1 changed the title ~~FOO~~ [1/4]: FOO Feb 27, 2026

This was referenced Feb 27, 2026

[2/4]: BAR #52

Open

[4/4]: QUX #55

Open

slinder1 marked this pull request as ready for review February 27, 2026 15:25

slinder1 mentioned this pull request Feb 27, 2026

[3/4]: BAZ #54

Open


slinder1 wants to merge 705 commits into `main` from `users/slinder1/I444f175dad034cb4350dc590dce158fc404b654b`


20 participants
