
[1/4]: FOO #53

Open
slinder1 wants to merge 705 commits into
main from
users/slinder1/I444f175dad034cb4350dc590dce158fc404b654b

slinder1 (Owner) commented Feb 27, 2026

Change-Id: I444f175dad034cb4350dc590dce158fc404b654b


Stack:

(Note: Closed and merged PRs may not be reflected here and PR numbering is not stable.)

mleleszi and others added 30 commits February 16, 2026 10:16
…lvm#181468)

Reland llvm#174607

llvm#174607 broke libc++ because the LIBC_CONF_WCTYPE_MODE macro wasn't
defined when used from libc++. This reland defaults LIBC_CONF_WCTYPE_MODE to
LIBC_WCTYPE_MODE_ASCII when it is not configured
(llvm@ffd355b).
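A minimal sketch of the defaulting pattern described above, assuming a plain preprocessor fallback. The numeric value given to `LIBC_WCTYPE_MODE_ASCII` and the helper `wctype_mode_is_ascii` are hypothetical; the actual libc configuration headers may be organized differently.

```cpp
#include <cassert>

// Hypothetical value for the mode macro; the real libc config defines its own.
#ifndef LIBC_WCTYPE_MODE_ASCII
#define LIBC_WCTYPE_MODE_ASCII 1
#endif

// If the including build (e.g. libc++) did not configure a wctype mode,
// fall back to ASCII instead of leaving the macro undefined.
#ifndef LIBC_CONF_WCTYPE_MODE
#define LIBC_CONF_WCTYPE_MODE LIBC_WCTYPE_MODE_ASCII
#endif

// Hypothetical helper: the macro is now always defined and comparable.
constexpr bool wctype_mode_is_ascii() {
  return LIBC_CONF_WCTYPE_MODE == LIBC_WCTYPE_MODE_ASCII;
}
```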
The case where fir.select only has a "unit" block target (i.e., it is a
switch with only the default case) was not handled correctly in codegen.
…lvm#178937)

When a target region is placed inside a constant false condition (e.g.,
`if (.false.)`), the dead code gets eliminated on the host side,
removing the `omp.target` operation entirely. However, the device-side
compilation pipeline is unaware of this elimination and attempts to
generate kernel code. Since the host never created offload metadata for
the eliminated target, the device-side kernel function lacks the
"kernel" attribute, causing `OpenMPOpt` to fail with an assertion when
it expects all outlined kernels to have this attribute. The problem can
be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target`
operations in unreachable code blocks and removes them.
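The reachability idea behind such a pass can be sketched on a toy control-flow graph. This is not the MLIR pass itself: the `Block` struct, the op-name strings, and `findUnreachableTargets` are hypothetical, and block 0 is assumed to be the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy CFG node: successor block indices and the names of the ops it holds.
struct Block {
  std::vector<std::size_t> succs;
  std::vector<std::string> ops;
};

// Returns the indices of blocks that cannot be reached from the entry block
// (index 0) and that contain an "omp.target" op; a pass would erase those ops.
std::vector<std::size_t> findUnreachableTargets(const std::vector<Block> &cfg) {
  std::vector<std::size_t> result;
  if (cfg.empty()) return result;

  // Depth-first walk from the entry block to mark everything reachable.
  std::vector<bool> reachable(cfg.size(), false);
  std::vector<std::size_t> worklist{0};
  reachable[0] = true;
  while (!worklist.empty()) {
    std::size_t b = worklist.back();
    worklist.pop_back();
    for (std::size_t s : cfg[b].succs) {
      if (!reachable[s]) {
        reachable[s] = true;
        worklist.push_back(s);
      }
    }
  }

  // Collect unreachable blocks that still hold a target region.
  for (std::size_t i = 0; i < cfg.size(); ++i) {
    if (reachable[i]) continue;
    for (const std::string &op : cfg[i].ops) {
      if (op == "omp.target") {
        result.push_back(i);
        break;
      }
    }
  }
  return result;
}
```

In the `if (.false.)` reproducer, the folded branch jumps straight past the then-block, which is the shape the test models: block 1 holds the target and has no predecessors.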
Directly unroll VectorEndPointerRecipe following 0636225 ([VPlan]
Directly unroll VectorPointerRecipe, llvm#168886). It allows us to leverage
existing VPlan simplifications to optimize.

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
`opt -passes=polly-custom<detect>` or `stopafter=detect` would still
run the ScopInfo analysis, even though it should only run when explicitly
enabled or required by another phase.
…181621)

The test fails on targets that have a different LLVM IR lowering (e.g.
RISC-V which produces `signext i32` for the return type). Rather than
complicate the test with more complex patterns, just set the triple
explicitly to x86-64 (as various other generic clang/test/CodeGen* tests
do).

Test was introduced by llvm#163666.

This fixes RISC-V CI.
Add a ConstantExpr::getPtrAdd() API that creates a getelementptr i8
constant expression, similar to IRBuilder::CreatePtrAdd(). In the future
this will create a ptradd expression.
std::equal(std::byte) currently has suboptimal codegen because enum
types are not recognized as trivially equality comparable. To fix this,
we make them trivially equality comparable. In the process, I factored
out a standalone function, EqualityComparisonIsDefaulted, and
refactored the test cases.

Enum types cannot have an operator== that is a hidden friend.

Fixes llvm#132672
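To illustrate why this matters: `std::byte` is an enumeration with no user-provided `operator==`, so comparing two values compares their underlying bytes, and `std::equal` over `std::byte` ranges gives the same answer as `std::memcmp`. The sketch below only demonstrates that equivalence; the helper names are hypothetical, and it does not show how libc++ dispatches internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Element-wise comparison: built-in == on the enum's underlying values.
bool bytesEqual(const std::byte *a, const std::byte *b, std::size_t n) {
  return std::equal(a, a + n, b);
}

// The same question answered with one memcmp, the kind of lowering that
// trivially equality comparable element types make valid.
bool bytesEqualMemcmp(const std::byte *a, const std::byte *b, std::size_t n) {
  return std::memcmp(a, b, n) == 0;
}
```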
…when floating point types are involved (llvm#181208)

The backend was adding fp-rounding mode flags to
`uchar convert_uchar_rte(uint)`. These builtins are equivalent to `uchar
convert_uchar(uint)` which simply truncates its input, since there is no
floating-point value involved.

Related to llvm#180936

This is consistent with what was implemented in the translator in
KhronosGroup/SPIRV-LLVM-Translator#3120 and
KhronosGroup/SPIRV-LLVM-Translator#3128
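Why the rounding mode is irrelevant here can be shown with a plain C++ analogue of the conversion, assuming the usual wrap-around (modulo 256) behavior of a non-saturating conversion to an unsigned 8-bit type. `convertUcharFromUint` is a hypothetical name, not an OpenCL builtin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical analogue of a non-saturating uint -> uchar conversion:
// unsigned narrowing keeps the low 8 bits (the value modulo 256). No
// floating-point value is involved, so a rounding mode such as _rte has
// nothing to round.
std::uint8_t convertUcharFromUint(std::uint32_t x) {
  return static_cast<std::uint8_t>(x);
}
```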
This commit introduces an "adaptive delay" feature to the
ThreadSanitizer runtime to improve race detection by perturbing thread
schedules. At various synchronization points (atomic operations,
mutexes, and thread lifecycle events), the runtime may inject small
delays (spin loops, yields, or sleeps) to explore different thread
interleavings and expose data races that would otherwise occur only in
rare execution orders.

This change is inspired by prior work, which is discussed in more detail
in
https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969.
In short, https://reviews.llvm.org/D65383 was an earlier unmerged
attempt at adding random delays. Feedback on the RFC led to the
version in this commit, which aims to limit the amount of delay.

The adaptive delay feature uses a configurable time budget and tiered
sampling strategy to balance race exposure against performance impact.
It prioritizes high-value synchronization points with clear
happens-before relationships: relaxed atomics receive lightweight spin
delays with low sampling, synchronizing atomics (acquire / release /
seq_cst) receive moderate delays with higher sampling, and mutex and
thread lifecycle operations receive the longest delays with highest
sampling.
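The tiering and budget idea can be sketched as follows. Everything here is hypothetical: the `Percent`, `Tier`, `TimeBudget`, and `chooseDelayNs` names, the sampling rates, and the delay lengths are illustrative assumptions and do not reflect the PR's actual classes, tuning, or runtime flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer percentage, in the spirit of the integer-based Percent
// class mentioned above (its real interface is not shown here).
struct Percent {
  std::uint32_t value;  // 0..100
  // `sample` is assumed to be a uniformly random 32-bit value.
  bool hits(std::uint32_t sample) const { return sample % 100 < value; }
};

// Tiers from the description: relaxed atomics, synchronizing atomics,
// and mutex / thread lifecycle operations.
enum class Tier { RelaxedAtomic, SyncAtomic, MutexOrThread };

struct TierPolicy {
  Percent sampleRate;
  std::uint64_t delayNs;
};

// Placeholder numbers: higher tiers are sampled more often and delayed longer.
TierPolicy policyFor(Tier tier) {
  switch (tier) {
    case Tier::RelaxedAtomic: return {{5}, 100};
    case Tier::SyncAtomic:    return {{20}, 1000};
    case Tier::MutexOrThread: return {{50}, 10000};
  }
  return {{0}, 0};
}

// Hypothetical time budget: a delay is injected only if it still fits.
class TimeBudget {
 public:
  explicit TimeBudget(std::uint64_t totalNs) : remainingNs_(totalNs) {}
  bool tryConsume(std::uint64_t ns) {
    if (ns > remainingNs_) return false;
    remainingNs_ -= ns;
    return true;
  }
  std::uint64_t remainingNs() const { return remainingNs_; }

 private:
  std::uint64_t remainingNs_;
};

// Returns the delay to inject at a synchronization point (0 = none).
std::uint64_t chooseDelayNs(Tier tier, std::uint32_t sample, TimeBudget &budget) {
  const TierPolicy policy = policyFor(tier);
  if (!policy.sampleRate.hits(sample)) return 0;
  return budget.tryConsume(policy.delayNs) ? policy.delayNs : 0;
}
```

Keeping the percentage integer-based avoids floating-point arithmetic on the hot path; the per-thread quotas and address sampling described above are not modelled.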

The feature is disabled by default and incurs minimal overhead when not
enabled. Nearly all checks are guarded by an inline check on a global
variable that is only set when enable_adaptive_delay=1. Microbenchmarks
with tight loops of atomic operations showed no meaningful performance
difference between an unmodified TSAN runtime and this version when
running with empty TSAN_OPTIONS.

An LLM assisted in writing portions of the adaptive delay logic,
including the TimeBudget class, tiering concept, address sampler, and
per-thread quota system. I reviewed the output and made amendments to
reduce duplication and simplify the behavior. I also replaced the LLM's
original double-based calculation logic with the integer-based Percent
class. The LLM also helped write unit test cases for Percent.

cc @dvyukov 

## Examples

I used the delay scheduler to find novel bugs that rarely or never
occurred with the unmodified TSAN runtime. Some of the bugs below were
found with earlier versions of the delay scheduler that I iterated on,
but with this most recent implementation in this PR, I can still find
the bugs far more reliably than with the standard TSAN runtime.

- A use-after-free in the
[BlazingMQ](https://github.com/bloomberg/blazingmq) broker during
ungraceful producer disconnect.
- Race in stdexec: NVIDIA/stdexec#1395
- Race in stdexec's MPSC queue:
NVIDIA/stdexec#1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled
data structures/algorithms.
- The "Data race on variable a" test from
https://ceur-ws.org/Vol-2344/paper9.pdf is more reliably reproduced with
more aggressive adaptive scheduler options.

## Outstanding work

- The
[RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969)
suggests moving the scheduler to sanitizer_common, so that ASAN can
leverage it. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch.
…lvm#181516)

Currently it errors out due to FPRoundingMode misplacement.
DWARF to YAML optimizations: Add a lot of vector reserves & moves.

See also WebAssembly/binaryen#8257

---------

Co-authored-by: stevenwdv <stevenwdv@users.noreply.github.com>
…egnerate (llvm#181631)

The file had missing checks due to collisions, and a lot of redundancy between x86/x64 and ISA levels.
…ode. NFC. (llvm#181637)

isKnownToBeAPowerOfTwo has gotten to a size now that we should use a general switch like other value tracking helpers.
…n in tests (llvm#181638)

This will fold to AND(NOT(X),1) in an upcoming fold, defeating the
purpose of the repeated constant tests.
As a followup to llvm#181365, this
adds the `getInBoundsPtrAdd()` variant and updates code to use it.
…vm#180981)

In OpenMP a canonical loop nest may be enclosed in a BLOCK construct.
Specifically, the two loops below are considered to form a valid loop
sequence:
```f90
  do i = 1, n
  end do
  block
    do j = 1, m
    end do
  end block
```
Implement an extension to parser::Block::iterator that will treat the
example above as
```f90
  do i = 1, n
  end do
  do j = 1, m
  end do
```
that is, as if the BLOCK/ENDBLOCK statements were deleted. This will make
the analysis of loop nests easier, since any such code will not have to
deal with BLOCK constructs itself.
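An eager sketch of the same flattening, using a hypothetical `Stmt` tree rather than the real `parser::Block` types. The actual change extends an iterator so the splicing happens during traversal; this version simply collects the statements into a vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical statement tree: a "block" node holds nested statements.
// Other statements (e.g. a DO loop) are treated as leaves here.
struct Stmt {
  std::string kind;
  std::vector<Stmt> children;  // used only when kind == "block"
};

// Appends statements to `out`, splicing in the contents of BLOCK constructs
// as if the BLOCK / END BLOCK statements were deleted. Nested BLOCKs are
// flattened recursively.
void flattenBlocks(const std::vector<Stmt> &stmts,
                   std::vector<const Stmt *> &out) {
  for (const Stmt &s : stmts) {
    if (s.kind == "block")
      flattenBlocks(s.children, out);
    else
      out.push_back(&s);
  }
}
```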
…vm#179122)

`DenseElementsAttr` supports only a hard-coded list of element types:
`int`, `index`, `float`, `complex`. This commit generalizes the
`DenseElementsAttr` infrastructure: it now supports arbitrary element
types, as long as they implement the new `DenseElementTypeInterface`.

The `DenseElementTypeInterface` has the following helper functions:
- `getDenseElementBitSize`: Query the size of an element in bits. (When
storing an element in memory, each element is padded to a full byte.
This is an existing limitation of the `DenseElementsAttr`; with an
exception for `i1`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes
into an MLIR attribute. The attribute provides the assembly format /
printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into
bytes.

Note: `convertToAttribute` / `convertFromAttribute` are mainly for
writing test cases. For performance reasons, `DenseElementsAttr` users
should work with raw bytes / elements and avoid any API that
materializes MLIR attributes. However, MLIR attributes typically have
human-readable parsers/printers, making them suitable for lit tests and
debugging.
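The byte-padding rule for stored elements can be sketched as a size computation. `paddedStorageBytes` is a hypothetical helper, and `i1` is deliberately left out because the text above names it as an exception without describing its layout.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Bytes used to store `numElements` elements of `bitSize` bits each when every
// element is padded to a whole number of bytes. i1 is excluded (std::nullopt)
// because it is the documented exception to this padding rule.
std::optional<std::size_t> paddedStorageBytes(std::size_t bitSize,
                                              std::size_t numElements) {
  if (bitSize == 1) return std::nullopt;
  const std::size_t bytesPerElement = (bitSize + 7) / 8;  // round up to bytes
  return bytesPerElement * numElements;
}
```

For example, a 12-bit element type occupies 2 bytes per element, so four elements take 8 bytes.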

This PR introduces an additional assembly format for
`DenseElementsAttrs`. There are now two formats. (The existing one is
kept for compatibility reasons.)
- Literal-first (existing): `dense<[1, 2, 3]> : tensor<3xi32>`
- Type-first (new): `dense<tensor<3xi32> : [1 : i32, 2 : i32, 3 : i32]>`

The new syntax is needed to disambiguate between "literal" (e.g., `1`)
and attribute (e.g., `1 : i32`) when parsing the first token. In the
literal-first syntax, we only parse literals. In the type-first syntax,
we only parse attributes.

The existing `int`, `index`, `float`, `complex` types also implement the
`DenseElementTypeInterface`. This allows us to implement
`DenseElementsAttr::get` and `AttributeElementIterator::operator*` in a
generic way.

RFC:
https://discourse.llvm.org/t/rfc-allow-custom-element-types-in-denseelementattr/89656
…lvm#181609)

MemCopyOptimizer was merging a store instruction into a memset of an
`undef` value and emitted a large memset for the memset and store
combined.

This PR prevents MemCopyOptimizer from merging a store instruction into
a memset of an `undef` value, since such a memset can be removed by
subsequent cleanup passes.

Helps rust-lang/rust#152541
llvm#181655)

The instrument-ind-call test checks the correctness of the instrumented
snippet by the set of registers used; the call ID value is
meaningless (platform-dependent) and should be excluded from the test.
Suppress ADL on Blocks runtime calls in std::function.
This is a reland of llvm#167550.
Instead of relying on libcpp for testing, we emulate our own hidden
frames. This was originally causing test failures on Windows.
sarnex and others added 25 commits February 17, 2026 22:18
…ge to spirv-link (llvm#181870)

Without this flag a lot of tests error in the linker.

---------

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This PR adds WavePrefixProduct intrinsic support in HLSL with codegen
for both DirectX and SPIRV backends. Resolves
llvm#99173.

- [x] Implement `WavePrefixProduct` clang builtin
- [x] Link `WavePrefixProduct` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WavePrefixProduct` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WavePrefixProduct` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WavePrefixProduct.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WavePrefixProduct-errors.hlsl`
- [x] Create the `int_dx_WavePrefixProduct` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WavePrefixProduct` to `121`
in `DXIL.td`
- [x] Create the `WavePrefixProduct.ll` and
`WavePrefixProduct_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WavePrefixProduct` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In `SPIRVInstructionSelector.cpp` create the `WavePrefixProduct`
lowering and map it to `int_spv_WavePrefixProduct` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WavePrefixProduct.ll`
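For reference, `WavePrefixProduct` is an exclusive scan: each lane receives the product of the values in the active lanes with lower indices, and the first active lane receives 1. The sketch below models that on the host with a plain vector of lane values, assuming every lane is active; `wavePrefixProduct` is a hypothetical name and says nothing about how either backend lowers the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of an exclusive prefix product across a wave in which all
// lanes are active: lane i gets the product of lanes 0..i-1, lane 0 gets 1.
template <typename T>
std::vector<T> wavePrefixProduct(const std::vector<T> &laneValues) {
  std::vector<T> result(laneValues.size());
  T running = T(1);  // multiplicative identity
  for (std::size_t i = 0; i < laneValues.size(); ++i) {
    result[i] = running;     // exclusive: excludes this lane's own value
    running *= laneValues[i];
  }
  return result;
}
```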
…baremetal" (llvm#181931)

Reverts llvm#175530

This PR breaks libc header generation on Windows stage2 builds for a
yet-to-be-determined reason.
…llvm#181731 (llvm#181914)

This test creates an invalid vector cost, but llvm#181731 allows
transforming lshr 0, 0 -> add 0, 0, which in turn allows costing of
the following TreeEntry, since it will be considered as 4 `ADD`
operations.
```
%lshr.1 = lshr i96 0, 0
%lshr.2 = lshr i96 0, 0
%add.0 = add i96 0, 0
%add.1 = add i96 0, 0
```
This commit adjusts the operands to ensure an invalid cost is still
generated after llvm#181731.

This test was originally added in
4652ec0.
* Add job responsible for bisecting commits on the CI environment
When creating loops to lower some AMX intrinsics, it is often the case
that we have enough information to synthesize profile metadata for the
latch. This patch makes it so that we either set branch weights if
everything is a known constant, or set unknown weights if we do not have
constants.
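One way such weights could be derived, sketched under stated assumptions: a bottom-tested latch in a loop with a known constant trip count N branches back N - 1 times and exits once. `latchWeights` is hypothetical, `std::nullopt` stands in for the "unknown weights" case, and the patch's actual weights may be computed differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: (back-edge weight, exit weight) for the latch of a
// bottom-tested loop. With a known constant trip count N >= 1 the back edge
// is taken N - 1 times and the exit once; otherwise return std::nullopt to
// stand for "set unknown weights".
std::optional<std::pair<std::uint64_t, std::uint64_t>>
latchWeights(std::optional<std::uint64_t> tripCount) {
  if (!tripCount || *tripCount == 0)
    return std::nullopt;
  return std::make_pair(*tripCount - 1, std::uint64_t{1});
}
```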

Reviewers: jdenny-ornl, mtrofin, phoebewang, KanRobert, RKSimon

Pull Request: llvm#181578
… C++20 (llvm#181928)

After 5d5301d, lldb fails to compile under C++20 with:

```
llvm-project/lldb/source/Target/StackFrameList.cpp:975:12: error: no viable conversion from returned value of type 'const char8_t *' to function return type 'std::string' (aka 'basic_string<char>')
  975 |     return show_unicode_marker ? u8" * " : u8"* ";
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This PR uses `reinterpret_cast` to convert the UTF-8 string literals returned by
`GetFrameMarker` to `const char*`.
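The fix pattern in isolation, as a sketch; this standalone `GetFrameMarker(bool)` signature is assumed for illustration and is not copied from lldb.

```cpp
#include <cassert>
#include <string>

// In C++20, u8"..." literals have type const char8_t[N], and std::string has
// no constructor taking const char8_t*, so returning them directly fails to
// compile. reinterpret_cast views the same UTF-8 bytes as const char*. Before
// C++20 the literals are already const char[N] and the cast leaves the pointer
// type unchanged, so this form compiles in either language mode.
std::string GetFrameMarker(bool show_unicode_marker) {
  return show_unicode_marker ? reinterpret_cast<const char *>(u8" * ")
                             : reinterpret_cast<const char *>(u8"* ");
}
```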
…1547)

This is an NFC change to make room for a more generalized "prepare" pass
for inline assembly beyond CallBrInsts. In particular, this prepares for
changing how we generate code for inline assembly with "rm" constraints.
…ted_forall_to_threads` (llvm#170282)

Harden `transform.gpu.map_nested_forall_to_threads` to reject
non-positive block/grid sizes and handle zero-iteration dimensions
gracefully, preventing assertion failures in `computeProduct`.

Fix `getConstantIntValues` to return `std::nullopt` if any element is
non-constant, avoiding invalid zero placeholders.

Fixes: llvm#73562
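The two checks can be sketched with standard containers. `getConstantIntValuesSketch` and `hasPositiveSizes` are hypothetical stand-ins, not the MLIR utilities: each input value is modelled as either a known constant or `std::nullopt` for a dynamic value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// All-or-nothing: a single non-constant makes the whole result std::nullopt,
// instead of producing a 0 placeholder for the unknown entry.
std::optional<std::vector<std::int64_t>> getConstantIntValuesSketch(
    const std::vector<std::optional<std::int64_t>> &values) {
  std::vector<std::int64_t> result;
  result.reserve(values.size());
  for (const std::optional<std::int64_t> &v : values) {
    if (!v) return std::nullopt;
    result.push_back(*v);
  }
  return result;
}

// Block/grid sizes must be strictly positive before their product is taken.
bool hasPositiveSizes(const std::vector<std::int64_t> &sizes) {
  for (std::int64_t s : sizes)
    if (s <= 0) return false;
  return true;
}
```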
…81892)

- Rename `getTiedOperandIdx` to `GetTiedOperandIdx` per the LLVM coding standards.
- Do not compute tied operand index for defs, since tied operands are
printed only on uses.
- Restructure the `if` in the later operand printing loop to not compute
tied operand index/type for subreg-index imm operands.
To facilitate codegen decisions, we need to create an operation that can
abstract the final update of the original and partial sum from a
reduction. This is represented within the combiner recipes. Having an
operator allows future lowering to clearly identify how to handle the
final accumulation. This is currently an NFC.

The format of this operation is:

```
  acc.reduction_combine %srcMemref into %destMemref <reductionOperator> : type
```
…m#179121)

This adds support for FP environment descriptions and RAII options for
FP operations (i.e., `CIRGenFPOptionsRAII`).
This adds a verifier to enforce the requirement that every catch handler
in a cir.try operation must begin with a cir.catch_param operation.
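The rule the verifier enforces can be sketched on a toy model. This is not the CIR verifier: `Handler` is a hypothetical list of op names per catch handler, and treating an empty handler as invalid is an assumption that follows from "must begin with".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a cir.try: each catch handler is a list of op names.
using Handler = std::vector<std::string>;

// Every handler must be non-empty and start with cir.catch_param.
bool verifyCatchHandlers(const std::vector<Handler> &handlers) {
  for (const Handler &h : handlers) {
    if (h.empty() || h.front() != "cir.catch_param")
      return false;
  }
  return true;
}
```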
After 8d971c0, there is a linked list
container object called MatcherList. We no longer hold a pointer directly
to the first Matcher in the list.

Rename the variables to make this clearer.
Check that the error location points to the destination operand.

I'm planning to rewrite the code that generates that error, and I want
to make sure I get the location right.
This reverts llvm#181334 and its follow-up PRs (including llvm#181488, llvm#181492,
llvm#181493, llvm#181494 and llvm#181498) as well as Ismail's documentation changes
(llvm#181594, llvm#181717). The original commit causes a test failure in CI
(llvm#181938) but the more I look
at the patch, the more I'm convinced it was not ready to land. It will
be easier to iterate on the feedback by re-landing this than by using
post-commit review.
Change-Id: I444f175dad034cb4350dc590dce158fc404b654b
slinder1 changed the title FOO to [1/4]: FOO on Feb 27, 2026
This was referenced Feb 27, 2026
slinder1 marked this pull request as ready for review on February 27, 2026 15:25
slinder1 mentioned this pull request on Feb 27, 2026