[1/4]: FOO#53
Open
slinder1 wants to merge 705 commits into
…lvm#181468) Reland llvm#174607 llvm#174607 broke libc++ because the LIBC_CONF_WCTYPE_MODE macro wasn't defined when called from libc++. Defaulted LIBC_CONF_WCTYPE_MODE to LIBC_WCTYPE_MODE_ASCII when not configured (llvm@ffd355b)
The case where fir.select only has a "unit" block target (i.e., it is a switch with only the default case) was not handled correctly in codegen.
…lvm#178937) When a target region is placed inside a constant false condition (e.g., `if (.false.)`), the dead code gets eliminated on the host side, removing the `omp.target` operation entirely. However, the device-side compilation pipeline is unaware of this elimination and attempts to generate kernel code. Since the host never created offload metadata for the eliminated target, the device-side kernel function lacks the "kernel" attribute, causing `OpenMPOpt` to fail with an assertion when it expects all outlined kernels to have this attribute.

The problem can be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target` operations in unreachable code blocks and removes them.
Directly unroll VectorEndPointerRecipe following 0636225 ([VPlan] Directly unroll VectorPointerRecipe, llvm#168886). It allows us to leverage existing VPlan simplifications to optimize. Co-authored-by: Luke Lau <luke@igalia.com> Co-authored-by: Florian Hahn <flo@fhahn.com>
`opt -passes=polly-custom<detect>`, or `stopafter=detect`, would still run the ScopInfo analysis even though it should run only when explicitly enabled or required by another phase.
…181621) The test fails on targets that have a different LLVM IR lowering (e.g. RISC-V which produces `signext i32` for the return type). Rather than complicate the test with more complex patterns, just set the triple explicitly to x86-64 (as various other generic clang/test/CodeGen* tests do). Test was introduced by llvm#163666. This fixes RISC-V CI.
Add a ConstantExpr::getPtrAdd() API that creates a getelementptr i8 constant expression, similar to IRBuilder::CreatePtrAdd(). In the future this will create a ptradd expression.
std::equal(std::byte) currently has sub-optimal codegen because enum types are not recognized as trivially equality comparable. To fix this, we make them trivially equality comparable. In the process I factored code out into a standalone function, EqualityComparisonIsDefaulted, and refactored the test cases. Enum types cannot have an operator== that is a hidden friend. Fixes llvm#132672
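To illustrate why treating `std::byte` as trivially equality comparable is sound, here is a minimal sketch (not the libc++ implementation). The helper names `equalElementwise` and `equalBytewise` are hypothetical. It shows that an element-wise `std::equal` over `std::byte` and a raw `std::memcmp` over the same buffers give the same answer, which is the property a library needs before it may lower the former to the latter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// std::byte is a scoped enum over unsigned char: it has no padding and ==
// compares the underlying value.
static_assert(std::is_enum_v<std::byte>);
static_assert(sizeof(std::byte) == 1);

// Element-wise comparison, the form user code typically writes.
// (Hypothetical helper name.)
inline bool equalElementwise(const std::byte *a, const std::byte *b,
                             std::size_t n) {
  return std::equal(a, a + n, b);
}

// Raw-memory comparison, the form a library could use once it knows the
// element type is trivially equality comparable. (Hypothetical helper name.)
inline bool equalBytewise(const std::byte *a, const std::byte *b,
                          std::size_t n) {
  // Precondition: non-empty ranges must point at valid storage.
  assert(n == 0 || (a != nullptr && b != nullptr));
  return n == 0 || std::memcmp(a, b, n) == 0;
}
```

Both helpers should agree on any pair of equally sized buffers; only the generated code differs.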
…when floating point types are involved (llvm#181208) The backend was adding fp-rounding mode flags to `uchar convert_uchar_rte(uint)`. These builtins are equivalent to `uchar convert_uchar(uint)`, which simply truncates its input, since there is no floating-point value involved. Related to llvm#180936. This is consistent with what was implemented in the translator in KhronosGroup/SPIRV-LLVM-Translator#3120 and KhronosGroup/SPIRV-LLVM-Translator#3128.
This commit introduces an "adaptive delay" feature to the ThreadSanitizer runtime to improve race detection by perturbing thread schedules. At various synchronization points (atomic operations, mutexes, and thread lifecycle events), the runtime may inject small delays (spin loops, yields, or sleeps) to explore different thread interleavings and expose data races that would otherwise occur only in rare execution orders.

This change is inspired by prior work, which is discussed in more detail on https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969. In short, https://reviews.llvm.org/D65383 was an earlier unmerged attempt at adding random delays. Feedback on the RFC led to the version in this commit, aiming to limit the amount of delay.

The adaptive delay feature uses a configurable time budget and tiered sampling strategy to balance race exposure against performance impact. It prioritizes high-value synchronization points with clear happens-before relationships: relaxed atomics receive lightweight spin delays with low sampling, synchronizing atomics (acquire / release / seq_cst) receive moderate delays with higher sampling, and mutex and thread lifecycle operations receive the longest delays with highest sampling.

The feature is disabled by default and incurs minimal overhead when not enabled. Nearly all checks are guarded by an inline check on a global variable that is only set when enable_adaptive_delay=1. Microbenchmarks with tight loops of atomic operations showed no meaningful performance difference between an unmodified TSAN runtime and this version when running with empty TSAN_OPTIONS.

An LLM assisted in writing portions of the adaptive delay logic, including the TimeBudget class, tiering concept, address sampler, and per-thread quota system. I reviewed the output and made amendments to reduce duplication and simplify the behavior. I also replaced the LLM's original double-based calculation logic with the integer-based Percent class. The LLM also helped write unit test cases for Percent.

cc @dvyukov

## Examples

I used the delay scheduler to find novel bugs that rarely or never occurred with the unmodified TSAN runtime. Some of the bugs below were found with earlier versions of the delay scheduler that I iterated on, but with the most recent implementation in this PR, I can still find the bugs far more reliably than with the standard TSAN runtime.

- A use-after-free in the [BlazingMQ](https://github.com/bloomberg/blazingmq) broker during ungraceful producer disconnect.
- Race in stdexec: NVIDIA/stdexec#1395
- Race in stdexec's MPSC queue: NVIDIA/stdexec#1812
- A few races in [BDE](https://github.com/bloomberg/bde) thread-enabled data structures/algorithms.
- The "Data race on variable a" test from https://ceur-ws.org/Vol-2344/paper9.pdf is more reliably reproduced with more aggressive adaptive scheduler options.

# Outstanding work

- The [RFC](https://discourse.llvm.org/t/rfc-tsan-implementing-a-fuzz-scheduler-for-tsan/80969) suggests moving the scheduler to sanitizer_common, so that ASAN can leverage this. This should be done (should it be done in this PR?).
- Missing interceptors for libdispatch
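The time budget and tiered sampling described above can be sketched roughly as follows. This is an illustration only: the class shapes, the tier names, and the sampling rates (1%, 10%, 50%) are assumptions made for the example and are not taken from the PR's actual `Percent` or `TimeBudget` classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer-only percentage, clamped to [0, 100].
class Percent {
public:
  explicit constexpr Percent(uint32_t value)
      : value_(value > 100 ? 100 : value) {}
  constexpr uint32_t value() const { return value_; }
  // Computes value_% of `amount` without floating point and without
  // overflowing for large amounts.
  constexpr uint64_t of(uint64_t amount) const {
    return amount / 100 * value_ + amount % 100 * value_ / 100;
  }

private:
  uint32_t value_;
};

// Assumed tiers, ordered from cheapest to most valuable delay point.
enum class SyncTier { RelaxedAtomic, SyncAtomic, MutexOrThread };

// Assumed sampling rates per tier; the real values are not in the text.
constexpr Percent samplingRate(SyncTier tier) {
  switch (tier) {
  case SyncTier::RelaxedAtomic:
    return Percent(1);
  case SyncTier::SyncAtomic:
    return Percent(10);
  case SyncTier::MutexOrThread:
    return Percent(50);
  }
  return Percent(0);
}

// Deterministic stand-in for sampling: event number `counter` is sampled
// when its position within each group of 100 falls below the tier's rate.
constexpr bool shouldSample(SyncTier tier, uint64_t counter) {
  return counter % 100 < samplingRate(tier).value();
}

// Allows delays only while the total delay stays within `limit` percent of
// the elapsed run time.
class TimeBudget {
public:
  explicit TimeBudget(Percent limit) : limit_(limit) {}
  // Records the delay and returns true if it fits within the budget.
  bool tryCharge(uint64_t elapsedNs, uint64_t delayNs) {
    if (spentNs_ + delayNs > limit_.of(elapsedNs))
      return false;
    spentNs_ += delayNs;
    return true;
  }
  uint64_t spentNs() const { return spentNs_; }

private:
  Percent limit_;
  uint64_t spentNs_ = 0;
};
```

Keeping the arithmetic in integers avoids floating-point work on a hot path, which matches the motivation the author gives for replacing the double-based logic.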
…lvm#181516) Currently it errors out due to FPRoundingMode misplacement.
DWARF to YAML optimizations: Add a lot of vector reserves & moves. See also WebAssembly/binaryen#8257 --------- Co-authored-by: stevenwdv <stevenwdv@users.noreply.github.com>
…egnerate (llvm#181631) The file had missing checks due to collisions and a lot of redundancy between x86/x64 and isa levels
These are no-ops.
These are no-ops.
This is a no-op.
…ode. NFC. (llvm#181637) isKnownToBeAPowerOfTwo has grown to a size where we should use a general switch, like other value tracking helpers.
…n in tests (llvm#181638) This will fold to AND(NOT(X),1) in an upcoming fold, defeating the purpose of the repeated constant tests
As a followup to llvm#181365, this adds the `getInBoundsPtrAdd()` variant and updates code to use it.
…vm#180981) In OpenMP a canonical loop nest may be enclosed in a BLOCK construct. Specifically, the two loops below are considered to form a valid loop sequence:

```f90
do i = 1, n
end do
block
  do j = 1, m
  end do
end block
```

Implement an extension to parser::Block::iterator that will treat the example above as

```f90
do i = 1, n
end do
do j = 1, m
end do
```

that is, as if the BLOCK/ENDBLOCK statements were deleted. This will make the analysis of loop nests easier, since any such code will not have to deal with BLOCK constructs itself.
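The effect of such an iterator can be sketched with a small recursive traversal. The `Construct` type and `flattenBlocks` function below are hypothetical simplifications for illustration; they do not reflect the actual `parser::Block` data structures in Flang.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed construct.
struct Construct {
  std::string label;           // e.g. "do i"; unused for BLOCK constructs
  bool isBlock = false;        // true when this node is a BLOCK construct
  std::vector<Construct> body; // nested constructs of a BLOCK
};

// Visits constructs in source order, descending into BLOCK bodies so that
// the caller sees the sequence as if BLOCK/ENDBLOCK were not there.
inline void flattenBlocks(const std::vector<Construct> &in,
                          std::vector<std::string> &out) {
  for (const Construct &c : in) {
    if (c.isBlock)
      flattenBlocks(c.body, out);
    else
      out.push_back(c.label);
  }
}
```

With the first example above as input, this traversal yields the two loops back to back, which is the view the loop-nest analysis wants.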
…vm#179122) `DenseElementsAttr` supports only a hard-coded list of element types: `int`, `index`, `float`, `complex`. This commit generalizes the `DenseElementsAttr` infrastructure: it now supports arbitrary element types, as long as they implement the new `DenseElementTypeInterface`.

The `DenseElementTypeInterface` has the following helper functions:

- `getDenseElementBitSize`: Query the size of an element in bits. (When storing an element in memory, each element is padded to a full byte. This is an existing limitation of the `DenseElementsAttr`; with an exception for `i1`.)
- `convertToAttribute`: Attribute factory / deserializer. Converts bytes into an MLIR attribute. The attribute provides the assembly format / printer for a single element.
- `convertFromAttribute`: Serializer. Converts an MLIR attribute into bytes.

Note: `convertToAttribute` / `convertFromAttribute` are mainly for writing test cases. For performance reasons, `DenseElementsAttr` users should work with raw bytes / elements and avoid any API that materializes MLIR attributes. However, MLIR attributes typically have human-readable parsers/printers, making them suitable for lit tests and debugging.

This PR introduces an additional assembly format for `DenseElementsAttrs`. There are now two formats. (The existing one is kept for compatibility reasons.)

- Literal-first (existing): `dense<[1, 2, 3]> : tensor<3xi32>`
- Type-first (new): `dense<tensor<3xi32> : [1 : i32, 2 : i32, 3 : i32]>`

The new syntax is needed to disambiguate between "literal" (e.g., `1`) and attribute (e.g., `1 : i32`) when parsing the first token. In the literal-first syntax, we only parse literals. In the type-first syntax, we only parse attributes.

The existing `int`, `index`, `float`, `complex` types also implement the `DenseElementTypeInterface`. This allows us to implement `DenseElementsAttr::get` and `AttributeElementIterator::operator*` in a generic way.
RFC: https://discourse.llvm.org/t/rfc-allow-custom-element-types-in-denseelementattr/89656
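The byte-padding rule mentioned for `getDenseElementBitSize` amounts to rounding each element up to a whole number of bytes. A minimal arithmetic sketch follows. The function names are hypothetical, and the `i1` exception noted in the text is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one element whose size in bits is `bitSize`, rounded up
// to a full byte. (Hypothetical helper; the i1 special case is not modeled.)
constexpr std::size_t paddedElementBytes(std::size_t bitSize) {
  return (bitSize + 7) / 8;
}

// Total raw storage for `numElements` padded elements.
constexpr std::size_t denseStorageBytes(std::size_t bitSize,
                                        std::size_t numElements) {
  return paddedElementBytes(bitSize) * numElements;
}

// A 12-bit element type would occupy 2 bytes per element under this rule.
static_assert(paddedElementBytes(12) == 2);
static_assert(paddedElementBytes(32) == 4);
```

Under this rule, an element type narrower than a byte multiple wastes the padding bits, which the text describes as an existing limitation.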
…lvm#181609) MemCopyOptimizer was merging a store instruction into a memset of an `undef` value and emitted a large memset for the memset and store combined. This PR prevents MemCopyOptimizer from merging a store instruction into a memset of an `undef` value since it can be removed by subsequent cleanup passes. Helps rust-lang/rust#152541
llvm#181655) The instrument-ind-call test checks the correctness of the instrumented snippet by the set of registers used; the call ID value is meaningless (platform-dependent) and should be excluded from the test.
Suppress ADL on Blocks runtime calls in std::function.
This is a reland of llvm#167550. Instead of relying on libcpp for testing, we emulate our own hidden frames. This was originally causing test failures on Windows.
Replace them with `SPIRVTypeInst`.
…ge to spirv-link (llvm#181870) Without this flag a lot of tests error in the linker. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
This PR adds WavePrefixProduct intrinsic support in HLSL with codegen for both DirectX and SPIRV backends. Resolves llvm#99173.

- [x] Implement `WavePrefixProduct` clang builtin
- [x] Link `WavePrefixProduct` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WavePrefixProduct` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WavePrefixProduct` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WavePrefixProduct.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WavePrefixProduct-errors.hlsl`
- [x] Create the `int_dx_WavePrefixProduct` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WavePrefixProduct` to `121` in `DXIL.td`
- [x] Create the `WavePrefixProduct.ll` and `WavePrefixProduct_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WavePrefixProduct` intrinsic in `IntrinsicsSPIRV.td`
- [x] In `SPIRVInstructionSelector.cpp` create the `WavePrefixProduct` lowering and map it to `int_spv_WavePrefixProduct` in `SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WavePrefixProduct.ll`
…#181926) Reverts llvm#181367. This is causing crashes in tests on a Windows bot https://lab.llvm.org/buildbot/#/builders/46/builds/30854.
Reverts llvm#181261. Breaking builds on Linux; reverting while I investigate. See https://lab.llvm.org/buildbot/#/builders/181/builds/37346
…baremetal" (llvm#181931) Reverts llvm#175530. This PR breaks libc header generation on Windows stage2 builds for a yet-to-be-determined reason.
…llvm#181731 (llvm#181914) This test creates an invalid vector cost, but llvm#181731 allows transforming lshr 0, 0 -> add 0, 0, which in turn allows costing of the following TreeEntry since it will be considered as 4 `ADD` operations.

```
%lshr.1 = lshr i96 0, 0
%lshr.2 = lshr i96 0, 0
%add.0 = add i96 0, 0
%add.1 = add i96 0, 0
```

This commit adjusts the operands to ensure an invalid cost is still generated after llvm#181731. This test was originally added in 4652ec0.
* Add job responsible for bisecting commits on the CI environment
When creating loops to lower some AMX intrinsics, it is often the case we have enough information to synthesize profile metadata for the latch. This patch makes it so that we either set branch weights if everything is a known constant, or set unknown weights if we do not have constants. Reviewers: jdenny-ornl, mtrofin, phoebewang, KanRobert, RKSimon Pull Request: llvm#181578
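One common way to derive latch weights from a constant trip count is sketched below. This is an assumption about the general technique, not a description of what the patch emits: `LatchWeights` and `synthesizeLatchWeights` are hypothetical names, and the sketch only models the decision between known weights and "unknown".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical pair of weights for a loop latch branch.
struct LatchWeights {
  uint64_t backedge; // times the branch returns to the loop header
  uint64_t exit;     // times the branch leaves the loop
};

// If the trip count is a known non-zero constant N, the latch takes the
// backedge N - 1 times and exits once. Otherwise no weights can be
// synthesized and the caller would mark the profile as unknown.
inline std::optional<LatchWeights>
synthesizeLatchWeights(std::optional<uint64_t> tripCount) {
  if (!tripCount || *tripCount == 0)
    return std::nullopt;
  return LatchWeights{*tripCount - 1, 1};
}
```

A loop known to run 16 iterations would get weights 15 to 1 under this scheme, while a loop with a runtime bound would fall into the unknown case.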
…#181869) I noticed this was lacking while reviewing llvm#175800
… C++20 (llvm#181928) After 5d5301d, lldb fails to compile under C++20 with:

```
llvm-project/lldb/source/Target/StackFrameList.cpp:975:12: error: no viable conversion from returned value of type 'const char8_t *' to function return type 'std::string' (aka 'basic_string<char>')
  975 |     return show_unicode_marker ? u8" * " : u8"* ";
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This PR reinterpret-casts the unicode characters returned by `GetFrameMarker` to `const char*`.
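A minimal sketch of the technique, assuming a free function shaped like the one in the error message (the name `getFrameMarkerSketch` is hypothetical and the actual lldb patch is not shown here): in C++20 a `u8""` literal has type `const char8_t[N]`, which `std::string` cannot be constructed from, so the pointer is cast to `const char *` first. The literals are copied as they appear in the error output above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the function named in the text.
inline std::string getFrameMarkerSketch(bool showUnicodeMarker) {
  // Under C++20 this deduces const char8_t *; under C++17 it deduces
  // const char * and the cast below is a no-op.
  const auto *marker = showUnicodeMarker ? u8" * " : u8"* ";
  // Reading the bytes through const char * is allowed because char may
  // alias any object type; the UTF-8 encoding is passed through unchanged.
  return reinterpret_cast<const char *>(marker);
}
```

The cast keeps the UTF-8 bytes intact, so the resulting `std::string` holds the same encoded text the literal did.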
…1547) This is an NFC change to make room for a more generalized "prepare" pass for inline assembly beyond CallBrInsts. In particular, changing how we generate code for inline assembly with "rm" constraints.
…ted_forall_to_threads` (llvm#170282) Harden `transform.gpu.map_nested_forall_to_threads` to reject non-positive block/grid sizes and handle zero-iteration dimensions gracefully, preventing assertion failures in `computeProduct`. Fix `getConstantIntValues` to return `std::nullopt` if any element is non-constant, avoiding invalid zero placeholders. Fixes: llvm#73562
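The "all constant or nothing" behavior described for `getConstantIntValues` can be sketched as follows. This is a simplified illustration, not the MLIR code: `MaybeConstant`, `constantIntValuesOrNone`, and `allPositive` are hypothetical names, and a `std::optional<int64_t>` stands in for a value that may or may not be a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in: holds a value when it is a known constant.
using MaybeConstant = std::optional<int64_t>;

// Returns all values if every element is constant, otherwise std::nullopt.
// No zero placeholder is substituted for a non-constant element.
inline std::optional<std::vector<int64_t>>
constantIntValuesOrNone(const std::vector<MaybeConstant> &values) {
  std::vector<int64_t> result;
  result.reserve(values.size());
  for (const MaybeConstant &v : values) {
    if (!v)
      return std::nullopt;
    result.push_back(*v);
  }
  return result;
}

// Rejects non-positive sizes, mirroring the validation the text describes
// for block/grid sizes.
inline bool allPositive(const std::vector<int64_t> &sizes) {
  for (int64_t s : sizes)
    if (s <= 0)
      return false;
  return true;
}
```

Returning `std::nullopt` forces callers to handle the non-constant case explicitly instead of multiplying a placeholder zero into a product.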
…ace(2/3/5)` in AMDGPU tests (llvm#181710)
…81892)

- Rename `getTiedOperandIdx` to `GetTiedOperandIdx` per LLVM CS.
- Do not compute tied operand index for defs, since tied operands are printed only on uses.
- Restructure the `if` in the later operand printing loop to not compute tied operand index/type for subreg-index imm operands.
To facilitate codegen decisions, we need to create an operation that can abstract the final update of the original and partial sum from a reduction. This is represented within the combiner recipes. Having an operator allows future lowering to clearly identify how to handle the final accumulation. This is currently an NFC.

The format of this operation is:

```
acc.reduction_combine %srcMemref into %destMemref <reductionOperator> : type
```
…m#179121) This adds support for FP environment descriptions and RAII options for FP operations (i.e., `CIRGenFPOptionsRAII`).
This adds a verifier to enforce the requirement that every catch handler in a cir.try operation must begin with a cir.catch_param operation.
After 8d971c0, there is a linked list container object called MatcherList. We no longer hold a pointer directly to the first Matcher in the list. Rename the variables to make this clearer.
Check that the error location points to the destination operand. I'm planning to rewrite the code that generates that error, and I want to make sure I get the location right.
This reverts llvm#181334 and its follow-up PRs (including llvm#181488, llvm#181492, llvm#181493, llvm#181494 and llvm#181498) as well as Ismail's documentation changes (llvm#181594, llvm#181717). The original commit causes a test failure in CI (llvm#181938), but the more I look at the patch, the more I'm convinced it was not ready to land. It will be easier to iterate on the feedback by re-landing this than by using post-commit review.
Change-Id: I444f175dad034cb4350dc590dce158fc404b654b
Stack:
main(Note: Closed and merged PRs may not be reflected here and PR numbering is not stable.)