forked from llvm/llvm-project
[pull] main from llvm:main #5546
Open: pull wants to merge 2,012 commits into Ericsson:main from llvm:main
…124111) Given the rest of the pass just gives up when it needs to compose subregisters, folding a subregister extract directly into a reg_sequence is counterproductive. Later fold attempts in the function will give up on the subregister operand, preventing looking up through the reg_sequence. It may still be profitable to do these folds if we start handling the composes. There are some test regressions, but this mostly looks better.
#124224) Set the starting index in the constructor instead of treating 0 as a special case. There should also be no need for bounds checking in the rewrite.
The verifier does not allow reg_sequence to have subregister defs, even if undef.
…" (#123945) This reverts commit 22561cf and fixes b7b9ccf (#112079).

The problem is that x86_64 and Arm 32-bit have memory regions above the stack that are readable but not writeable. First Arm:
```
(lldb) memory region --all
<...>
[0x00000000fffcf000-0x00000000ffff0000) rw- [stack]
[0x00000000ffff0000-0x00000000ffff1000) r-x [vectors]
[0x00000000ffff1000-0xffffffffffffffff) ---
```
Then x86_64:
```
$ cat /proc/self/maps
<...>
7ffdcd148000-7ffdcd16a000 rw-p 00000000 00:00 0 [stack]
7ffdcd193000-7ffdcd196000 r--p 00000000 00:00 0 [vvar]
7ffdcd196000-7ffdcd197000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
```
Compare this to AArch64, where the test did pass:
```
$ cat /proc/self/maps
<...>
ffffb87dc000-ffffb87dd000 r--p 00000000 00:00 0 [vvar]
ffffb87dd000-ffffb87de000 r-xp 00000000 00:00 0 [vdso]
ffffb87de000-ffffb87e0000 r--p 0002a000 00:3c 76927217 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
ffffb87e0000-ffffb87e2000 rw-p 0002c000 00:3c 76927217 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
fffff4216000-fffff4237000 rw-p 00000000 00:00 0 [stack]
```
To solve this, look up the memory region of the stack pointer (using https://lldb.llvm.org/resources/lldbgdbremote.html#qmemoryregioninfo-addr) and constrain the read to within that region, since we know the stack is all readable and writeable.

I have also added skipIfRemote to the tests, since getting them working in that context is too complex to be worth it.

Memory write failures now display the range they tried to write, and register write errors will show the name of the register where possible.

The patch also includes a workaround for an issue where the test code could mistake an `x` response that happens to begin with an `O` for an output packet (stdout). This workaround will not be necessary once we start using the [new implementation](https://discourse.llvm.org/t/rfc-fixing-incompatibilties-of-the-x-packet-w-r-t-gdb/84288) of the `x` packet.
--------- Co-authored-by: Pavel Labath <[email protected]>
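The "constrain the read to within that region" step described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not LLDB's code: `AddrRange` and `clampReadToRegion` are hypothetical names, and the region is assumed to have already been obtained (for example via the `qMemoryRegionInfo` packet).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical half-open address range [begin, end); not an LLDB type.
struct AddrRange {
  std::uint64_t begin;
  std::uint64_t end;
};

// Clamp a requested read of `size` bytes at `addr` to the part that lies
// inside `region`. Returns an empty range (begin == end) if nothing overlaps.
inline AddrRange clampReadToRegion(std::uint64_t addr, std::uint64_t size,
                                   AddrRange region) {
  assert(region.begin <= region.end && "region must be well-formed");
  // Saturate instead of wrapping if addr + size would overflow.
  std::uint64_t reqEnd =
      (size > UINT64_MAX - addr) ? UINT64_MAX : addr + size;
  std::uint64_t b = std::max(addr, region.begin);
  std::uint64_t e = std::min(reqEnd, region.end);
  if (b >= e)
    return {b, b};
  return {b, e};
}
```

With the Arm stack region from the listing above, `[0xfffcf000, 0xffff0000)`, a read of `0x2000` bytes starting at `0xfffef000` would be trimmed so that it stops at `0xffff0000` and never touches `[vectors]`.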
…125061) This is left over from the old way reductions were implemented. OpenMPVarMappingStackFrame doesn't actually do anything anymore so these uses can go away.
#124964) Check the canonical type in the matchers to handle aliases. For example std::optional uses add_pointer_t<...>.
…tor loads (#123081) getRegAllocationHints looks for ZPR2StridedOrContiguous load instructions which are used by FORM_TRANSPOSED_REG_TUPLE pseudos and adds all strided registers from this class to the list of hints. This patch changes getRegAllocationHints to restrict this list:
- If the pseudo uses ZPRMul class, the first load must begin with a register which is a multiple of 2 or 4.
- Only add a hint if it is part of a sequence of registers that do not already have any live intervals.

This also contains changes to suggest hints when the load instructions and the FORM_TRANSPOSED pseudo use multi-vectors of different lengths, e.g. a pseudo with a 4-vector sequence of registers formed of one column extracted from four 2-vector loads.
The compilation was failing because `triple` is an `Xclang` flag. The failure was hidden by the XFAIL.
An argument graph node without uses forms a trivial SCC, which will already be handled by the preceding branch. If a node in the SCC points to a node with empty uses, then it will be part of a different SCC, and as such assumed to be capturing if it does not have an attribute. There is no need to handle them separately.
If it's not the callee operand, it must be a data operand.
… demanded (#124066) The motivation for this is to allow reducing the vl when a user is a ternary pseudo, where the third operand is tied and also acts as a passthru. When checking the users of an instruction, we currently bail if the user is used as a passthru because all of its elements past vl will be used for the tail. We can allow passthru users if we know the tail of their result isn't used, which we will have computed beforehand after #124530. It's worth noting that this is all irrespective of the tail policy, because tail agnostic still ends up using the passthru. I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on qemu with rvv_ta_all_1s=true). Fixes #123760
This patch inlines hlfir.reshape for simple cases, such as when there is no ORDER argument; and when PAD is present, only the trivial types are handled.
…lue (#125059) The code in `translateToExtendedValue(hlfir::Entity)` was not getting rid of the fir.box for scalars because isSimplyContiguous() returned false for them. This created issues downstream because utilities using fir::ExtendedValue were not implemented to work with intrinsic scalars fir.box. fir.box of intrinsic scalars are not very commonly used as hlfir::Entity but they are allowed and should work where accepted.
) This PR optimizes the performance of `std::ranges::copy` and `std::ranges::copy_n` specifically for `vector<bool>::iterator`, addressing a subtask outlined in issue #64038. The optimizations yield performance improvements of up to **2000x** for aligned copies and **60x** for unaligned copies. Additionally, new tests have been added to validate these enhancements.

- Aligned source-destination bits

ranges::copy
```
--------------------------------------------------------------------------
Benchmark                           Before       After      Improvement
--------------------------------------------------------------------------
bm_ranges_copy_vb_aligned/8         10.8 ns      1.42 ns    8x
bm_ranges_copy_vb_aligned/64        88.5 ns      2.28 ns    39x
bm_ranges_copy_vb_aligned/512       709 ns       1.95 ns    364x
bm_ranges_copy_vb_aligned/4096      5568 ns      5.01 ns    1111x
bm_ranges_copy_vb_aligned/32768     44754 ns     38.7 ns    1156x
bm_ranges_copy_vb_aligned/65536     91092 ns     73.2 ns    1244x
bm_ranges_copy_vb_aligned/102400    139473 ns    127 ns     1098x
bm_ranges_copy_vb_aligned/106496    189004 ns    81.5 ns    2319x
bm_ranges_copy_vb_aligned/110592    153647 ns    71.1 ns    2161x
bm_ranges_copy_vb_aligned/114688    159261 ns    70.2 ns    2269x
bm_ranges_copy_vb_aligned/118784    181910 ns    73.5 ns    2475x
bm_ranges_copy_vb_aligned/122880    174117 ns    76.5 ns    2276x
bm_ranges_copy_vb_aligned/126976    176020 ns    82.0 ns    2147x
bm_ranges_copy_vb_aligned/131072    180757 ns    137 ns     1319x
bm_ranges_copy_vb_aligned/135168    190342 ns    158 ns     1205x
bm_ranges_copy_vb_aligned/139264    192831 ns    103 ns     1872x
bm_ranges_copy_vb_aligned/143360    199627 ns    89.4 ns    2233x
bm_ranges_copy_vb_aligned/147456    203881 ns    88.6 ns    2301x
bm_ranges_copy_vb_aligned/151552    213345 ns    88.4 ns    2413x
bm_ranges_copy_vb_aligned/155648    216892 ns    92.9 ns    2335x
bm_ranges_copy_vb_aligned/159744    222751 ns    96.4 ns    2311x
bm_ranges_copy_vb_aligned/163840    225995 ns    173 ns     1306x
bm_ranges_copy_vb_aligned/167936    235230 ns    202 ns     1165x
bm_ranges_copy_vb_aligned/172032    244093 ns    131 ns     1863x
bm_ranges_copy_vb_aligned/176128    244434 ns    111 ns     2202x
bm_ranges_copy_vb_aligned/180224    249570 ns    108 ns     2311x
bm_ranges_copy_vb_aligned/184320    254538 ns    108 ns     2357x
bm_ranges_copy_vb_aligned/188416    261817 ns    113 ns     2317x
bm_ranges_copy_vb_aligned/192512    269923 ns    125 ns     2159x
bm_ranges_copy_vb_aligned/196608    273494 ns    210 ns     1302x
bm_ranges_copy_vb_aligned/200704    280035 ns    269 ns     1041x
bm_ranges_copy_vb_aligned/204800    293102 ns    231 ns     1269x
```
ranges::copy_n
```
--------------------------------------------------------------------------
Benchmark                             Before       After      Improvement
--------------------------------------------------------------------------
bm_ranges_copy_n_vb_aligned/8         11.8 ns      0.89 ns    13x
bm_ranges_copy_n_vb_aligned/64        91.6 ns      2.06 ns    44x
bm_ranges_copy_n_vb_aligned/512       718 ns       2.45 ns    293x
bm_ranges_copy_n_vb_aligned/4096      5750 ns      5.02 ns    1145x
bm_ranges_copy_n_vb_aligned/32768     45824 ns     40.9 ns    1120x
bm_ranges_copy_n_vb_aligned/65536     92267 ns     73.8 ns    1250x
bm_ranges_copy_n_vb_aligned/102400    143267 ns    125 ns     1146x
bm_ranges_copy_n_vb_aligned/106496    148625 ns    82.4 ns    1804x
bm_ranges_copy_n_vb_aligned/110592    154817 ns    72.0 ns    2150x
bm_ranges_copy_n_vb_aligned/114688    157953 ns    70.4 ns    2244x
bm_ranges_copy_n_vb_aligned/118784    162374 ns    71.5 ns    2270x
bm_ranges_copy_n_vb_aligned/122880    168638 ns    72.9 ns    2313x
bm_ranges_copy_n_vb_aligned/126976    175596 ns    76.6 ns    2292x
bm_ranges_copy_n_vb_aligned/131072    181164 ns    135 ns     1342x
bm_ranges_copy_n_vb_aligned/135168    184697 ns    157 ns     1176x
bm_ranges_copy_n_vb_aligned/139264    191395 ns    104 ns     1840x
bm_ranges_copy_n_vb_aligned/143360    194954 ns    88.3 ns    2208x
bm_ranges_copy_n_vb_aligned/147456    208917 ns    86.1 ns    2426x
bm_ranges_copy_n_vb_aligned/151552    211101 ns    87.2 ns    2421x
bm_ranges_copy_n_vb_aligned/155648    213175 ns    89.0 ns    2395x
bm_ranges_copy_n_vb_aligned/159744    218988 ns    86.7 ns    2526x
bm_ranges_copy_n_vb_aligned/163840    225263 ns    156 ns     1444x
bm_ranges_copy_n_vb_aligned/167936    230725 ns    184 ns     1254x
bm_ranges_copy_n_vb_aligned/172032    235795 ns    119 ns     1981x
bm_ranges_copy_n_vb_aligned/176128    241145 ns    101 ns     2388x
bm_ranges_copy_n_vb_aligned/180224    250680 ns    99.5 ns    2519x
bm_ranges_copy_n_vb_aligned/184320    262954 ns    99.7 ns    2637x
bm_ranges_copy_n_vb_aligned/188416    258584 ns    103 ns     2510x
bm_ranges_copy_n_vb_aligned/192512    267190 ns    125 ns     2138x
bm_ranges_copy_n_vb_aligned/196608    270821 ns    213 ns     1271x
bm_ranges_copy_n_vb_aligned/200704    279532 ns    262 ns     1067x
bm_ranges_copy_n_vb_aligned/204800    283412 ns    222 ns     1277x
```

- Unaligned source-destination bits
```
--------------------------------------------------------------------------------
Benchmark                                Before        After       Improvement
--------------------------------------------------------------------------------
bm_ranges_copy_vb_unaligned/8            12.8 ns       8.59 ns     1.5x
bm_ranges_copy_vb_unaligned/64           98.2 ns       8.24 ns     12x
bm_ranges_copy_vb_unaligned/512          755 ns        18.1 ns     42x
bm_ranges_copy_vb_unaligned/4096         6027 ns       102 ns      59x
bm_ranges_copy_vb_unaligned/32768        47663 ns      774 ns      62x
bm_ranges_copy_vb_unaligned/262144       378981 ns     6455 ns     59x
bm_ranges_copy_vb_unaligned/1048576      1520486 ns    25942 ns    59x
bm_ranges_copy_n_vb_unaligned/8          11.3 ns       8.22 ns     1.4x
bm_ranges_copy_n_vb_unaligned/64         97.3 ns       7.89 ns     12x
bm_ranges_copy_n_vb_unaligned/512        747 ns        18.1 ns     41x
bm_ranges_copy_n_vb_unaligned/4096       5932 ns       99.0 ns     60x
bm_ranges_copy_n_vb_unaligned/32768      47776 ns      749 ns      64x
bm_ranges_copy_n_vb_unaligned/262144     378802 ns     6576 ns     58x
bm_ranges_copy_n_vb_unaligned/1048576    1547234 ns    26229 ns    59x
```
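To give a feel for why the aligned case can be so much faster than a bit-by-bit loop, here is a minimal sketch of copying packed bits one storage word at a time. It is not the libc++ implementation: `copyBitsAligned` is a hypothetical helper, the bits are assumed to live in plain `std::uint64_t` words, and both ranges are assumed to start at bit 0 of their first word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy the first `nbits` bits of `src` into `dst`, assuming both ranges
// start at bit 0 of word 0 (the "aligned" case). Bits of `dst` past
// `nbits` are left untouched.
inline void copyBitsAligned(const std::vector<std::uint64_t>& src,
                            std::vector<std::uint64_t>& dst,
                            std::size_t nbits) {
  const std::size_t kBitsPerWord = 64;
  const std::size_t fullWords = nbits / kBitsPerWord;
  const std::size_t rem = nbits % kBitsPerWord;
  const std::size_t wordsNeeded = fullWords + (rem != 0 ? 1 : 0);
  assert(src.size() >= wordsNeeded && dst.size() >= wordsNeeded);

  // Whole words are copied 64 bits at a time.
  for (std::size_t i = 0; i < fullWords; ++i)
    dst[i] = src[i];

  // The trailing partial word is merged under a mask.
  if (rem != 0) {
    const std::uint64_t mask = (std::uint64_t{1} << rem) - 1;
    dst[fullWords] = (dst[fullWords] & ~mask) | (src[fullWords] & mask);
  }
}
```

The unaligned case needs an extra shift-and-merge per word because each destination word receives bits from two source words, which is consistent with the smaller speedups in the unaligned table.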
…pport (#123149) There are now areas where we can have either a ModuleOp or a GPUModuleOp, both of which can carry a DataLayout, and where we may need the DataLayout utilities, so I have extended those utilities to work with both. Those with more knowledge of how GPUModuleOps should interact with their parent ModuleOp's DataLayout may wish to make further alterations in the future, but for the moment this simply uses the basic data layout construction, which I believe combines the parent and child data layouts from the ModuleOp and GPUModuleOp. If there is no GPUModuleOp DataLayout, it should default to that of the parent ModuleOp. It's worth noting there is some weirdness if two module operations both define builtin dialect DataLayout entries: the DataLayout combining functionality does not appear to support merging them. This behaviour is useful for areas like: https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412 where the two different module operations meet.
…IR branching error (#123771) Currently, if we generate code for the target data map below, which uses an optional mapping:

    !$omp target data if(present(a)) map(alloc:a)
    do i = 1, 10
    a(i) = i
    end do
    !$omp end target data

we yield an LLVM-IR error because the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the callback function when generating the else branch; however, the emitBranch function needs the builder to be set to a block in order to generate and link in a follow-up branch. The NoDupPriv path currently doesn't do this. While it's not supposed to generate anything (as far as I am aware), we still need to at least set the builder's placement back so that it emits the appropriate follow-up branch. This avoids the missing-terminator LLVM-IR verification error by correctly generating the follow-up branch.
Oversight found by the ISel fuzz effort. The code assumed the argument is a register, but in some cases it can be an immediate. Tablegen's type for the instruction is SSrc_b32, i.e. either a register or an immediate is fine. Added the repro from the bug reporter as a test case; prior to this patch, llvm asserts in getReg. Fixes SWDEV-508589
…123906)"" (#125091) Reverts #123945

Has failed on the Windows on Arm buildbot: https://lab.llvm.org/buildbot/#/builders/141/builds/5865
```
********************
Unresolved Tests (2):
  lldb-api :: functionalities/reverse-execution/TestReverseContinueBreakpoints.py
  lldb-api :: functionalities/reverse-execution/TestReverseContinueWatchpoints.py
********************
Failed Tests (1):
  lldb-api :: functionalities/reverse-execution/TestReverseContinueNotSupported.py
```
Reverting while I reproduce locally.
LLVM has two tablegen generators: one in llvm/tblgen.bzl (`gentbl`, macro-based) and one in mlir/tblgen.bzl (`gentbl_cc_library`, rule-based). The `gentbl_cc_library` generator in MLIR has some advantages to being a rule, and at any rate, it seems better to just use the same tablegen rule everywhere instead of competing implementations.
…24848) This adds a VP version of an existing DAG combine. I've put it in RISCVISelLowering since we would need to add an ISD::VP_AVGCEIL opcode otherwise. This pattern appears in 525.264_r.
…Transpose perms parameter (#124945) When consolidating transpose ops into one, use `tosa::ConstOp` for the permutations parameter instead of `arith::ConstantOp`.
This PR fixes the folder of a `vector.shuffle` with constant input vectors in the presence of a poison index. Partially poison vectors are currently not supported in UB, so the folder selects v1[0] for elements indexed by poison.
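The folding behavior described above can be sketched on plain vectors. This is an illustration only, not the MLIR folder: `foldShuffle` is a hypothetical helper, the inputs are `std::vector<int>` rather than constant attributes, and using `-1` as the poison sentinel is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel marking a poison mask entry in this sketch.
constexpr std::int64_t kPoisonIndex = -1;

// Fold a shuffle of two constant vectors. Mask entries in [0, |v1|) pick
// from v1, entries in [|v1|, |v1| + |v2|) pick from v2, and poison entries
// fall back to v1[0], mirroring the behavior described in the PR text.
inline std::vector<int> foldShuffle(const std::vector<int>& v1,
                                    const std::vector<int>& v2,
                                    const std::vector<std::int64_t>& mask) {
  assert(!v1.empty() && "poison lanes fall back to v1[0]");
  std::vector<int> result;
  result.reserve(mask.size());
  const std::int64_t v1Size = static_cast<std::int64_t>(v1.size());
  for (std::int64_t idx : mask) {
    if (idx == kPoisonIndex)
      result.push_back(v1[0]);
    else if (idx < v1Size)
      result.push_back(v1[static_cast<std::size_t>(idx)]);
    else
      result.push_back(v2[static_cast<std::size_t>(idx - v1Size)]);
  }
  return result;
}
```

Choosing a concrete element for poison lanes is a valid refinement, since any value may stand in for poison; picking `v1[0]` simply keeps the folded result a fully defined constant.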
…125277) This enables -mcpu=native for the HiFive Premier P550 board.
Associate '-mlink-bitcode-file' as both CC1 and FC1 option. Fixes https://gitlab.e4s.io/uo-public/llvm-openmp-offloading-v2/-/jobs/360327
…ad results of `linalg.generic` op. (#125141) This functionality was wrapped within a pattern. Expose this as a separate transformation function that can be used outside of the pattern rewrite mechanism. --------- Signed-off-by: MaheshRavishankar <[email protected]>
) Reverts #125253. It introduced an msan failure, caught by a buildbot here: https://lab.llvm.org/buildbot/#/builders/164/builds/6922/steps/17/logs/stdio
Commit f10441a dropped a special case for isUndefWeak and --no-dynamic-linker but also made --export-dynamic ineffective for static PIE. This change restores the --export-dynamic behavior and entirely drops special handling of --no-dynamic-linker:

* -pie with no input DSO, similar to --no-dynamic-linker, suppresses undefined symbols in .dynsym

The new behaviors resemble GNU ld more.
…. NFC This was checking whether the erase is needed, but erase is safe to call with equal iterators.
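The point about equal iterators can be shown with a standard container. This sketch uses `std::vector` as a stand-in (the container in the actual commit is not shown above), and `removeAll` is a hypothetical helper: for standard sequence containers, a range `erase(first, last)` with `first == last` is a no-op, so no guard is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Remove every element equal to `value`. When nothing matches,
// std::remove returns v.end() and erase(end, end) does nothing,
// so there is no need to check whether the erase is required.
inline void removeAll(std::vector<int>& v, int value) {
  auto newEnd = std::remove(v.begin(), v.end(), value);
  v.erase(newEnd, v.end());
}
```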
This change updates the Float to TF32 conversion MLIR Op to include lowering to the new intrinsics introduced in sm_100 through ptx8.6:
- `nvvm_f2tf32_rn_satfinite`
- `nvvm_f2tf32_rn_relu_satfinite`
- `nvvm_f2tf32_rz_satfinite`
- `nvvm_f2tf32_rz_relu_satfinite`

PTX Spec Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
It is needed by #117442.
) Consider the following pattern:
```
%cmp = fcmp <pred> double %x, 0.000000e+00
%negX = fneg <fmf> double %x
%sel = select i1 %cmp, double %x, double %negX
```
We cannot propagate ninf from fneg to select since `%negX` may not be chosen. Similarly, we cannot propagate nnan unless `%negX` is guaranteed to be selected when `%x` is NaN. This patch also propagates nnan/ninf from fcmp to avoid regression in `PhaseOrdering/generate-fabs.ll`.

Alive2: https://alive2.llvm.org/ce/z/t6U-tA
Closes #121430 and #113989.
This patch fixes: llvm/lib/Analysis/ValueTracking.cpp:116:27: error: unused function 'safeCxtI' [-Werror,-Wunused-function]
This patch adds a default constructor to BlockFlags to initialize its members to false, placing initializers close to the member declarations. Note that once C++20 is available in our codebase, we can replace the explicit default constructor with: bool Reachable : 1 = true; :
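A minimal sketch of the pre-C++20 approach described above: bit-field members cannot carry default member initializers before C++20, so an explicit default constructor sets them. Only the `Reachable` name comes from the text; the second flag `Visited` and the helper `defaultsAreCleared` are hypothetical, and the real `BlockFlags` may have different members.

```cpp
#include <cassert>

struct BlockFlags {
  bool Reachable : 1;
  bool Visited : 1; // hypothetical second flag, for illustration only

  // Explicit default constructor: every flag starts out false.
  BlockFlags() : Reachable(false), Visited(false) {}
};

// Small usage check: a default-constructed BlockFlags has all flags cleared.
inline bool defaultsAreCleared() {
  BlockFlags flags;
  return !flags.Reachable && !flags.Visited;
}
```

Without the constructor, a default-initialized local `BlockFlags` would hold indeterminate values, which is the kind of problem that placing initializers next to the declarations guards against.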
…rsion. NFC The code that moves CheckOpcode before CheckType/CheckChildType/RecordDwith was running after ContractNodes started unwinding its recursion. If a move occurs we would start a new recursion going forward through the list again. I don't believe this can lead to any new combines so it was just wasted work. This patch moves the code earlier so it doesn't start a new recursion.
Forked from llvm/test/CodeGen/AArch64/arm64-vmovn.ll

Unknown intrinsics which are currently incorrectly handled by visitInstruction:
- llvm.aarch64.neon.sqxtn
- llvm.aarch64.neon.sqxtun
- llvm.aarch64.neon.uqxtn
… expression (#117437) Clang currently supports extending the lifetime of objects bound to reference members of aggregates that are created from a default member initializer. This PR addresses this change and updates CFG and ExprEngine. This PR reapplies #91879. Fixes #93725. --------- Signed-off-by: yronglin <[email protected]>
DeclareImplicitDeductionGuidesForTypeAlias. This improves the code readability.
… the same base pointer (#121892) Alive2: https://alive2.llvm.org/ce/z/P5XbMx Closes #121890 TODO: It is still safe to perform this transform without nowrap flags if the corresponding scale factor is 1 byte: https://alive2.llvm.org/ce/z/J-JCJd
cc @tobiasgrosser @wsmoses This PR adds some new ops and types to the MLIR MPI dialect. The goal is to get the minimum required ops here to get a project of ours working and, if everything works well, to continue adding ops to the mpi dialect in subsequent PRs until we achieve some level of compliance with the MPI standard.

---

Things left to do in subsequent PRs:
- Add back the `mpi.comm` type and add it as an optional argument of the currently implemented ops that should support it (i.e. `send`, `recv`, `isend`, `irecv`, `allreduce`, `barrier`).
- Support defining custom `MPI_Op`s (the MPI operations, not the tablegen `MPI_Op`) as regions.
- Add more ops.
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )