
Atomic attempts 3 #306

Closed · leios wants to merge 1 commit from the atomic_attempts_3 branch
Conversation

@leios (Contributor) commented Jun 13, 2022

This re-implements the atomic_... functions from #282 so I can keep using them if I need them for testing down the road. Also: I can't quite get Atomix to work for some of my use cases, so I still need the "primitives". The Atomix tests have been moved to #308.

leios marked this pull request as ready for review June 13, 2022 19:37
leios mentioned this pull request Jun 13, 2022
leios force-pushed the atomic_attempts_3 branch from 48646a3 to 3626b03 on June 13, 2022 20:11
@leios (Contributor, Author) commented Jun 13, 2022

I split off the Atomix tests because they are not passing yet. This PR is currently passing all the tests locally and most of the tests here (I don't know the root cause of the Buildkite errors just yet). There are a few users who have specifically requested Core.Intrinsics / CUDA atomic functionality, and this PR serves that role.

More than that, it's currently working. I feel it's worth merging now even if the core functionality is replaced by Atomix in the end.

@leios (Contributor, Author) commented Jun 14, 2022

For those who are struggling with Atomix in KA, you can use this PR with:

] add https://github.com/leios/KernelAbstractions.jl#atomic_attempts_3
] add https://github.com/leios/KernelAbstractions.jl:lib/CUDAKernels#atomic_attempts_3

The first line adds KA, the second line adds CUDAKernels.
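If you prefer to script the installation rather than use the Pkg REPL, the same two steps can (I believe) be written with the Pkg API as below; the URL, branch, and subdirectory names are taken from this PR, and this assumes a Pkg version with subdir support:

using Pkg
# KA itself, from the PR branch
Pkg.add(PackageSpec(url = "https://github.com/leios/KernelAbstractions.jl",
                    rev = "atomic_attempts_3"))
# the CUDAKernels sub-package from the same branch
Pkg.add(PackageSpec(url = "https://github.com/leios/KernelAbstractions.jl",
                    subdir = "lib/CUDAKernels",
                    rev = "atomic_attempts_3"))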

Also: examples of how to use each function can be found in the tests (https://github.com/leios/KernelAbstractions.jl/blob/atomic_attempts_3/test/atomic_test.jl). There are docstrings as well, but they are not built into any hosted documentation and essentially mirror the CUDA docstrings.
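For a rough idea of what a kernel using these primitives looks like, here is a minimal histogram-style sketch. It assumes a CUDA-style atomic_add! that operates on raw pointers, as discussed in this thread; the authoritative names and signatures are the ones in test/atomic_test.jl on this branch.

using KernelAbstractions

@kernel function histogram_kernel!(counts, bins)
    i = @index(Global, Linear)
    # assumed signature: atomic_add!(ptr, val), mirroring CUDA.atomic_add!
    atomic_add!(pointer(counts, bins[i]), one(eltype(counts)))
end

bins   = rand(1:10, 1024)
counts = zeros(Int, 10)
# launch on the CPU backend with a workgroup size of 64 and wait for the event
event = histogram_kernel!(CPU(), 64)(counts, bins, ndrange = length(bins))
wait(event)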

(Resolved review threads on lib/CUDAKernels/src/CUDAKernels.jl and src/cpu.jl)
@tkf (Contributor) commented Jun 14, 2022

Just FYI, I'm almost sure that Atomix + AtomixCUDA.jl, instead of Atomix + UnsafeAtomicsLLVM.jl, already covers what this PR does. I designed Atomix.jl to provide various dispatch points and make it flexible enough to cover this. Just load AtomixCUDA.jl before using KA + Atomix.jl + CUDA.jl.

As mentioned in JuliaConcurrent/Atomix.jl#33, the UnsafeAtomicsLLVM.jl approach needs some @device_override in CUDA.jl to work around some LLVM codegen limitations. I suggested Atomix + UnsafeAtomicsLLVM.jl since it covers the histogram-like use case, and I thought that's what people needed at the moment. If you want to invoke other atomic operations but prefer to avoid a PR to CUDA.jl, I feel it would be cleaner to resurrect AtomixCUDA.jl and use Atomix as the default API. This way, we can gradually drop behind-the-scenes hacks like AtomixCUDA.jl and UnsafeAtomicsLLVM.jl while minimizing the effect on user code.
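If I read the suggestion correctly, usage would look something like the sketch below. This assumes AtomixCUDA.jl is available as described; the package and device names are my reading of the comment, not something exercised in this PR.

using CUDA, CUDAKernels, KernelAbstractions, Atomix
using AtomixCUDA   # per the comment above: load this first so Atomix.@atomic dispatches to CUDA.atomic_*

@kernel function accumulate_kernel!(acc, xs)
    i = @index(Global, Linear)
    # atomic accumulation into a single slot of a device array
    Atomix.@atomic acc[1] += xs[i]
end

xs  = CUDA.ones(Float32, 1024)
acc = CUDA.zeros(Float32, 1)
wait(accumulate_kernel!(CUDADevice(), 256)(acc, xs, ndrange = length(xs)))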

@leios (Contributor, Author) commented Jun 14, 2022

I have no problem with Atomix being the default API in the long run (though I have a somewhat strong preference for just using atomic_add(...) instead of a macro for the same thing). The truth is that there are a few edge cases I want to test with Atomix before I feel comfortable using it, which is why I wanted to package it as a separate PR in #282 and said #308 was not ready to be merged.

My main issue with Atomix is that it fails on somewhat basic tasks right now, like adding floats on the CPU or subtracting numbers on the GPU (#308). I get that there are ways around this, which is why I separated out #308. The hope is that we can figure out how to get the tests to pass so we can actually have proper atomic functionality in KA.

Also: I squashed this PR so it's a single commit. Once we have Atomix actually working, we can just revert this one. I understand that in the long run we shouldn't have multiple atomic APIs, but Julia currently has a bunch of different APIs, so I don't see a problem in giving people some choice while Atomix is becoming stable.

leios force-pushed the atomic_attempts_3 branch from bae9a95 to c5db6f6 on June 14, 2022 11:54
@tkf (Contributor) commented Jun 14, 2022

"though I have a somewhat strong preference for just using atomic_add(...) instead of a macro for the same thing"

I totally agree that there should always be a non-macro-based interface. This is why Atomix.jl has non-macro interfaces: https://juliaconcurrent.github.io/Atomix.jl/dev/interface/. UnsafeAtomics.jl could also potentially become a non-macro API that covers what this PR does and more (i.e., it can specify the memory ordering), although it's not documented (and hence not technically a public interface).

"like adding floats on the CPU"

As of JuliaConcurrent/UnsafeAtomics.jl#9, more operations support floats.

"subtracting numbers on the GPU"

AtomixCUDA.jl should be able to cover this since it simply dispatches to CUDA.atomic_* like this PR does.
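For reference, the kind of non-macro, ordering-aware call mentioned above for UnsafeAtomics.jl might look roughly like this. Since that surface is undocumented (per the comment above), the names, ordering constants, and signatures here are assumptions rather than a stable API.

using UnsafeAtomics: UnsafeAtomics, monotonic   # assumed: ordering constants live in UnsafeAtomics

xs = zeros(Float64, 4)
GC.@preserve xs begin
    p = pointer(xs, 1)
    UnsafeAtomics.store!(p, 1.0, monotonic)      # assumed: atomic store with an explicit ordering
    UnsafeAtomics.modify!(p, +, 2.0, monotonic)  # assumed: read-modify-write, i.e. an atomic add
    UnsafeAtomics.load(p, monotonic)             # assumed: atomic load; should now read 3.0
end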

leios closed this on Sep 12, 2022