
Atomic attempts 3 #306

Closed · leios wants to merge 1 commit from the atomic_attempts_3 branch
Conversation

@leios (Contributor) commented Jun 13, 2022

This re-implements the atomic_... functions from #282 so I can keep using them if I need them for testing down the road. Also: I can't quite get Atomix to work for some of my use cases, so I still need the "primitives". The Atomix tests have been moved to #308.

leios marked this pull request as ready for review June 13, 2022 19:37
leios mentioned this pull request Jun 13, 2022
leios force-pushed the atomic_attempts_3 branch from 48646a3 to 3626b03 on June 13, 2022 20:11
@leios (Contributor, Author) commented Jun 13, 2022

I split off the Atomix tests because they are not passing yet. This PR is currently passing all the tests locally and most of the tests here (I don't know the root cause of the Buildkite errors just yet). There are a few users who have specifically requested Core.Intrinsics / CUDA atomic functionality, and this PR serves that role.

More than that, it's currently working. I feel it's worth merging now even if the core functionality is replaced by Atomix in the end.

@leios (Contributor, Author) commented Jun 14, 2022

For those who are struggling with Atomix in KA, you can use this PR with:

] add https://github.com/leios/KernelAbstractions.jl#atomic_attempts_3
] add https://github.com/leios/KernelAbstractions.jl:lib/CUDAKernels#atomic_attempts_3

The first line adds KA, the second line adds CUDAKernels.
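If you prefer to script the installation rather than use the Pkg REPL, the same two steps can (I believe) be written with the Pkg API as below; the URL, branch, and subdirectory names are taken from this PR, and this assumes a Pkg version with subdir support:

using Pkg
# KA itself, from the PR branch
Pkg.add(PackageSpec(url = "https://github.com/leios/KernelAbstractions.jl",
                    rev = "atomic_attempts_3"))
# the CUDAKernels sub-package from the same branch
Pkg.add(PackageSpec(url = "https://github.com/leios/KernelAbstractions.jl",
                    subdir = "lib/CUDAKernels",
                    rev = "atomic_attempts_3"))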

Also: examples of how to use each function can be found in the tests (https://github.com/leios/KernelAbstractions.jl/blob/atomic_attempts_3/test/atomic_test.jl). There are docstrings as well, but they are not built into any hosted documentation and essentially mirror the CUDA docstrings.
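For a rough idea of what a kernel using these primitives looks like, here is a minimal histogram-style sketch. It assumes a CUDA-style atomic_add! that operates on raw pointers, as discussed in this thread; the authoritative names and signatures are the ones in test/atomic_test.jl on this branch.

using KernelAbstractions

@kernel function histogram_kernel!(counts, bins)
    i = @index(Global, Linear)
    # assumed signature: atomic_add!(ptr, val), mirroring CUDA.atomic_add!
    atomic_add!(pointer(counts, bins[i]), one(eltype(counts)))
end

bins   = rand(1:10, 1024)
counts = zeros(Int, 10)
# launch on the CPU backend with a workgroup size of 64 and wait for the event
event = histogram_kernel!(CPU(), 64)(counts, bins, ndrange = length(bins))
wait(event)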

(Resolved review threads on lib/CUDAKernels/src/CUDAKernels.jl and src/cpu.jl)
@tkf (Contributor) commented Jun 14, 2022

Just FYI, I'm almost sure that Atomix + AtomixCUDA.jl, instead of Atomix + UnsafeAtomicsLLVM.jl, already covers what this PR does. I designed Atomix.jl to provide various dispatch points and make it flexible enough to cover this. Just load AtomixCUDA.jl before using KA + Atomix.jl + CUDA.jl.

As mentioned in JuliaConcurrent/Atomix.jl#33, the UnsafeAtomicsLLVM.jl approach needs some @device_override in CUDA.jl to work around some LLVM codegen limitations. I suggested Atomix + UnsafeAtomicsLLVM.jl since it covers the histogram-like use case, and I thought that's what people needed at the moment. If you want to invoke other atomic operations but prefer to avoid a PR to CUDA.jl, I feel it would be cleaner to resurrect AtomixCUDA.jl and use Atomix as the default API. This way, we can gradually drop behind-the-scenes hacks like AtomixCUDA.jl and UnsafeAtomicsLLVM.jl while minimizing the effect on user code.
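If I read the suggestion correctly, usage would look something like the sketch below. This assumes AtomixCUDA.jl is available as described; the package and device names are my reading of the comment, not something exercised in this PR.

using CUDA, CUDAKernels, KernelAbstractions, Atomix
using AtomixCUDA   # per the comment above: load this first so Atomix.@atomic dispatches to CUDA.atomic_*

@kernel function accumulate_kernel!(acc, xs)
    i = @index(Global, Linear)
    # atomic accumulation into a single slot of a device array
    Atomix.@atomic acc[1] += xs[i]
end

xs  = CUDA.ones(Float32, 1024)
acc = CUDA.zeros(Float32, 1)
wait(accumulate_kernel!(CUDADevice(), 256)(acc, xs, ndrange = length(xs)))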

@leios (Contributor, Author) commented Jun 14, 2022

I have no problem with Atomix being the default API in the long run (though I have a somewhat strong preference for just using atomic_add(...) instead of a macro for the same thing). The truth is that there are a few edge cases I want to test with Atomix before I feel comfortable using it, which is why I wanted to package it as a separate PR in #282 and said #308 was not ready to be merged.

My main issue with Atomix is that it fails on somewhat basic tasks right now, like adding floats on the CPU or subtracting numbers on the GPU (#308). I get that there are ways around this, which is why I separated out #308. The hope is that we can figure out how to get the tests to pass so we can actually have proper atomic functionality in KA.

Also: I squashed this PR so it's a single commit. Once we have Atomix actually working, we can just revert this one. I understand that in the long run we shouldn't have multiple atomic APIs, but Julia currently has a bunch of different APIs, so I don't see a problem in giving people some choice while Atomix is becoming stable.

leios force-pushed the atomic_attempts_3 branch from bae9a95 to c5db6f6 on June 14, 2022 11:54
@tkf (Contributor) commented Jun 14, 2022

"though I have a somewhat strong preference for just using atomic_add(...) instead of a macro for the same thing"

I totally agree that there should always be a non-macro-based interface. This is why Atomix.jl has non-macro interfaces: https://juliaconcurrent.github.io/Atomix.jl/dev/interface/. UnsafeAtomics.jl could also potentially become a non-macro API that covers what this PR does and more (i.e., it can specify the memory ordering), although it's not documented (and hence not technically a public interface).

"like adding floats on the CPU"

As of JuliaConcurrent/UnsafeAtomics.jl#9, more operations support floats.

"subtracting numbers on the GPU"

AtomixCUDA.jl should be able to cover this since it simply dispatches to CUDA.atomic_* like this PR does.
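For reference, the kind of non-macro, ordering-aware call mentioned above for UnsafeAtomics.jl might look roughly like this. Since that surface is undocumented (per the comment above), the names, ordering constants, and signatures here are assumptions rather than a stable API.

using UnsafeAtomics: UnsafeAtomics, monotonic   # assumed: ordering constants live in UnsafeAtomics

xs = zeros(Float64, 4)
GC.@preserve xs begin
    p = pointer(xs, 1)
    UnsafeAtomics.store!(p, 1.0, monotonic)      # assumed: atomic store with an explicit ordering
    UnsafeAtomics.modify!(p, +, 2.0, monotonic)  # assumed: read-modify-write, i.e. an atomic add
    UnsafeAtomics.load(p, monotonic)             # assumed: atomic load; should now read 3.0
end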

leios closed this on Sep 12, 2022