Atomic attempts
Created by: leios
This is a draft PR adding atomic operations to KernelAbstractions.
I plan to put everything we need in `atomics.jl` (and the corresponding CUDAKernels file); however, I cannot test ROCm myself, so I may need to leave that part to someone else.
Current roadmap (to be worked on throughout the week):
- Implement all atomic primitives (things like add, sub, inc, etc.) so that we have parity with the `CUDA.atomic_*` calls. Docs here: https://cuda.juliagpu.org/stable/api/kernel/#Atomics. Right now, I call the CUDA atomics directly on the GPU and use atomics on pointers for the CPU. I think this will work for all primitives, but I might be wrong. Info about CPU atomics: https://gist.github.com/vtjnash/11b0031f2e2a66c9c24d33e810b34ec0#new-intrinsics-for-ptrt
- Implement tests for each primitive, following CUDA.jl: https://github.com/JuliaGPU/CUDA.jl/blob/master/test/device/intrinsics/atomics.jl
- Add docs. Some atomic primitives only work on certain types (my GPU cannot do a Float64 atomic add, for example), so we need to mention this (and maybe provide a clearer error than CUDA does).
- Add an example. I am thinking of a simple shared-memory histogram with atomics.
- Add an `@atomic` macro (along with docs, etc.). This is a separate item because I might not do it in this PR. On the GPU, I think we can pull in the CUDA `@atomic` macro directly, but no such feature exists on the CPU (as far as I am aware). For the CPU implementation, we can take inspiration from the CUDA `@atomic` macro definition: https://github.com/JuliaGPU/CUDA.jl/blob/master/src/device/intrinsics/atomics.jl, though it is tagged as "experimental" for now.
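As a rough illustration of the pointer-based CPU path and the histogram example above, here is a minimal sketch assuming Julia ≥ 1.7, where `Core.Intrinsics.atomic_pointermodify` exists (see the linked gist). The names `atomic_modify!`, `cpu_atomic_add!`, `cpu_atomic_sub!`, `cpu_atomic_max!`, `cpu_atomic_inc!`, `inc_wrap` and `histogram!` are hypothetical, not KernelAbstractions API. Each primitive returns the old value, as the `CUDA.atomic_*` functions do; the `inc` wrap-to-zero behaviour follows CUDA's `atomicInc` and is an assumption about what parity should mean here. `histogram!` is only a CPU analogue of the planned shared-memory kernel: each chunk stands in for a workgroup and a private array stands in for shared memory.

```julia
using Base.Threads: @threads

# Hypothetical CPU fallbacks in the spirit of CUDA.atomic_*: each takes a raw
# pointer, updates the pointee atomically, and returns the value seen *before*
# the update. Requires Julia >= 1.7 for Core.Intrinsics.atomic_pointermodify.
const ORDER = :sequentially_consistent

# atomic_pointermodify returns an (old => new) Pair; keep only the old value.
atomic_modify!(p::Ptr{T}, op, v::T) where {T} =
    first(Core.Intrinsics.atomic_pointermodify(p, op, v, ORDER))

cpu_atomic_add!(p::Ptr{T}, v::T) where {T} = atomic_modify!(p, +, v)
cpu_atomic_sub!(p::Ptr{T}, v::T) where {T} = atomic_modify!(p, -, v)
cpu_atomic_max!(p::Ptr{T}, v::T) where {T} = atomic_modify!(p, max, v)

# CUDA-style inc (assumed semantics): wrap to zero once old reaches the bound v.
inc_wrap(old, v) = old >= v ? zero(old) : old + one(old)
cpu_atomic_inc!(p::Ptr{T}, v::T) where {T} = atomic_modify!(p, inc_wrap, v)

# CPU analogue of a shared-memory histogram: each chunk of `data` plays the
# role of a workgroup, `local_bins` plays the role of shared memory, and the
# merge into `bins` needs atomics because chunks run concurrently.
function histogram!(bins::Vector{Int}, data::Vector{Int}; chunk::Int = 1024)
    nbins = length(bins)
    GC.@preserve bins begin   # keep `bins` alive while raw pointers are in use
        @threads for s in 1:chunk:length(data)
            local_bins = zeros(Int, nbins)
            for i in s:min(s + chunk - 1, length(data))
                local_bins[data[i]] += 1   # assumes every value lies in 1:nbins
            end
            for b in 1:nbins
                iszero(local_bins[b]) || cpu_atomic_add!(pointer(bins, b), local_bins[b])
            end
        end
    end
    return bins
end
```

Keeping one generic `atomic_modify!` means each further primitive (and, or, xor, min, ...) becomes a one-line wrapper; whether the real implementation can route every primitive through a single intrinsic is untested here.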
I am currently struggling with the final point: for some reason, the macro I created (`KernelAbstractions.@atomic`) only captures the first symbol of an expression rather than the full expression. If everyone is happy enough with the atomic primitives, I might leave the macro as future work (tm).
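On the macro problem: a macro defined as `macro name(ex)` receives an update such as `a[i] += v` as a single `Expr` with head `:+=`, so one quick check is `dump(:(a[i] += v))` or printing `ex` inside the macro. The minimal sketch below, assuming Julia ≥ 1.7, uses a hypothetical `@cpu_atomic` macro and `atomic_modify_index!` helper (neither is the KernelAbstractions or CUDA.jl API) to rewrite `a[i] op= v` into an atomic pointer update on the CPU. It only handles single-index arrays with `+=`, `-=`, `&=` and `|=`, and it does not claim to explain the bug above.

```julia
# Hypothetical helper: atomically apply `op` to a[i] and return the old value.
function atomic_modify_index!(a::Array{T}, i::Integer, op, v) where {T}
    checkbounds(a, i)
    GC.@preserve a begin   # keep `a` alive while its raw pointer is in use
        p = pointer(a, i)
        first(Core.Intrinsics.atomic_pointermodify(p, op, convert(T, v), :sequentially_consistent))
    end
end

# Updating-operator heads this sketch accepts (built from strings to avoid
# ambiguity when quoting operator symbols).
const UPDATE_HEADS = map(Symbol, ("+=", "-=", "&=", "|="))

macro cpu_atomic(ex)
    # `@cpu_atomic a[i] += v` hands the macro ONE expression:
    # Expr(:+=, Expr(:ref, :a, :i), :v); nothing is split off.
    if !(ex isa Expr && ex.head in UPDATE_HEADS && Meta.isexpr(ex.args[1], :ref, 2))
        error("@cpu_atomic expects `a[i] op= v` with op in + - & |, got: $ex")
    end
    arr, idx = ex.args[1].args
    op = Symbol(chop(string(ex.head)))   # e.g. the head += becomes the function +
    return :(atomic_modify_index!($(esc(arr)), $(esc(idx)), $op, $(esc(ex.args[2]))))
end
```

Usage: `@cpu_atomic counts[k] += 1` inside a `Threads.@threads` loop; like the primitives, it returns the old value.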
This is a step towards resolving #7 (closed) and #276 (closed); however, I am not sure it fixes them completely without the `@atomic` macro.