Support atomics

Created by: vchuravy

Both Julia Base and CUDAnative support atomics, so ideally KernelAbstractions should provide a unified way to use them. That would let us implement kernels such as the histogram with shared atomics described here:

https://devblogs.nvidia.com/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/
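
For reference, here is a minimal CPU-only sketch of the atomic histogram pattern, using only the atomics that Julia Base already provides (`Threads.Atomic` and `Threads.atomic_add!`). It is not a KernelAbstractions API proposal: the function name `atomic_histogram` and the assumption that the input values are already valid 1-based bin indices are made up for illustration.

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Hypothetical helper (not part of any library): count how many entries of
# `data` fall into each of `nbins` bins.
function atomic_histogram(data::AbstractVector{<:Integer}, nbins::Integer)
    # One atomic counter per bin, so concurrent increments are never lost.
    bins = [Atomic{Int}(0) for _ in 1:nbins]
    @threads for i in eachindex(data)
        # Assumption: data[i] is already a valid bin index in 1:nbins.
        atomic_add!(bins[data[i]], 1)
    end
    # Unwrap the atomic cells into a plain vector of counts.
    return [c[] for c in bins]
end

data = [mod1(i, 4) for i in 1:1000]   # values cycle through 1, 2, 3, 4
counts = atomic_histogram(data, 4)
```

A unified interface in KernelAbstractions would ideally let the same increment be written once inside a kernel and run on both CPU and GPU backends.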