Move CUDA backend to CUDA.jl
Created by: vchuravy
To use KernelAbstractions to implement GPUArrays, the GPU backends need to be implemented in CUDA.jl and ROCM.jl; otherwise we could end up with circular dependencies.
TODO:
- Make tests work properly