How best to sync with the default stream in the CUDA backend?
Created by: lcw
When combining KA with CuArrays, we have needed to sync with the default stream. The hack we currently use can be seen, for example, in this commit: https://github.com/climate-machine/CLIMA/pull/799/commits/417dd726d4340e7688f3810b981077dcdadf92bf
Should a mechanism exist in KA to give us an event in the default stream to pass as a dependency to KA kernels?
If so, what should the CPU backend do? We could return `nothing` and define `wait(::Nothing; progress=nothing) = nothing`.
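To make the CPU half of the question concrete, here is a minimal, self-contained sketch of the "return nothing" idea. Every name in it (`Device`, `CPUDevice`, `default_stream_event`, `sync_wait`) is a hypothetical stand-in and not part of KA's API. The sketch uses a local `sync_wait` instead of adding a method to `wait`, purely so it runs on its own without touching `Base` or KA.

```julia
# Hypothetical stand-ins for illustration only; not KA's actual types.
abstract type Device end
struct CPUDevice <: Device end

# Hypothetical query: "give me an event in the default stream".
# The CPU has no stream, so there is nothing to depend on.
default_stream_event(::CPUDevice) = nothing

# Waiting on "no event" returns immediately. The `progress` keyword is
# accepted and ignored, mirroring the signature proposed above.
sync_wait(::Nothing; progress=nothing) = nothing

ev = default_stream_event(CPUDevice())
result = sync_wait(ev; progress=yield)
```

With this shape, calling code could request the default-stream event and wait on it without branching on the device, assuming the CUDA backend returned a real event from the same query.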