Implement CPU->GPU events using `cuLaunchHostFunc`
Created by: vchuravy
Sadly, this doesn't fix @lcw's original issue:
```julia
let kernel = happy(CUDA(), (1,))
    # Do not precompile: this hangs, since cuModuleLoadDataEx will block
    # if the device is busy.
    barrier = Base.Threads.Event()
    cpu_event = Event(wait, barrier)
    gpu_event = kernel(; ndrange=(1,), dependencies=cpu_event)
    notify(barrier)
    wait(gpu_event)
end
```
We still hang in `cuModuleLoadDataEx`.
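The approach in the title relies on stream ordering: a host function enqueued with `cuLaunchHostFunc` runs after the work already in the stream and blocks work added after it. So if that host function waits on a CPU event, every kernel queued behind it is held back until the event fires. Below is a minimal CPU-only model of that ordering under stated assumptions, not the PR's implementation: no CUDA is involved, `ModelStream` and `submit!` are hypothetical names, a FIFO `Channel` drained by a single task stands in for the stream, and the first closure stands in for the host function.

```julia
# Hypothetical CPU-only model: a "stream" is a FIFO queue drained by a
# single task, so each work item blocks every item submitted after it.
struct ModelStream
    queue::Channel{Function}
    worker::Task
end

function ModelStream()
    queue = Channel{Function}(Inf)
    # Run queued work items one at a time, in order, until the queue is closed.
    worker = @async for f in queue
        f()
    end
    return ModelStream(queue, worker)
end

# Hypothetical stand-in for launching work (a kernel or a host function).
submit!(s::ModelStream, f::Function) = put!(s.queue, f)

ran = String[]
barrier = Base.Threads.Event()      # CPU-side event the "GPU" work depends on
kernel_done = Base.Threads.Event()  # lets the main task wait for the "kernel"
stream = ModelStream()

# Analogue of cuLaunchHostFunc: a host callback that blocks until the CPU event fires.
submit!(stream, () -> wait(barrier))
# Analogue of the dependent kernel launch, queued behind the host callback.
submit!(stream, () -> (push!(ran, "kernel"); notify(kernel_done)))

yield()                 # let the worker start; it parks on `barrier`
@assert isempty(ran)    # the "kernel" cannot run before the CPU event fires
notify(barrier)
wait(kernel_done)
@assert ran == ["kernel"]
close(stream.queue)     # ends the worker loop
```

The model deliberately leaves out the part that still hangs: per the comment in the reproducer, `cuModuleLoadDataEx` blocks while the device is busy, and presumably the device stays busy here until `notify(barrier)` runs, which only happens after the kernel launch returns.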