Implement better error propagation and semaphore
Created by: vchuravy
@lcw observed that this delightful code deadlocks:
barrier = Base.Threads.Event()
cpu_event = Event(wait, barrier)
wait(CUDA(), cpu_event) # Event edge on CuDefaultStream
gpu_event = Event(CUDA()) # Event on CuDefaultStream
notify(barrier)
wait(gpu_event)
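The reason is that I thought it would be okay for wait(CUDA(), cpu_event) to be a blocking wait, without thinking about the fact that there could be dependency edges outside our control (here the barrier; in Lucas's original code, MPI). Because that call blocks the host until cpu_event completes, and cpu_event only completes once the barrier is notified, the later notify(barrier) is never reached.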
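The dependency cycle can be illustrated with a minimal plain-Julia sketch. It is only an analogy: it uses `Base.Threads.Event` and `@async` from Base and makes no CUDA or KernelAbstractions calls. The names `waiter` and `result` are hypothetical, and this is not necessarily how this PR resolves the problem. The idea is that a wait on an externally controlled event must not block the thread that will later signal it.

```julia
# Plain-Julia analogy; no CUDA or KernelAbstractions API is used here.
barrier = Base.Threads.Event()

# Blocking version (left commented out because it would hang):
# the host waits on the barrier before it ever reaches notify.
#   wait(barrier)
#   notify(barrier)

# Non-blocking version: the wait runs in its own task,
# so the caller stays free to signal the barrier afterwards.
waiter = @async begin
    wait(barrier)      # suspends only this task, not the caller
    :released
end

notify(barrier)        # reached because the caller was never blocked
result = fetch(waiter) # completes once the task has observed the notify
```

In the example from this description, the blocking wait(CUDA(), cpu_event) plays the role of the commented-out wait(barrier).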