CUDA 3.6.3 broke KernelAbstractions.
Created by: mcabbott
KernelAbstractions seems to be broken for me. The example here https://juliagpu.github.io/KernelAbstractions.jl/dev/examples/naive_transpose/ stalls, apparently forever, and Julia stays broken after ^C (every later CUDA call fails). I have pasted some details below. This is with the following versions:
(@v1.8) pkg> st
Status `~/.julia/environments/v1.8/Project.toml`
[052768ef] CUDA v3.6.4
[72cfdca4] CUDAKernels v0.3.2
[63c18a36] KernelAbstractions v0.7.2
I tried this example because other calls gave different errors. There's another example at https://github.com/mcabbott/Tullio.jl/issues/134, with links to CI. The error there is:
julia> triv1(cu(ones(3)))
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
Trying to solve that led me to run julia -g2, which pointed to getindex, but I don't think anything is out of bounds.
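The transcript below uses definitions from the linked docs page (res, a, b, naive_transpose!) that were not pasted. A sketch of that setup, assuming the KernelAbstractions v0.7 / CUDAKernels v0.3 API from the versions listed above, is given here. The matrix size, the random initialisation, the size check, and launching with ndrange = size(b) (so non-square inputs stay in bounds) are assumptions and may differ from the docs.

```julia
# Sketch of the naive transpose example, adapted from the KernelAbstractions docs.
# Assumes KernelAbstractions v0.7 + CUDAKernels v0.3 (event-returning launches).
using KernelAbstractions, CUDAKernels, CUDA, Test

# Each work item copies one element: b[i, j] = a[j, i].
@kernel function naive_transpose_kernel!(a, b)
    i, j = @index(Global, NTuple)
    @inbounds b[i, j] = a[j, i]
end

# Check shapes, pick a backend from the array type, launch, and return the event.
function naive_transpose!(a, b)
    size(a, 1) == size(b, 2) && size(a, 2) == size(b, 1) ||
        throw(DimensionMismatch("b must have the transposed shape of a"))
    if a isa Array
        kernel! = naive_transpose_kernel!(CPU(), 4)           # workgroup size 4
    else
        kernel! = naive_transpose_kernel!(CUDADevice(), 256)  # workgroup size 256
    end
    # One work item per element of b (assumption: the docs may use size(a)).
    return kernel!(a, b, ndrange = size(b))
end

res = 1024                                  # assumed matrix size
a = round.(rand(Float32, res, res) .* 100)  # assumed initialisation
b = zeros(Float32, res, res)

wait(naive_transpose!(a, b))                # CPU run: passes in the report
@test a == transpose(b)

if has_cuda_gpu()                           # GPU run: the step that stalls here
    d_a = CuArray(a)
    d_b = CUDA.zeros(Float32, res, res)
    wait(naive_transpose!(d_a, d_b))
    @test Array(d_a) == transpose(Array(d_b))
end
```

On a working install both @test lines would be expected to pass; in the setup reported here, only the CPU one does.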
Some details from the transpose example follow; I'm unsure how much of this is helpful to paste:
julia> event = naive_transpose!(a,b)
KernelAbstractions.CPUEvent(Task (runnable) @0x00007facea90f0f0)
julia> wait(event)
julia> @test a == transpose(b)
Test Passed
Expression: a == transpose(b)
Evaluated: Float32[88.0 29.0 … 60.0 10.0; 3.0 43.0 … 80.0 65.0; … ; 64.0 49.0 … 28.0 91.0; 78.0 62.0 … 16.0 79.0] == Float32[88.0 29.0 … 60.0 10.0; 3.0 43.0 … 80.0 65.0; … ; 64.0 49.0 … 28.0 91.0; 78.0 62.0 … 16.0 79.0]
julia> # beginning GPU tests
if has_cuda_gpu()
d_a = CuArray(a)
d_b = CUDA.zeros(Float32, res, res)
ev = naive_transpose!(d_a, d_b)
wait(ev)
a = Array(d_a)
b = Array(d_b)
@test a == transpose(b)
end
^C
ERROR: CUDA error: unspecified launch failure (code 719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:91
[2] isdone
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/stream.jl:107 [inlined]
[3] nonblocking_synchronize
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/stream.jl:137 [inlined]
[4] nonblocking_synchronize
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/context.jl:341 [inlined]
[5] device_synchronize()
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/cudadrv/context.jl:335
[6] top-level scope
@ ~/.julia/packages/CUDA/sCev8/src/initialization.jl:54
caused by: InterruptException:
Stacktrace:
[1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
@ Base ./task.jl:834
[2] wait()
@ Base ./task.jl:894
[3] wait(c::Base.GenericCondition{ReentrantLock})
@ Base ./condition.jl:124
[4] wait(e::Base.Event)
@ Base ./lock.jl:359
[5] wait(::CPU, ev::CUDAKernels.CudaEvent, progress::Function)
@ CUDAKernels ~/.julia/packages/CUDAKernels/ZhXxD/src/CUDAKernels.jl:87
[6] wait (repeats 2 times)
@ ~/.julia/packages/CUDAKernels/ZhXxD/src/CUDAKernels.jl:76 [inlined]
[7] top-level scope
@ REPL[18]:7
[8] top-level scope
@ ~/.julia/packages/CUDA/sCev8/src/initialization.jl:52
julia> using CUDA; @time cu(ones(3)) .+ 1 # anything afterwards fails
ERROR: CUDA error: unspecified launch failure (code 719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:91
[2] isdone
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/stream.jl:107 [inlined]
[3] nonblocking_synchronize
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/stream.jl:137 [inlined]
[4] nonblocking_synchronize
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/context.jl:341 [inlined]
[5] device_synchronize()
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/cudadrv/context.jl:335
[6] top-level scope
@ ~/.julia/packages/CUDA/sCev8/src/initialization.jl:54
caused by: CUDA error: unspecified launch failure (code 719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:91
[2] macro expansion
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:101 [inlined]
[3] cuMemAllocAsync(dptr::Base.RefValue{CuPtr{Nothing}}, bytesize::Int64, hStream::CuStream)
@ CUDA ~/.julia/packages/CUDA/sCev8/lib/utils/call.jl:26
[4] #alloc#1
@ ~/.julia/packages/CUDA/sCev8/lib/cudadrv/memory.jl:82 [inlined]
[5] macro expansion
@ ~/.julia/packages/CUDA/sCev8/src/pool.jl:41 [inlined]
[6] macro expansion
@ ./timing.jl:358 [inlined]
[7] actual_alloc(bytes::Int64; async::Bool, stream::CuStream)
@ CUDA ~/.julia/packages/CUDA/sCev8/src/pool.jl:39
[8] macro expansion
@ ~/.julia/packages/CUDA/sCev8/src/pool.jl:204 [inlined]
[9] macro expansion
@ ./timing.jl:358 [inlined]
[10] #_alloc#180
@ ~/.julia/packages/CUDA/sCev8/src/pool.jl:187 [inlined]
[11] #alloc#179
@ ~/.julia/packages/CUDA/sCev8/src/pool.jl:173 [inlined]
[12] alloc
@ ~/.julia/packages/CUDA/sCev8/src/pool.jl:169 [inlined]
[13] CuArray
@ ~/.julia/packages/CUDA/sCev8/src/array.jl:44 [inlined]
[14] CuArray
@ ~/.julia/packages/CUDA/sCev8/src/array.jl:287 [inlined]
[15] adapt_storage(#unused#::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, xs::Vector{Float64})
@ CUDA ~/.julia/packages/CUDA/sCev8/src/array.jl:536
[16] adapt_structure
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
[17] adapt
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
[18] #cu#191
@ ~/.julia/packages/CUDA/sCev8/src/array.jl:546 [inlined]
[19] cu
@ ~/.julia/packages/CUDA/sCev8/src/array.jl:546 [inlined]
[20] top-level scope
@ ./timing.jl:241 [inlined]
[21] top-level scope
@ ./REPL[20]:0
[22] top-level scope
@ ~/.julia/packages/CUDA/sCev8/src/initialization.jl:52
julia> # error on quitting, ^D:
error in running finalizer: CUDA.CuError(code=CUDA.cudaError_enum(0x000002cf), meta=nothing)
throw_api_error at /home/mcabbott/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:91
macro expansion at /home/mcabbott/.julia/packages/CUDA/sCev8/lib/cudadrv/error.jl:101 [inlined]
cuStreamDestroy_v2 at /home/mcabbott/.julia/packages/CUDA/sCev8/lib/utils/call.jl:26
macro expansion at /home/mcabbott/.julia/packages/CUDA/sCev8/lib/cudadrv/context.jl:184 [inlined]
unsafe_destroy! at /home/mcabbott/.julia/packages/CUDA/sCev8/lib/cudadrv/stream.jl:85
unknown function (ip: 0x7facdd263da2)
_jl_invoke at /home/mcabbott/.julia/dev/julia/src/gf.c:2304 [inlined]
ijl_apply_generic at /home/mcabbott/.julia/dev/julia/src/gf.c:2486
jl_apply at /home/mcabbott/.julia/dev/julia/src/julia.h:1777 [inlined]
run_finalizer at /home/mcabbott/.julia/dev/julia/src/gc.c:280
jl_gc_run_finalizers_in_list at /home/mcabbott/.julia/dev/julia/src/gc.c:367
run_finalizers at /home/mcabbott/.julia/dev/julia/src/gc.c:396 [inlined]
run_finalizers at /home/mcabbott/.julia/dev/julia/src/gc.c:374
ijl_atexit_hook at /home/mcabbott/.julia/dev/julia/src/init.c:236
jl_repl_entrypoint at /home/mcabbott/.julia/dev/julia/src/jlapi.c:707
main at /home/mcabbott/.julia/dev/julia/cli/loader_exe.c:59
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at ./julia (unknown line)
...