Enzyme autodiff produces out-of-bounds error for some kernels.
Created by: jlk9
Running this code using the current versions of Enzyme, KernelAbstractions.jl (KA), and CUDA.jl:
using KernelAbstractions
using CUDA
using Enzyme

function advanceTimeLevels!(field; backend=CUDABackend())
    nthreads = 64
    kernel2d! = advance_2d_array(backend, nthreads)
    kernel2d!(field, ndrange=size(field)[1])
end

@kernel function advance_2d_array(field)
    j = @index(Global, Linear)
    if j < 101
        @inbounds field[j,1] = field[j,2]
    end
    @synchronize()
end

field = CUDA.CuArray(ones(100, 2))
d_field = Enzyme.make_zero(field)

autodiff(Enzyme.Reverse, advanceTimeLevels!, Duplicated(field, d_field))

@show field
@show d_field
produces this error (can add more of the stacktrace if needed):
ERROR: a BoundsError was thrown during kernel execution on thread (37, 1, 1) in block (2, 1, 1).
Out-of-bounds array access
Since there are 64 threads per block, the 37th thread of block 2 corresponds to global index (2 - 1) * 64 + 37 = 101, which is out of bounds for the array field. But the kernel has a conditional statement to avoid accessing the array at any row index greater than 100 (the size of its first dimension). If we run the function advanceTimeLevels! without autodiff, no error occurs. If we run autodiff with a block size that divides the array length, such as nthreads = 100 or nthreads = 50, no error occurs either.
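To make the index arithmetic above concrete, here is a minimal CPU-only sketch in plain Julia. It assumes the launch is rounded up to whole blocks, so some threads fall past the ndrange. The helper global_index is hypothetical and written only for illustration; it is not part of KernelAbstractions or CUDA.jl.

```julia
# Hypothetical helper (not a KernelAbstractions or CUDA.jl function):
# 1-based global linear index from 1-based block and thread indices.
global_index(block, thread, nthreads) = (block - 1) * nthreads + thread

n = 100          # first dimension of `field`, passed as ndrange
nthreads = 64    # workgroup size used in the reproducer

# Assumption: the launch is rounded up to whole blocks.
nblocks = cld(n, nthreads)        # blocks needed to cover ndrange
launched = nblocks * nthreads     # threads actually launched

# Thread reported in the error message: thread 37 of block 2.
failing = global_index(2, 37, nthreads)

println("blocks = ", nblocks, ", launched threads = ", launched)
println("failing global index = ", failing)
println("indices past ndrange = ", (n + 1):launched)

# Workgroup sizes that divide n leave no threads past ndrange.
for w in (50, 100)
    println("nthreads = ", w, ": extra threads = ", cld(n, w) * w - n)
end
```

Under that assumption, the thread named in the error is the first one past the ndrange, and the two block sizes that avoid the error are the ones that leave no such extra threads. This fits the observation that the failure only shows up when the block size does not divide the array length, although it does not by itself show where in the differentiated kernel the guard is lost.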
@wsmoses
@michel2323