Improve launch performance of kernels some more
Created by: mwarusz
Benchmarked using the script from https://github.com/JuliaGPU/KernelAbstractions.jl/pull/80#issuecomment-605210053
Before:
[ Info: Ka Launch
BenchmarkTools.Trial:
memory estimate: 816 bytes
allocs estimate: 9
--------------
minimum time: 4.301 μs (0.00% GC)
median time: 4.585 μs (0.00% GC)
mean time: 4.617 μs (0.46% GC)
maximum time: 220.415 μs (97.08% GC)
--------------
samples: 10000
evals/sample: 7
After:
BenchmarkTools.Trial:
memory estimate: 816 bytes
allocs estimate: 9
--------------
minimum time: 98.354 ns (0.00% GC)
median time: 121.497 ns (0.00% GC)
mean time: 148.127 ns (17.31% GC)
maximum time: 1.882 μs (87.36% GC)
--------------
samples: 9587
evals/sample: 951