## KernelAbstractions v0.3.1

v0.3.1 · 1f093b58

[Diff since v0.3.0](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.3.0...v0.3.1)

**Closed issues:**

- Matrix examples fails on Julia 1.4 (#43)
- AssertionError from ndrange=() (#107)
- Kernel compilation for OffsetArrays fails with `KernelError: recursion is currently not supported` (#110)
- Revise failure with KernelAbstractions (#111)

**Merged pull requests:**

- CompatHelper: add new compat entry for "LLVM" at version "1.5" (#105) (@github-actions[bot])
- add erf + erfc functions (#115) (@simonbyrne)
- Shared memory transpose (#116) (@vchuravy)
- add docs for localmem, private and uniform macros (#118) (@simonbyrne)
- Typo (#120) (@PallHaraldsson)

## KernelAbstractions v0.3.0

v0.3.0 · 4ab11f29

[Diff since v0.2.6](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.6...v0.3.0)

**Merged pull requests:**

- Update KernelAbstractions to use CUDA 1.0 (#104) (@jakebolewski)

## KernelAbstractions v0.2.6

v0.2.6 · cf8ec8eb

[Diff since v0.2.5](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.5...v0.2.6)

## KernelAbstractions v0.2.5

v0.2.5 · 88abd056

[Diff since v0.2.4](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.4...v0.2.5)

**Merged pull requests:**

- Add `SpecialFunctions.gamma` to kernel language (#99) (@lcw)
- CompatHelper: add new compat entry for "SpecialFunctions" at version "0.10" (#100) (@github-actions[bot])
- Bump version to 0.2.5 (#101) (@simonbyrne)

## KernelAbstractions v0.2.4

v0.2.4 · 14172c48

[Diff since v0.2.3](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.3...v0.2.4)

**Closed issues:**

- LLVM error: Cannot select: 0xca6fd20: f64 = fpow 0x93a9e90 (#89)
- Defining the same function twice? Maybe a typo? (#93)
- private variable not available after an `@synchronize` (#95)
- `@synchronize` in an if statement (#96)

**Merged pull requests:**

- fix function name check (#52) (@GiggleLiu)
- Remove requires (#92) (@vchuravy)
- remove redefinition of kernel (#94) (@vchuravy)
- Allow typeof on @private memory (#97) (@jkozdon)

## KernelAbstractions v0.2.3

v0.2.3 · 1f3e7654

[Diff since v0.2.2](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.2...v0.2.3)

**Merged pull requests:**

- allow docstrings on kernels (#87) (@simonbyrne)
- test on 1.4 (#88) (@vchuravy)

## KernelAbstractions v0.2.2

v0.2.2 · 26e06ae0

[Diff since v0.2.1](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.1...v0.2.2)

**Merged pull requests:**

- Don't busy wait on the CPU (#84) (@vchuravy)
- Allow ndrange to be zero (#86) (@lcw)

## KernelAbstractions v0.2.1

v0.2.1 · 52acb3e6

[Diff since v0.2.0](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.2.0...v0.2.1)

**Closed issues:**

- Kernel launch overhead when launching lots of small kernels (#75)

**Merged pull requests:**

- CompatHelper: bump compat for "CUDAnative" to "3.0" (#77) (@github-actions[bot])
- Improve launch performance of kernels some more (#82) (@mwarusz)

## KernelAbstractions v0.2.0

v0.2.0 · 30bae114

[Diff since v0.1.6](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.6...v0.2.0)

**Closed issues:**

- Trailing return yields wrong CPU code. (#50)
- Cannot resolve isbits when using function call to isbits (#79)

**Merged pull requests:**

- Error on return statements inside kernels (#74) (@mwarusz)
- Forbid waiting in CUDA on a CPUEvents (#78) (@lcw)
- Improve launch performance of kernels (#80) (@vchuravy)
- Create MultiEvent from tuple of empty MultiEvents (#81) (@lcw)

## KernelAbstractions v0.1.6

v0.1.6 · 7d995792

[Diff since v0.1.5](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.5...v0.1.6)

**Closed issues:**

- Integrate `async_copy!` into the event system (#40)

**Merged pull requests:**

- Recurse into nested scopes containing synchronize (#70) (@mwarusz)
- Implement better error propagation and semaphore (#72) (@vchuravy)

## KernelAbstractions v0.1.5

v0.1.5 · 74bf47e9

[Diff since v0.1.4](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.4...v0.1.5)

**Merged pull requests:**

- Run Event(f) on main thread (#71) (@vchuravy)

## KernelAbstractions v0.1.4

v0.1.4 · 8a95e72e

[Diff since v0.1.3](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.3...v0.1.4)

**Merged pull requests:**

- Add CUDA rewrites for sincos(x) and exp(y) for complex y (#67) (@oschub)
- Add `Event(f, args)` to integrate code using at_async better (#68) (@vchuravy)
- async_copy! fixes (#69) (@lcw)

## KernelAbstractions v0.1.3

v0.1.3 · 11f39700

[Diff since v0.1.2](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.2...v0.1.3)

**Merged pull requests:**

- Allow MultiEvent to be created from an Event (#65) (@lcw)

## KernelAbstractions v0.1.2

v0.1.2 · 8cbc86e4

[Diff since v0.1.1](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.1...v0.1.2)

**Merged pull requests:**

- Unified printing (#61) (@leios)
- add multievents (#62) (@vchuravy)
- don't recuse into functions like Base.sin (#63) (@vchuravy)
- make at_print work outside KA (#64) (@vchuravy)

## KernelAbstractions v0.1.1

v0.1.1 · c3b04289

[Diff since v0.1.0](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.1.0...v0.1.1)

**Merged pull requests:**

- Fix CUDA waiting on CUDA events (#59) (@lcw)

## KernelAbstractions v0.1.0

v0.1.0 · 91103f18

**Closed issues:**

- Variable live-time counter intuitive on the CPU (#13)
- Using Val as kernel argument triggers an assertion (#21)
- Performance of naive transpose (#22)
- Initialization error (#23)
- `unroll` not defined inside a kernel (#24)
- Document that private memory works differently than scratch in GPUifyLoops (#31)
- How best sync with the default stream in the CUDA backed? (#46)

**Merged pull requests:**

- Bring up GPU functionality fully (#1) (@vchuravy)
- Cleanup docs and remove ScalarCPU (#2) (@vchuravy)
- CompatHelper: add new compat entry for "CUDAdrv" at version "5.1" (#4) (@github-actions[bot])
- CompatHelper: add new compat entry for "Requires" at version "1.0" (#5) (@github-actions[bot])
- add stream GC and wait with progress function (#10) (@vchuravy)
- Adding a few more examples (#12) (@leios)
- Fix and test local memory (#14) (@vchuravy)
- implement Const memory for GPU and CPU (#16) (@vchuravy)
- Handle type parameters in kernel functions (#25) (@vchuravy)
- be less judicous with escape (#26) (@vchuravy)
- dont't use nested inits (#27) (@vchuravy)
- Blocked iteration (#28) (@vchuravy)
- cleanup examples (#29) (@vchuravy)
- add group index (#32) (@vchuravy)
- Make kernels dispatchable (#33) (@mwarusz)
- Use macrotools (#34) (@vchuravy)
- add a block syntax for uniform (#35) (@vchuravy)
- Fix private memory on the CPU (#36) (@mwarusz)
- handle at_synchronize in blocks (#37) (@vchuravy)
- add ntuple index type (#38) (@vchuravy)
- fix nested unroll macros (#39) (@vchuravy)
- Allow CPU and CUDA kernels to wait on each other (#41) (@lcw)
- Fix tuple destructuring and bors+travis (#42) (@vchuravy)
- Wait for GPU events using synchronize (#45) (@mwarusz)
- [WIP] Infrastructure to sync CuDefaultStream() (#47) (@vchuravy)
- Allow CPU kernels to depend on default events (#51) (@lcw)
- Implement async_copy! (#53) (@vchuravy)
- CompatHelper: bump compat for "CUDAapi" to "4.0" (#56) (@github-actions[bot])
- only create as many tasks as threads and more inference barriers (#57) (@vchuravy)
- Ensure that constify doesn't cause arguments to be captured (#58) (@vchuravy)