[Diff since v0.8.6](https://github.com/JuliaGPU/KernelAbstractions.jl/compare/v0.8.6...v0.9.0)

**Closed issues:**

- No speedup on CPU (#322)
- Add Metal support (#326)

**Merged pull requests:**

- Start removing event system (#317) (@vchuravy)
- Add Metal support (#337) (@tgymnich)
- Prefer blocks over threads (#341) (@vchuravy)
- ROCKernels: Add occupancy API (#342) (@pxl-th)
- [CUDAKernels] add `always_inline` as device parameter (#343) (@vchuravy)
- [CUDAKernels] Update compat (#345) (@vchuravy)
- Update CI (#346) (@vchuravy)
- ROCKernels: Adapt to AMDGPU changes (#348) (@jpsamaroo)
- [ROCKernels] Fix addrspacecast (#349) (@vchuravy)
- [ROCKernels] Import LLVM (#352) (@pxl-th)
- Update compat for oneAPIKernels.jl (#355) (@utkarsh530)
- Bump oneAPI to 1.0 (#356) (@michel2323)
- Rename device to backend (#359) (@vchuravy)
- Let `Event(MtlDevice)` actually be a barrier (#360) (@vchuravy)
- Fix Metal workgroup size (#361) (@tgymnich)
- Update docs (#362) (@vchuravy)
- Add optional priority feature (#363) (@vchuravy)
- Backends are adaptors (#364) (@vchuravy)
- Only skip histogram tests on CPU (#365) (@vchuravy)