llama.cpp releases · Infrastructure

b9828

opencl: flash attention improvement (#25069)

opencl: rework FA kernel for f16 and f32

opencl: flash-attention prefill prepass kernels

flash_attn_kv_pad_f16 pads the tail KV tile to a BLOCK_N multiple
flash_attn_mask_pad_f16 pads the matching mask tile
flash_attn_blk_f16 classifies each KV tile per query block as fully