llama.cpp releases · Infrastructure

b9820

sched : reintroduce less synchronizations during split compute (#20793)

CUDA: Improve performance via less synchronizations between token (#17795)

Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async()

Adds function to relax sync requirements between input copies on supported backends (CUDA for now)

E