r/LocalLLaMA June 27, 2026 · Communities

Another big tensor fix b9820

sched : reintroduce less synchronizations during split compute (#20793)

CUDA: Improve performance via less synchronizations between token (#17795)
Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async()
Adds function to relax sync requirements between input copies on supported backends (CUDA for now)
E…
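A rough intuition for why fewer synchronizations can be safe: on an in-order stream, a copy and the compute that consumes it are queued in sequence, so the host only has to wait once, before it reads the output. The toy Python model below is a sketch under that assumption. It is not ggml code: `InOrderStream` and `run_split` are hypothetical names, a worker thread stands in for the GPU stream, and it ignores real-world details such as keeping host buffers valid until an async copy finishes.

```python
import queue
import threading


class InOrderStream:
    """Toy stand-in for a device stream: queued work runs on one worker thread in FIFO order."""

    def __init__(self):
        self._work = queue.Queue()
        self.sync_count = 0  # how many times the host blocked waiting on the stream
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            task = self._work.get()
            try:
                task()
            finally:
                self._work.task_done()

    def enqueue(self, task):
        # Like an async copy or kernel launch: returns to the caller immediately.
        self._work.put(task)

    def synchronize(self):
        # Like a stream sync: block until everything queued so far has run.
        self.sync_count += 1
        self._work.join()


def run_split(host_inputs, sync_after_each_copy):
    """Copy every input to the 'device', then queue a compute step that reads them all."""
    stream = InOrderStream()
    device = {}
    result = {}

    for name, values in host_inputs.items():
        # Default arguments bind the current loop values so each task copies its own input.
        stream.enqueue(lambda n=name, v=values: device.__setitem__(n, list(v)))
        if sync_after_each_copy:
            stream.synchronize()  # conservative pattern: wait after every input copy

    # The compute is queued after the copies, so FIFO order alone guarantees
    # it sees every input; no host-side wait is needed in between.
    stream.enqueue(lambda: result.__setitem__("sum", sum(sum(v) for v in device.values())))

    stream.synchronize()  # a single wait before the host reads the output
    return result["sum"], stream.sync_count


inputs = {"tokens": [1, 2, 3], "positions": [0, 1, 2]}
print(run_split(inputs, sync_after_each_copy=True))   # (9, 3)
print(run_split(inputs, sync_after_each_copy=False))  # (9, 1)
```

Both modes produce the same result; the relaxed mode simply skips the per-copy waits, which is the kind of overhead the commits above aim to remove on backends whose queues preserve ordering.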
