r/LocalLLaMA
Qwen3.6 27B seems dumber in vLLM than in llama.cpp
Hello, I recently bought a second RTX 5060Ti to pair with the one I already own, so I now have 32GB of VRAM. Until now I've used llama.cpp for convenience, and to be fair it works excellently when only 1 user is using it. But now there are 2 of us using it and llama.cpp can't keep up; often user 1's cache gets