Skip to content
r/LocalLLaMA · Communities

Combined RTX5080 & 4060 for inference ?

Hey, I currently use my RTX 4060 8G for inference with Qwen 3.6-35B-A3B Q8 (q8 for everything weight,value,key) (for quality over speed, with CPU &DDR4 offloading) but : I only get ~100pp & 20tg at max when context is still low on Qwen 3.6-35B-A3B Q8, so I'd like to increase this speed. (weights Q4 only gave me ~30 tg