r/LocalLLaMA
Tensor split performance on low-bandwidth (TB3) eGPUs, and a question
Hey everyone! I've got a pair of Morefine G1 4090M 16GB eGPUs connected at 40 Gbps via TB3 (daisy-chained). I normally run them in layer split mode, as it doesn't seem to need much bandwidth; I'm seeing around 1300 t/s PP and 26 t/s TG (35-40 with MTP) on qwen3.6-27B @ Q4, which is great. Started playing around with tensor m