Don't skimp on the quant when using MoE

troed@fedia.io · 6 days ago

Don't skimp on the quant when using MoE

brockhold@lemmy.world · 4 days ago

This is exactly why I am so sad that I didn’t buy more DDR4 back when it was reasonably priced. I run Unsloth’s Qwen 3.5 122B A10B UD-Q4_K_XL and while it works great, I really wish I had enough RAM for Q6 or even Q8. The speed wouldn’t be wildly worse, but the difference in output quality is noticeable. I’m just glad that it works as well as it does in Q4. It’s mostly limited by my main RAM bandwidth; the GPU helps, but I only barely hit 15 t/s decode with MTP hitting >80%.
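For anyone wanting to put rough numbers on the RAM and bandwidth trade-off described above, here is a back-of-the-envelope sketch. It is not a benchmark. The function names are made up for illustration, the bits-per-weight figures are the nominal values for plain Q4_K, Q6_K and Q8_0 blocks (a mixed quant like UD-Q4_K_XL will differ), and the 51.2 GB/s bandwidth is an assumed dual-channel DDR4-3200 peak, not the poster's actual system. The estimate ignores GPU offload, MTP, the KV cache and any non-weight traffic, so treat the decode figure as a CPU-only ceiling.

```python
# Rough sizing for a MoE model: total weights decide how much RAM you need,
# active weights per token decide the bandwidth-bound decode ceiling.

# Nominal bits per weight for plain GGUF block types (assumption: real mixed
# quants such as UD-Q4_K_XL use different types per tensor and will differ).
BITS_PER_WEIGHT = {"Q4_K": 4.5, "Q6_K": 6.5625, "Q8_0": 8.5}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tps(active_params_billion: float,
                       bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_read_per_token = weights_gb(active_params_billion, bits_per_weight)
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    total_b, active_b = 122, 10   # "122B A10B" from the model name
    bandwidth = 51.2              # GB/s, assumed dual-channel DDR4-3200 peak
    for name, bpw in BITS_PER_WEIGHT.items():
        size = weights_gb(total_b, bpw)
        tps = decode_ceiling_tps(active_b, bpw, bandwidth)
        print(f"{name}: ~{size:.0f} GB of weights, ~{tps:.1f} t/s ceiling")
```

The point of splitting total and active parameters is that only the total count sets the RAM requirement, while per-token reads scale with the active experts. Offloading the hot layers to a GPU and using MTP can push real throughput above this CPU-only ceiling.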

troed@fedia.io · 4 days ago

15 t/s is workable IMHO. What are your system specs? I have 96GB DDR5 but never thought about going to an even higher MoE.

Qwen3.6 - How to Run Locally | Unsloth Documentation