Llama.cpp MTP Support merged - up to 2.5x speed increase

TheCornCollector@piefed.zip · edit-2 15 days ago

Found it by looking up dark mode Firewatch wallpapers.

Edit: Didn’t find higher resolutions of this specific one. But here are slightly different but higher resolutions variants: https://imgur.com/a/jvkoP

And the second one in this list

TheCornCollector@piefed.zip · 1 month ago

https://unsloth.ai/docs/models/qwen3.6#mtp-guide
Unsloth made a guide and has graphs with comparisons

TheCornCollector@piefed.zip · edit-2 1 month ago

Llama.cpp MTP Support merged - up to 2.5x speed increase

TheCornCollector@piefed.zip · 1 month ago

You can also contribute to OpenStreetMap in your area using simple apps like StreetComplete or EveryDoor. This has a way lower barrier to entry than contributing code in my opinion. And it has the immediate benefit of a better local map for a LOT of services that are built on top of OSM.

TheCornCollector@piefed.zip · 2 months ago

I’m really not fond of the profiling by automated means, but it seems like an inevitable consequence of the design of the threadiverse. Everything is public and easily accessible by anyone that would like to profile you.

I certainly disapprove of moderation based on ideology. Moderation should be based on quality of the content and if it fits in the publicly readable rules. Definitely not some hidden analytics or if the user completely fits in the in-group of the moderator.

I will admit that this might be a good way to find and filter out LLM based bots that are only there to promote or manipulate the conversation. But it should still be done according to public rules.

TheCornCollector@piefed.zip · 2 months ago

Is this post written by an LLM?

TheCornCollector@piefed.zip · 2 months ago

I’m no expert, but basically the way to unlock higher/full bandwidth for HDMI 2.1. This will allow the use of higher refresh rate, resolution, and bit depth + HDR. Right now you need to make sacrifices in at least one category with HDMI

TheCornCollector@piefed.zip · edit-2 2 months ago

What is the difference between this implementation and the reverse engineered patches that were published a few months ago by Michał Kopeć and Tomasz Pakuła?

Edit: apparently it’s not the same patch, but Tomasz was CC’ed in the patch set so the timing might not be accidental.

TheCornCollector@piefed.zip · 2 months ago

I’m European and had to do the same, so it’s based on something else.

TheCornCollector@piefed.zip · 2 months ago

Don’t know about Ubuntu specifically but for all software I actually want to work, I wait for the first point release upon a major release.

TheCornCollector@piefed.zip · 2 months ago

DeepSeek-V4 Pro (1.6T-A49) and Flash (284B-A13)

TheCornCollector@piefed.zip · 2 months ago

Artificial Analysis just posted their results and there seems to be a similar increase in output token usage as the 35B model.
Graph of Artificial Analysis benchmark token output shows 40% increase

TheCornCollector@piefed.zip · 2 months ago

Qwen3.6 27B released

TheCornCollector@piefed.zip · 2 months ago

Ah, I don’t know anything about Windows. I’m using Linux and both the latest ROCM (7.2.2) and latest vulkan (26.0.5) packages work without issues for combined gaming and AI. My reported numbers were with Vulkan at zero context for reference.

TheCornCollector@piefed.zip · 2 months ago

I’ve been using it for the past few days and the output quality seems to be on par or slightly better than 3.5 27b. The biggest issue is the token usage that has exploded with this revision. It can easily reason for 20k-25k tokens on a question where the qwen3.5 models used 10k. Since it runs more than 3 times faster, it still finished earlier than the 27b, but I won’t have any context/vram left to ask multiple questions.

Artificial Analysis has similar findings.

TheCornCollector@piefed.zip · 2 months ago

I agree with the suggestion of the other commenters, just wanted to add that I personally run llama.cpp directly with the build in llama-server. For a single-user server this seems to work great and is almost always at the forefront of model support.

TheCornCollector@piefed.zip · 2 months ago

I’m running it with the UD_Q4_K_XL quant on 24GB VRAM 7900XTX at ~85 token/s. Since it’s an MOE model, CPU inference with 32 GB ram should be doable, but I won’t make any promises on speed.

TheCornCollector@piefed.zip · 2 months ago

Qwen3.6-35B-A3B released

TheCornCollector@piefed.zip · 2 months ago

AllenAI has released open source models with open training data, code and science. If you value the ‘source’ to actually be open. They’ve also published the multimodal Molmo models.

TheCornCollector@piefed.zip · 3 months ago

Such a huge increase compared to previous months, with most of it coming from ‘64 bit’ and ‘0 64 bit’ seems suspicious. Don’t give me false hope…

TheCornCollector@piefed.zip · 4 months ago

Thanks, I added the checkout link to the OP

TheCornCollector@piefed.zip · edit-2 4 months ago

[Epic] Botany Manor, first-person puzzle game

TheCornCollector@piefed.zip · 5 months ago

I got some weird specialised hardware over USB working via WinBoat. Might be an option for some.

TheCornCollector@piefed.zip · 5 months ago

Joey Carbstrong 3 hour interview with Gary Yourofsky

TheCornCollector@piefed.zip · 5 months ago

[Steam] Kiki

TheCornCollector@piefed.zip · 5 months ago

Unfortunately, the AI community prefers rushed buggy development over proper, tested releases, so the quants and maybe the PR weren’t fully working.

As of 3 hours ago, unsloth was still updating their quants and guide. I don’t have time to test now but I wouldn’t judge the base model performance in the first few days when the bugs are still being worked out.

They also recommend some unconventional parameters in the Unsloth guide.

It could also be that the model is truly shit of course.

Edit I just took a look at the llama.cpp repo and there are still issues with the implementation as well.

TheCornCollector@piefed.zip · 5 months ago

Seems to be a new architecture so custom support is needed.

Tracking issue

PR

TheCornCollector@piefed.zip · 5 months ago

30B-A3B GLM-4.7-Flash Released

TheCornCollector