• kromem@lemmy.world · 22 days ago

    So I’m guessing you haven’t seen Anthropic’s newest interpretability research, where they went in assuming that was how it worked.

    But it turned out the models can actually plan beyond the immediate next token. In things like rhyming verse, the network has already selected the final word of the following line, and the intermediate tokens are generated with that planned target in mind.

    So no, they predict beyond the next token, and we only just developed measurement sensitive enough to detect it happening an order of magnitude of tokens beyond just ‘next’. We’ll see if further research in that direction picks up planning even further out.

    https://transformer-circuits.pub/2025/attribution-graphs/biology.html

• NιƙƙιDιɱҽʂ@lemmy.world · 21 days ago

      Right, other words see higher attention as it builds a sentence, leading it towards where it “wants” to go, but LLMs literally take a series of words, then spit out the next one. There’s a lot more going on under the hood, as you said, but fundamentally that is the algorithm. Repeat it over and over, and you get a sentence.
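
      Stripped to its core, that loop looks something like this. It’s just a toy sketch: gpt2 and greedy argmax are stand-ins to show the shape of the loop, not what any real chatbot actually ships.

      ```python
      # Toy autoregressive loop: score every possible next token given the
      # text so far, pick one, append it, repeat. gpt2 and greedy argmax
      # are illustrative choices only.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      input_ids = tokenizer("As the wind blows,", return_tensors="pt").input_ids

      with torch.no_grad():
          for _ in range(12):                            # one new token per pass
              logits = model(input_ids).logits           # scores over the vocab at each position
              next_id = logits[0, -1].argmax()           # highest-scoring next token (greedy)
              input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

      print(tokenizer.decode(input_ids[0]))
      ```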

      If it’s writing a poem about flowers and ends the first part on “As the wind blows,” sure as shit “rose” is going to carry significant attention within the model, even if it isn’t the immediate next word, and so will the words strongly associated with it that build the bridge.
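
      A quick way to poke at that claim, for what it’s worth: check how much probability mass the model already puts on “ rose” after a rhyme-setting line. Just a toy probe with gpt2 and a made-up prompt; Anthropic’s result was measured on their own models with much more sensitive tooling, so a small open model may not show much.

      ```python
      # Toy probe: after a rhyme-setting line, how much probability does
      # the model already assign to " rose" as the continuation? The
      # prompt and model ("gpt2") are assumptions for illustration only.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      prompt = "A poem about flowers:\nAs the wind blows,\nI reach down and pick a"
      input_ids = tokenizer(prompt, return_tensors="pt").input_ids

      with torch.no_grad():
          probs = model(input_ids).logits[0, -1].softmax(dim=-1)

      rose_id = tokenizer(" rose").input_ids[0]            # token id for " rose"
      rank = int((probs > probs[rose_id]).sum()) + 1       # how many tokens outscore it
      print(f"P(' rose') = {probs[rose_id].item():.4f} (rank {rank} of {probs.numel()})")
      ```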