Basically, it’s a calculator that can take letters, numbers, words, sentences, and so on as input, and produce a mathematically “correct”-sounding output, defined by the language patterns in its training data.
This core concept is shared by most, if not all, “AI” models, not just LLMs, I think.
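If it helps to see the “pattern-matching calculator” idea concretely, here’s a toy sketch of my own (a character-level bigram counter, nothing like a real LLM’s architecture): it just tallies which character followed which in the “training data” and outputs the most common successor.

```python
from collections import Counter, defaultdict

# Toy "training data": tally which character follows which.
corpus = "the theory of the thing"
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def predict_next(char):
    """Output the successor seen most often in training --
    pure pattern lookup, no understanding involved."""
    return follows[char].most_common(1)[0][0]

print(predict_next("t"))  # -> 'h': every "t" in this tiny corpus is followed by "h"
```

Real models do this over learned vector representations instead of raw counts, but the core move is the same: given the input so far, output whatever the training patterns say comes next.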
There’s a part of our brains called the salience network, which continually models and predicts our environment and directs our conscious attention to things it fails to predict. And there’s a certain optimum level of unpredictability that attracts our attention without overwhelming it. When we talk to each other, most of the actual information content is predictable, and the salience network filters it out—the unpredictable part is the actual conscious message.
LLMs basically recreate the salience network. They continually model and predict the content of the text stream—except instead of modeling someone else’s words so they can extract the unpredictable conscious message, they model their own words so they can keep predicting the next ones.
This creates an obvious issue: when our salience networks process the stream of words coming out of an LLM, it’s all predictable, so our salience networks tell us there’s no actual message. When AI designers discovered this, they added a feature called “temperature” that basically injects randomness into the generated text so our salience networks will get fooled into thinking there’s a conscious message.
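For anyone curious what “temperature” does mechanically: the model’s raw next-token scores (logits) get divided by the temperature before being turned into probabilities. Here’s a minimal sketch (my own toy illustration, with made-up logit values): low temperature sharpens the distribution toward the single most likely token, high temperature flattens it so less-likely tokens get picked more often.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Scale logits by 1/temperature, softmax into probabilities, sample.

    As temperature -> 0 this approaches always picking the top token
    (deterministic, "boring"); higher temperatures flatten the
    distribution and inject randomness into the output.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs

# Toy next-token scores: the model strongly prefers token 0.
logits = [4.0, 1.0, 0.5]
_, cold = sample_with_temperature(logits, 0.1)  # near-deterministic
_, hot = sample_with_temperature(logits, 2.0)   # much flatter
print(cold[0], hot[0])  # top token's probability: ~1.0 cold, ~0.72 hot
```

Whether the motivation was really to “fool” our salience networks I can’t say, but the low-temperature-is-boring effect falls straight out of this math.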
This was a great read, thanks!
I have a new rabbit hole to explore 😝
This was a great read! I did have a feeling LLMs would be a bit boring when set to low temperatures, and now I understand why.
I have found that LLMs are great for predetermined processes, like generating JSON in a given format or writing code, but they suck at creative tasks, and your amazing explanation finally told me why.
Thanks again!