Ethan Mollick writing in his newsletter, One Useful Thing:
But the LLM does not just produce one token. Instead, after each token, it looks at the entire original sentence plus the new token (“The best type of pet is a dog”) and predicts the next token after that, then uses that whole sentence plus the next to make a prediction, and so on. It chains one token to another like cars on a train. Current LLMs can’t go back and change a token that came before; they have to soldier on, adding word after word. This results in a butterfly effect: if the first predicted token was the word “dog,” then the rest of the sentence will follow on from that; if it is “subjective,” you will get an entirely different sentence. Any difference between the tokens in two different answers will result in radically diverging responses.
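To make the loop concrete, here is a minimal sketch in Python. The lookup table standing in for the model is invented for illustration; a real LLM scores every token in its vocabulary at each step rather than consulting a table. But the chaining mechanics are the same: predict from the whole sequence so far, append, repeat, never revise.

```python
# A toy stand-in for an LLM: a hypothetical lookup table keyed on the
# full sequence so far, mapping it to candidate next tokens.
TOY_MODEL = {
    "The best type of pet is": ["a", "subjective"],
    "The best type of pet is a": ["dog"],
    "The best type of pet is a dog": ["<end>"],
    "The best type of pet is subjective": ["and"],
    "The best type of pet is subjective and": ["depends"],
    "The best type of pet is subjective and depends": ["<end>"],
}

def generate(prompt, pick):
    text = prompt
    while True:
        # Predict the next token from the ENTIRE sequence so far.
        next_token = pick(TOY_MODEL[text])
        if next_token == "<end>":
            return text
        # Commit the token. There is no going back: every later
        # prediction is conditioned on this choice.
        text = text + " " + next_token

prompt = "The best type of pet is"
print(generate(prompt, pick=lambda candidates: candidates[0]))
# -> The best type of pet is a dog
print(generate(prompt, pick=lambda candidates: candidates[-1]))
# -> The best type of pet is subjective and depends
```

Note how the single divergence at the first step ("a" versus "subjective") commits the loop to an entirely different sentence, which is the butterfly effect Mollick describes.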
The entire post is an excellent read.
