1. How Language Models Actually Work (Just Enough)
A useful mental model
A large language model (LLM) is a system trained on enormous amounts of text to predict what comes next. You give it some text, and it produces a continuation one piece at a time, each piece chosen based on everything it has seen so far. You do not need to understand the math to use one well. You do need a working intuition, because that intuition explains nearly every quirk you will encounter.
Tokens: the units the model reads and writes
Models do not see words or letters directly. They break text into tokens, which are common chunks of characters. A token might be a whole short word, a fragment of a longer word, a space plus a word, or a piece of punctuation. As a rough rule of thumb in English, one token is about four characters, and 100 tokens is roughly 75 words. This matters for two practical reasons: the model has a limit on how many tokens it can handle at once, and many services price usage by tokens. Knowing this helps you estimate whether a long document will fit and why very long requests cost more.
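The rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below is only an approximation under that four-characters-per-token assumption. Real tokenizers vary by model and language, and the function names, the context limit, and the price passed in are all hypothetical values chosen for illustration.

```python
import math

# Rough English rule of thumb from the text: about four characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate; not a real tokenizer count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_limit: int, reserved_for_reply: int = 0) -> bool:
    """Check whether the estimated tokens, plus room kept for the reply, fit the limit.

    context_limit and reserved_for_reply are hypothetical numbers you supply;
    look up the actual limit for the model you use.
    """
    return estimate_tokens(text) + reserved_for_reply <= context_limit


def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Estimate input cost given a (hypothetical) price per 1,000 tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens
```

Treat the result as an estimate for planning only. For an exact count, use the tokenizer that belongs to the specific model.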
Context: the model's short-term memory
Everything the model can consider at one time, including your instructions, any reference material you paste in, and its own reply, lives in the context window. Think of it as a desk of fixed size. If a conversation or document grows beyond that size, something has to give: many chat tools drop the oldest material first, so it falls off the edge and the model effectively forgets it, while others reject the oversized request outright. The model has no memory between separate sessions unless a tool deliberately stores and re-supplies information. When a model seems to lose track of something you said much earlier, the cause is usually a context limit rather than carelessness.
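One common way a chat tool keeps a conversation on the fixed-size desk is to keep the newest messages and drop the oldest. The sketch below illustrates that idea only. The function name, the message format (plain strings), and the token counter passed in are assumptions for illustration, and real tools may summarize, pin instructions, or trim differently.

```python
from typing import Callable, List


def trim_to_window(
    messages: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent messages whose combined token count fits max_tokens.

    Walks from newest to oldest and stops at the first message that would
    overflow the budget, so everything older than that point is dropped.
    """
    kept: List[str] = []
    total = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break  # this message and everything older "falls off the desk"
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Passing the counter in as a parameter keeps the sketch independent of any particular tokenizer. A crude word count or a characters-divided-by-four estimate is enough to see the behavior.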
Why outputs vary
At each step the model has a probability distribution over possible next tokens, and it samples from that distribution rather than always picking the single most likely token. This is why asking the same question twice can give two different answers. Many tools expose a setting often called temperature: lower values make the output more focused and repeatable, higher values make it more varied and creative. Variation is a feature for brainstorming and a hazard for tasks that need exact, consistent results, so match the setting to the job.
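To make the effect of temperature concrete, here is a minimal sketch of one widely used formulation: divide the model's raw scores by the temperature, convert them to probabilities with a softmax, then sample. The candidate tokens and scores you pass in are made up for illustration, and the function names are hypothetical. Actual tools may combine temperature with other settings.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(scores: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(
    tokens: Sequence[str],
    scores: Sequence[float],
    temperature: float,
    rng: random.Random,
) -> str:
    """Pick one token at random, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(list(tokens), weights=probs, k=1)[0]
```

With a low temperature, nearly all of the probability lands on the highest-scoring token, so repeated calls tend to agree. With a high temperature, the probabilities even out and the picks become more varied.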
What this implies for you
- The model is predicting plausible text, not looking up verified facts. Plausible and true are not the same thing.
- Clear, specific input narrows the range of plausible continuations, which is why good prompting works.
- The model has no awareness of events after its training cutoff unless given current information, and it cannot reliably tell when it is missing something, so it may answer confidently anyway.
Hold on to one sentence: the model is a very capable pattern continuer, not an oracle. Every later lesson is an application of that idea.