#How Large Language Models Actually Work¶

You cannot reliably steer something you do not understand. This lesson demystifies LLMs just enough to make your prompting deliberate instead of superstitious.

Core Idea: Next-Token Prediction¶

An LLM does exactly one thing: given a sequence of tokens, it predicts a probability distribution over the next token.

"The capital of France is" →  Paris (0.92)  the (0.01)  a (0.004) ...

It samples a token, appends it, and repeats. Everything else — reasoning, summarisation, coding — is an emergent behaviour of doing this extremely well over trillions of tokens of training text.

Implications for You¶

1The model continues patterns. If your prompt looks like a Q&A transcript, it continues the transcript. This is why few-shot examples work.
2Recency matters. Tokens near the end of the prompt strongly influence the next token. Put critical instructions last.
3It has no memory. Each API call is stateless. "Conversation" is an illusion created by re-sending history every turn.

Training Stages (Why Models Behave the Way They Do)¶

Stage	What happens	Effect on prompting
Pre-training	Predict next token on the internet	Broad knowledge, mimics patterns
Instruction tuning	Trained on (instruction → response) pairs	Follows commands, not just continues text
RLHF / alignment	Optimised toward human preferences	Helpful, cautious, "assistant" persona

This is why a modern chat model responds to "Summarise this" instead of just continuing your sentence — instruction tuning taught it that instructions expect compliance, not completion.

What LLMs Are Bad At (By Design)¶

Exact arithmetic — it predicts plausible digits, not computed ones
Up-to-date facts — frozen at training cutoff
Knowing what it doesn't know — it will confidently hallucinate

Good prompting works with these properties (e.g. "show your steps" for math, "say 'I don't know' if unsure" for facts) rather than fighting them.

Mental model: An LLM is an extremely well-read improviser, not a database and not a calculator.

Core Idea: Next-Token Prediction¶

An LLM does exactly one thing: given a sequence of tokens, it predicts a probability distribution over the next token.

"The capital of France is" → Paris (0.92) the (0.01) a (0.004) ...

Implications for You¶

1The model continues patterns. If your prompt looks like a Q&A transcript, it continues the transcript. This is why few-shot examples work.

2Recency matters. Tokens near the end of the prompt strongly influence the next token. Put critical instructions last.

3It has no memory. Each API call is stateless. "Conversation" is an illusion created by re-sending history every turn.

Training Stages (Why Models Behave the Way They Do)¶

Stage	What happens	Effect on prompting
Pre-training	Predict next token on the internet	Broad knowledge, mimics patterns
Instruction tuning	Trained on (instruction → response) pairs	Follows commands, not just continues text
RLHF / alignment	Optimised toward human preferences	Helpful, cautious, "assistant" persona

This is why a modern chat model responds to "Summarise this" instead of just continuing your sentence — instruction tuning taught it that instructions expect compliance, not completion.

What LLMs Are Bad At (By Design)¶

Exact arithmetic — it predicts plausible digits, not computed ones

Up-to-date facts — frozen at training cutoff

Knowing what it doesn't know — it will confidently hallucinate

Good prompting works with these properties (e.g. "show your steps" for math, "say 'I don't know' if unsure" for facts) rather than fighting them.

Mental model: An LLM is an extremely well-read improviser, not a database and not a calculator.