# Tokens, Context Windows, Temperature & Sampling

The four levers that change model behaviour without changing your prompt text.

## Tokens

Models don't see characters or words — they see tokens (sub-word chunks). Roughly:

- 1 token ≈ 4 characters of English ≈ ¾ of a word
- "prompt engineering" ≈ 2–3 tokens

You are billed per token (input + output), and limits are in tokens. Estimating token count is a daily skill.
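The rule of thumb above can be turned into a rough estimator. This is a minimal sketch, not a real tokenizer: `estimate_tokens`, `estimate_cost`, and the prices passed in are hypothetical names and placeholder numbers chosen for illustration, and actual counts depend on the model's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Cost of one call, given per-1K-token prices (placeholder values)."""
    return ((input_tokens / 1000) * usd_per_1k_input
            + (output_tokens / 1000) * usd_per_1k_output)


prompt = "Summarise the following support ticket in two sentences."
n_in = estimate_tokens(prompt)
# Prices below are made up; substitute your provider's published rates.
print(n_in, estimate_cost(n_in, 100, usd_per_1k_input=0.5, usd_per_1k_output=1.5))
```

For billing or hard limits, count with the tokenizer that matches your model; the heuristic is only good enough for quick sizing.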

## Context Window

The context window is the maximum number of tokens (prompt + response) the model can attend to at once — e.g. 8K, 128K, 1M depending on the model.

Consequences:

- Long documents may not fit → you need chunking or RAG (Module 4)
- The "lost in the middle" effect: models recall the start and end of long contexts better than the middle. Put critical info at the edges.
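Both consequences can be sketched in a few lines. The helpers below are hypothetical (`fits_in_context`, `chunk_words`, `build_prompt` are names invented here), and they reuse the 4-characters-per-token heuristic as an assumption rather than a real tokenizer.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Prompt and response share the same window, so budget for both."""
    return prompt_tokens + max_output_tokens <= context_window


def chunk_words(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Greedily split on whitespace so each chunk stays within an
    estimated token budget. A single word longer than the budget
    still becomes its own (oversized) chunk."""
    max_chars = max_tokens * chars_per_token
    chunks, current, current_len = [], [], 0
    for word in text.split():
        added = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and current_len + added > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_prompt(instructions: str, document: str, question: str) -> str:
    """Keep critical text at the edges: instructions first, question last."""
    return f"{instructions}\n\n<document>\n{document}\n</document>\n\n{question}"


print(fits_in_context(prompt_tokens=7500, max_output_tokens=1000, context_window=8000))
print(chunk_words("aaaa bbbb cccc dddd", max_tokens=3))
```

Splitting on whitespace is the simplest possible strategy; real pipelines usually chunk on paragraph or section boundaries and often overlap neighbouring chunks.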

## Temperature

Controls randomness of sampling:

| Temperature | Behaviour | Use for |
| --- | --- | --- |
| `0` | Greedy: always picks the most likely token, so output is (near-)deterministic | Extraction, classification, code |
| `0.7` | Balanced creativity | Chat, drafting |
| `1.0+` | Highly varied, riskier | Brainstorming, ideation |

For anything where you'd write a unit test, use temperature 0.
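To see why lower temperatures are more predictable, here is a minimal sketch of temperature sampling over a toy set of logits. It assumes the common formulation (divide logits by the temperature, then apply softmax) and treats temperature 0 as plain argmax; the function names are illustrative and not taken from any particular API.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float,
                 rng: random.Random) -> int:
    """Return the index of the sampled token."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]
for t in (0.5, 1.0, 2.0):
    # The top token's probability shrinks as temperature rises.
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_token(logits, 0, random.Random(0)))
```

Lower temperatures concentrate probability on the top token, while higher ones flatten the distribution so that less likely tokens are sampled more often.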

## Top-p (Nucleus Sampling)

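Instead of rescaling the whole distribution, `top_p` restricts sampling to the smallest set of tokens whose cumulative probability is at least p. For example, `top_p = 0.1` keeps only the most likely tokens that together cover 10% of the probability mass, which is very focused. Usually tune temperature or `top_p`, not both.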
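The nucleus definition above can be sketched as a filter over a toy probability distribution. `top_p_filter` is a hypothetical helper written for illustration; production samplers apply the same idea to the model's full vocabulary.

```python
def top_p_filter(probs: list[float], p: float) -> dict[int, float]:
    """Keep the smallest set of tokens whose cumulative probability is
    at least p, then renormalise the survivors so they sum to 1."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.1))  # only the single most likely token survives
print(top_p_filter(probs, 0.9))  # the three most likely tokens survive
```

Sampling then proceeds over the renormalised survivors only, which is why a small `top_p` behaves much like a low temperature.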

## Max Tokens & Stop Sequences

- `max_tokens` caps the response length (and cost). Too low → truncated JSON.
- Stop sequences end generation when a given string appears (e.g. `"\n\n"`), useful for structured output.
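How the two limits interact can be shown with a simulated generation loop. This is a sketch over a pre-made list of tokens, not a call to a real model: `generate` and its finish-reason strings (`"stop"`, `"length"`, `"end_of_stream"`) are illustrative assumptions, although many APIs report something similar.

```python
def generate(token_stream, max_tokens: int, stop: list[str]) -> tuple[str, str]:
    """Consume tokens until a stop sequence appears or max_tokens is hit.

    Returns (text, finish_reason). The stop sequence itself is not
    included in the returned text.
    """
    text = ""
    if max_tokens <= 0:
        return text, "length"
    for count, token in enumerate(token_stream, start=1):
        text += token
        for s in stop:
            idx = text.find(s)
            if idx != -1:
                return text[:idx], "stop"
        if count >= max_tokens:
            return text, "length"
    return text, "end_of_stream"


tokens = ['{"a"', ": 1", "}", "\n\n", "extra"]
print(generate(tokens, max_tokens=10, stop=["\n\n"]))  # complete JSON, stopped cleanly
print(generate(tokens, max_tokens=2, stop=["\n\n"]))   # cut off mid-object
```

Checking the finish reason before parsing is a cheap way to catch truncated JSON: a `"length"` result means the output hit the cap and is probably incomplete.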

## Quick Reference

- Deterministic task? → temperature 0
- Need variety? → temperature 0.7–1.0
- Output cut off? → raise `max_tokens`
- Big document? → check context window → chunk/RAG
