Prompt Engineering Mastery: From Fundamentals to Production

Chapter 1 of 5 · Module 1 · Foundations of Prompt Engineering
Lesson 3 of 25 · Reading · 11 min

Tokens, Context Windows, Temperature & Sampling¶

Token budgets, the context window, temperature, and sampling settings are the levers that change model behaviour without changing your prompt text.

Tokens¶

Models don't see characters or words — they see tokens (sub-word chunks). Roughly:

  • 1 token ≈ 4 characters of English ≈ ¾ of a word
  • "prompt engineering" ≈ 2–3 tokens (the exact count depends on the model's tokenizer)

Hosted models typically bill per token (input + output), and both context limits and output limits are expressed in tokens. Estimating token count is a daily skill.
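The rules of thumb above can be turned into a rough estimator. This is a minimal sketch, not a real tokenizer: the function names are made up for illustration, and the per-1K-token prices in the usage note are hypothetical placeholders. The two heuristics can disagree on short strings, which is a reminder that they are estimates only.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: 1 token is about 4 characters of English
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 3/4 of a word


def estimate_tokens_by_chars(text: str) -> int:
    """Rough token estimate from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_by_words(text: str) -> int:
    """Rough token estimate from the whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost when input and output tokens are billed at (assumed) per-1K rates."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


if __name__ == "__main__":
    text = "prompt engineering"
    print(estimate_tokens_by_chars(text), estimate_tokens_by_words(text))
    # Hypothetical prices, for illustration only.
    print(estimate_cost(1000, 500, input_price_per_1k=0.5, output_price_per_1k=1.5))
```

For exact counts, use the tokenizer that ships with the model you are calling; the heuristics are only good enough for budgeting.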

Context Window¶

The context window is the maximum number of tokens (prompt + response) the model can attend to at once — e.g. 8K, 128K, 1M depending on the model.

Consequences:

  • Long documents may not fit → you need chunking or RAG (Module 4)
  • The "lost in the middle" effect: models recall the start and end of long contexts better than the middle. Put critical info at the edges.
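The two consequences above can be sketched in code. This is an illustrative sketch under stated assumptions: it reuses the 4-characters-per-token heuristic instead of a real tokenizer, splits on whitespace only, and all function names are hypothetical.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """The prompt and the reserved response budget share one window."""
    return prompt_tokens + max_output_tokens <= context_window


def chunk_by_token_budget(text: str, max_tokens_per_chunk: int,
                          chars_per_token: int = 4) -> list[str]:
    """Greedily pack whole words into chunks under an approximate token budget.

    A single word longer than the budget still becomes its own
    (oversized) chunk; words are never split.
    """
    max_chars = max_tokens_per_chunk * chars_per_token
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and current_len + extra > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def assemble_prompt(instructions: str, context_chunks: list[str],
                    question: str) -> str:
    """Keep critical text at the edges: instructions first, question last."""
    return "\n\n".join([instructions, *context_chunks, question])


if __name__ == "__main__":
    print(fits_in_context(7000, 1000, 8000))
    print(chunk_by_token_budget("aaaa bbbb cccc dddd", max_tokens_per_chunk=3))
```

Production chunkers usually split on sentence or section boundaries and add overlap between chunks; the word-level packing here only shows the budgeting idea.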

Temperature¶

Controls randomness of sampling:

| Temperature | Behaviour | Use for |
| --- | --- | --- |
| 0 | Greedy: always picks the most likely token (near-deterministic in practice) | Extraction, classification, code |
| 0.7 | Balanced creativity | Chat, drafting |
| 1.0+ | Highly varied, riskier | Brainstorming, ideation |

For anything where you'd write a unit test, use temperature 0.

Top-p (Nucleus Sampling)¶

Instead of rescaling the whole distribution as temperature does, top_p restricts sampling to the smallest set of tokens whose cumulative probability is at least p. A value of top_p = 0.1 is very focused; top_p = 1.0 leaves every token eligible. Usually tune temperature or top_p, not both.
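The nucleus filter itself is short. The following is a minimal sketch of the idea, assuming the usual formulation in which the kept tokens are renormalised before sampling; the probabilities are invented and the function name is hypothetical.

```python
def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalise over that set.

    Returns a mapping from token index to renormalised probability.
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: list[int] = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
    print(top_p_filter(probs, 0.1))  # only the single most likely token survives
    print(top_p_filter(probs, 0.7))  # the top two tokens survive
```

Because the cut-off adapts to the shape of the distribution, top_p keeps few tokens when the model is confident and many when it is not.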

Max Tokens & Stop Sequences¶

  • max_tokens caps the response length (and cost). Too low → truncated JSON.
  • Stop sequences end generation when a string appears (e.g. "\n\n"), useful for structured output.
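Both effects can be imitated on plain strings. The sketch below does not call any model API: it simulates what a stop sequence does to generated text (assuming the stop string itself is excluded from the result, as is typical) and shows how to detect a response that was cut off before the JSON was complete. The helper names are hypothetical.

```python
import json


def apply_stop_sequences(text: str, stop_sequences: list[str]) -> str:
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty stop strings
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def is_complete_json(text: str) -> bool:
    """True if the text parses as JSON, False if it is truncated or malformed."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    raw = '{"status": "ok"}\n\nHere is some extra commentary.'
    print(apply_stop_sequences(raw, ["\n\n"]))
    # A response cut short by a too-small max_tokens budget:
    print(is_complete_json('{"items": [1, 2'))
```

A failed parse is a useful signal to retry with a larger max_tokens value rather than passing broken output downstream.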

Quick Reference¶

  • Deterministic task? → temperature 0
  • Need variety? → temperature 0.7–1.0
  • Output cut off? → raise max_tokens
  • Big document? → check context window → chunk/RAG
