The technique that unlocked LLM reasoning on math, logic, and multi-step problems.
Ask directly:

    Q: A shop had 23 apples. It sold 7, then received a delivery
    of 12. How many apples now? Answer with a number only.
    A: 18

Wrong: the correct total is 28. The model "blurted" a plausible-looking number because answering immediately gives it no room to compute.
Now give it room to reason:

    Q: A shop had 23 apples. It sold 7, then received a delivery
    of 12. How many apples now?
    Let's think step by step.

Output:

    Start: 23. After selling 7: 23 − 7 = 16. After delivery of 12: 16 + 12 = 28. Answer: 28.
By generating intermediate tokens, the model uses its own output as scratch space for computation. Reasoning happens in the generated text, so the model must be allowed to produce it before the final answer.
Just append a trigger phrase such as "Let's think step by step." to the end of the prompt.
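A minimal Python sketch of appending the trigger, assuming the prompt is assembled as a plain string before it is sent to whatever model client you use. The helper name `with_cot_trigger` and the `TRIGGER` constant are hypothetical, not part of any library.

```python
# Hypothetical helper: append a reasoning trigger to a question.
TRIGGER = "Let's think step by step."


def with_cot_trigger(question: str, trigger: str = TRIGGER) -> str:
    """Return the question followed by the trigger phrase on its own line."""
    return f"{question.strip()}\n{trigger}"


prompt = with_cot_trigger(
    "Q: A shop had 23 apples. It sold 7, then received a delivery "
    "of 12. How many apples now?"
)
print(prompt)
```

Keeping the trigger in one constant makes it easy to try other phrasings without touching the call sites.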
Even stronger: show worked examples with reasoning, then the new question.

    Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many?
    A: 5 + 2×3 = 5 + 6 = 11. The answer is 11.

    Q: A cafe had 20 muffins, sold 13, baked 9 more. How many?
    A:

So your code can parse the answer cleanly:

    Think step by step inside <reasoning></reasoning> tags,
    then give ONLY the final numeric answer inside <answer></answer> tags.

Then extract the `<answer>` content and discard the reasoning.
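The worked-example prompt and the tagged output format can be combined in a few lines of Python. This is a sketch under stated assumptions: every function and constant name is hypothetical, the worked example is rewritten into the tagged format for illustration, and `sample` is a made-up response standing in for real model output.

```python
import re
from typing import Optional

# All names here are hypothetical; only the tag names and the example
# question come from the prompts above.
FORMAT_INSTRUCTION = (
    "Think step by step inside <reasoning></reasoning> tags,\n"
    "then give ONLY the final numeric answer inside <answer></answer> tags."
)

# Worked examples as (question, answer-with-reasoning) pairs.
EXAMPLES = [
    (
        "Roger has 5 balls. He buys 2 cans of 3 balls each. How many?",
        "<reasoning>5 + 2×3 = 5 + 6 = 11.</reasoning><answer>11</answer>",
    ),
]

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def build_prompt(question: str) -> str:
    """Instruction, then worked examples, then the new question with an open 'A:'."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{FORMAT_INSTRUCTION}\n\n{shots}\n\nQ: {question}\nA:"


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last <answer> pair, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


# Made-up response used only to exercise the parser.
sample = (
    "<reasoning>Start: 20. After selling 13: 20 - 13 = 7. "
    "After baking 9: 7 + 9 = 16.</reasoning>\n"
    "<answer>16</answer>"
)
print(build_prompt("A cafe had 20 muffins, sold 13, baked 9 more. How many?"))
print(extract_answer(sample))
```

Returning `None` when no tag is found lets the caller decide whether to retry or fall back, rather than silently treating the whole response as the answer.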
| Use CoT | Skip CoT |
|---|---|
| Math, logic, planning, multi-hop questions | Simple lookups / classification |
| "Why" and "how" analytical tasks | Latency-critical, trivial tasks |
CoT costs extra tokens and latency — it's a tool for hard problems, not every prompt.
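One way to apply the table in code is to add the trigger only for task types that benefit from it. The sketch below assumes the caller already knows the task type. The labels in `COT_TASKS` and the function name `maybe_add_cot` are hypothetical and only loosely mirror the table's left column.

```python
# Hypothetical task labels that loosely mirror the table above.
COT_TASKS = {"math", "logic", "planning", "multi_hop", "analysis"}
TRIGGER = "Let's think step by step."


def maybe_add_cot(task_type: str, question: str) -> str:
    """Append the trigger only for task types listed in COT_TASKS."""
    if task_type in COT_TASKS:
        return f"{question}\n{TRIGGER}"
    return question


print(maybe_add_cot("math", "What is 23 - 7 + 12?"))
print(maybe_add_cot("classification", "Is this review positive or negative?"))
```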
Key insight: Models can reason only because the tokens of thought are part of generation. Never force a model to answer a hard problem with zero tokens of reasoning.