Testing Your Output
Claude will give you a confident, well-structured, completely wrong answer with the same tone it uses for correct ones. This guide teaches you how to tell the difference — and what to do about it.
Why this matters more than any other skill
Learning to prompt well gets you better answers. Learning to verify means you only act on the answers that are actually right. The second skill is more important — because the cost of acting on a wrong answer is much higher than the cost of a missed opportunity.
Claude is genuinely very good. It's also genuinely capable of confidently stating something that's subtly wrong, out of date, or correct in general but wrong for your specific situation. The same fluency that makes its answers so readable also makes it hard to tell when it's filling gaps with plausible-sounding information.
The fix is not to distrust Claude. The fix is to build verification into your process. It adds 5 minutes and saves you from the one answer in twenty that looks right but isn't.
"Sounding confident is not evidence of being correct. Claude has read every style of confident wrong answer ever published."
The five-question verification framework
Before acting on any important Claude output, run it through these five questions. Not all five every time — pick the ones relevant to your situation:
What assumptions did it make?
Every answer is built on assumptions about who you are and what you need. Some of those assumptions fit your situation; others don't. Claude rarely flags them unless you ask — because it's trying to be helpful, not exhaustive.
What did it leave out?
Claude simplifies. It leaves things out to make the answer cleaner and more useful. Sometimes what it leaves out is the thing that matters most to you. Ask it explicitly.
Where is it less confident?
Claude tends to present everything with similar confidence. But its actual certainty varies enormously — from things it knows precisely to things it's extrapolating. Ask it to rate its own confidence.
Does it hold up to a challenge?
A good answer should survive being pushed on. If Claude's recommendation crumbles when you challenge one assumption, it wasn't a strong answer to begin with. Don't accept the first draft — stress-test it.
The "so what" test
This one's on you, not Claude. After reading the answer: does it actually change what you do next? If you're reading it, nodding, and then doing exactly what you would have done before — the prompt didn't work. The goal is never a good-looking answer. It's a changed decision or action.
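If you use Claude through the API rather than the chat window, the ask-Claude questions above can be kept as a reusable checklist. This is a minimal sketch in Python; the `FOLLOW_UPS` keys and the exact wording of each prompt are my own paraphrase of the framework, not a fixed formula. (The fifth question, the "so what" test, is yours to answer, so it isn't in the list.)

```python
# The four follow-up questions from the framework above, phrased as
# prompts you can send after any important answer. Wording is a
# paraphrase of the guide's questions, not an official formula.
FOLLOW_UPS = {
    "assumptions": "What assumptions did you make about me or my situation in that answer?",
    "omissions": "What did you leave out to keep the answer clean? Could any of it matter to me?",
    "confidence": "Rate your confidence in each part of that answer, from 'known precisely' to 'extrapolating'.",
    "challenge": "Suppose your main assumption is wrong. Does your recommendation still hold?",
}

def verification_prompts(relevant=None):
    """Return the follow-up prompts to run, in order.

    `relevant` is an optional subset of keys; as the guide says,
    you rarely need all of them for any one answer.
    """
    keys = relevant if relevant is not None else list(FOLLOW_UPS)
    return [FOLLOW_UPS[k] for k in keys]
```

For example, `verification_prompts(["assumptions", "confidence"])` returns just those two questions, ready to paste or send as follow-up messages.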
What to trust — and what to verify independently
Not all outputs carry the same risk. Here's a rough guide:
| Type of output | Trust level | Why |
|---|---|---|
| Explaining concepts, frameworks, how things work | High | Well-documented knowledge, low hallucination risk, easy to spot if wrong |
| Rewriting, editing, summarising your own text | High | Working from what you gave it — can't invent facts that aren't there |
| General advice, frameworks, best practices | Medium | Correct in general but may not fit your specific constraints |
| SA-specific legal, tax, or regulatory detail | Medium | Check against SARS, CIPC, or the relevant statute — rules change and Claude's training has a cutoff |
| Specific numbers: tax brackets, rates, thresholds | Verify | Often accurate but may be one tax year out of date. Always check against SARS.gov.za for anything you're acting on |
| Specific people, recent events, live data | Verify | Claude has a knowledge cutoff and no internet access unless explicitly given tools |
| Medical information about your specific symptoms | Verify | Useful for context and questions to ask a doctor — not for diagnosis or treatment decisions |
Red flags in AI output
These patterns in a Claude response should trigger your verification instinct: suspiciously specific numbers or dates given without a source; claims about recent events, live data, or specific people; uniform confidence throughout, with no caveats or hedging anywhere; and general-case advice that ignores a constraint you told it about. None of these proves the answer is wrong. Each one means you should verify before you act.
The self-audit prompt
The single most powerful verification technique is to ask Claude to critique its own answer. This sounds odd, but it works: Claude is very good at evaluating output quality, including its own. Run something like this after any answer you're planning to act on: "Review the answer you just gave me. What assumptions did you make about my situation? What did you leave out? Which parts are you least confident about, and what should I verify before acting?"
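For API users, the same self-audit step can be wired into a conversation. A minimal sketch, assuming the official `anthropic` Python SDK; the audit wording and the model name in the commented usage are placeholders of my own, not recommendations:

```python
# Sketch of the self-audit step. Assumes the `anthropic` Python SDK
# (pip install anthropic) and an ANTHROPIC_API_KEY in the environment
# for the commented-out usage. The SELF_AUDIT wording is a placeholder.
SELF_AUDIT = (
    "Review the answer you just gave me. What assumptions did you make? "
    "What did you leave out? Which parts are you least confident about, "
    "and what should I verify before acting on this?"
)

def with_self_audit(conversation):
    """Append the self-audit request to an existing message list.

    `conversation` is the alternating user/assistant history so far;
    returns a new list ready to send back to the API (the original
    list is left unmodified).
    """
    return conversation + [{"role": "user", "content": SELF_AUDIT}]

# Usage (uncomment with a valid API key):
# from anthropic import Anthropic
# client = Anthropic()
# audit = client.messages.create(
#     model="claude-sonnet-4-5",   # placeholder model name
#     max_tokens=1024,
#     messages=with_self_audit(history),
# )
```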
This reliably surfaces the caveat that changes everything. It takes 30 seconds, and it regularly catches the answer that was 90% right but wrong in exactly the part that mattered.