Testing Your Output

Claude will give you a confident, well-structured, completely wrong answer with the same tone it uses for correct ones. This guide teaches you how to tell the difference — and what to do about it.

Why this matters more than any other skill

Learning to prompt well gets you better answers. Learning to verify means you only act on the answers that are actually right. The second skill is more important — because the cost of acting on a wrong answer is much higher than the cost of a missed opportunity.

Claude is genuinely very good. It's also genuinely capable of confidently stating something that's subtly wrong, out of date, or correct in general but wrong for your specific situation. The same fluency that makes its answers easy to read also makes it hard to spot when it's filling gaps with plausible-sounding information.

The fix is not to distrust Claude. The fix is to build verification into your process. It adds 5 minutes and saves you from the one answer in twenty that looks right but isn't.

"Sounding confident is not evidence of being correct. Claude has read every style of confident wrong answer ever published."

The five-question verification framework

Before acting on any important Claude output, run it through these five questions. Not all five every time — pick the ones relevant to your situation:

1. What assumptions did it make?

Every answer is built on assumptions about who you are and what you need. Some of those assumptions fit your situation; others don't. Claude rarely flags them unless you ask — because it's trying to be helpful, not exhaustive.

Tell me every assumption you made in that answer that I didn't explicitly state. Which ones, if wrong, would change your recommendation significantly?

2. What did it leave out?

Claude simplifies. It leaves things out to make the answer cleaner and more useful. Sometimes what it leaves out is the thing that matters most to you. Ask it explicitly.

What did you leave out of that answer to keep it clean? What nuances or exceptions exist that I should know about, even if they make the answer more complicated?

3. Where is it less confident?

Claude tends to present everything with similar confidence. But its actual certainty varies enormously — from things it knows precisely to things it's extrapolating. Ask it to rate its own confidence.

Go through your answer and flag any parts where you are less certain, where the information might be out of date, or where I should verify with an authoritative source before acting. Be specific about which parts.

4. Does it hold up to a challenge?

A good answer should survive being pushed on. If Claude's recommendation crumbles when you challenge one assumption, it wasn't a strong answer to begin with. Don't accept the first draft — stress-test it.

I'm going to push back on your recommendation. The main issue I see is [your challenge]. Does that change your answer? If not, explain why your recommendation still holds given that concern.

5. The "so what" test

This one's on you, not Claude. After reading the answer: does it actually change what you do next? If you're reading it, nodding, and then doing exactly what you would have done before — the prompt didn't work. The goal is never a good-looking answer. It's a changed decision or action.

Based on your analysis, what is the single most important thing I should do differently than I was planning to? Not a list — the one thing with the highest impact.
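
If you use Claude through the API rather than the app, the framework above can be kept as a small set of reusable follow-up prompts. This is an illustrative Python sketch, not part of any SDK: the dictionary keys and the build_followup helper are hypothetical names, and the message format follows the Anthropic Messages API's list of role/content turns.

```python
# The five verification questions, condensed into reusable follow-ups.
# Wording is taken from the framework above; keys are illustrative.
VERIFICATION_PROMPTS = {
    "assumptions": (
        "Tell me every assumption you made in that answer that I didn't "
        "explicitly state. Which ones, if wrong, would change your "
        "recommendation significantly?"
    ),
    "omissions": (
        "What did you leave out of that answer to keep it clean? What "
        "nuances or exceptions exist that I should know about?"
    ),
    "confidence": (
        "Flag any parts of your answer where you are less certain, or where "
        "I should verify with an authoritative source before acting."
    ),
    "challenge": (
        "I'm going to push back on your recommendation. The main issue I "
        "see is {challenge}. Does that change your answer?"
    ),
    "so_what": (
        "What is the single most important thing I should do differently "
        "than I was planning to? Not a list - the one thing with the "
        "highest impact."
    ),
}


def build_followup(history, check, **details):
    """Return the conversation with the chosen verification question appended.

    `history` is a list of {"role": ..., "content": ...} turns, as used by
    the Anthropic Messages API; `check` is a key of VERIFICATION_PROMPTS.
    `details` fills placeholders such as {challenge}. The original history
    is not modified.
    """
    prompt = VERIFICATION_PROMPTS[check].format(**details)
    return history + [{"role": "user", "content": prompt}]
```

Sending the returned list as the `messages` parameter of the Anthropic SDK's `client.messages.create(...)` runs the chosen check against the original answer, since the model sees the full conversation each turn.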

What to trust — and what to verify independently

Not all outputs carry the same risk. Here's a rough guide:

High trust
- Explaining concepts, frameworks, and how things work: well-documented knowledge, low hallucination risk, and easy to spot if wrong.
- Rewriting, editing, or summarising your own text: Claude is working from what you gave it and can't invent facts that aren't there.

Medium trust
- General advice, frameworks, and best practices: often correct in general, but they may not fit your specific constraints.
- SA-specific legal, tax, or regulatory detail: check against SARS, CIPC, or the relevant statute; rules change, and Claude's training has a cutoff.

Verify independently
- Specific numbers such as tax brackets, rates, and thresholds: often accurate but may be one tax year out of date. Always check SARS.gov.za for anything you're acting on.
- Specific people, recent events, or live data: Claude has a knowledge cutoff and no internet access unless explicitly given tools.
- Medical information about your specific symptoms: useful for context and for questions to ask a doctor, not for diagnosis or treatment decisions.

Red flags in AI output

These patterns in a Claude response should trigger your verification instinct:

- Very specific numbers with no source. "The average South African household spends R2,340/month on food" — where is that from? When was it measured? Ask for the source; if it can't give one, treat the number as illustrative only.
- References to specific laws, sections, or regulations. Claude may cite "Section 23(b) of the Income Tax Act" correctly or incorrectly, and the citation sounds authoritative either way. Always check the actual legislation for anything legal or financial.
- A single confident recommendation when multiple valid options exist. If Claude gives you one clear answer where reasonable people disagree, ask: "What's the strongest case for doing the opposite?" If it can make that case equally well, the original recommendation wasn't as certain as it sounded.
- Advice that happens to match your stated preference. If you tell Claude what you're hoping is true and it confirms it, be suspicious: Claude is trained to be helpful and can drift toward validation. Ask explicitly: "What's the case against this?"
- Perfectly smooth, authoritative prose on a topic you know has genuine disagreement. Tax planning, investment strategy, business decisions: smart people disagree on these. If Claude presents one view as settled fact, it's probably collapsing genuine complexity. Ask where the disagreement is.

The self-audit prompt

The single most powerful verification technique is to ask Claude to critique its own answer. This sounds odd but it works — Claude is very good at evaluating output quality, including its own. Run this after any answer you're planning to act on:

Review the answer you just gave me. Be critical. Where are the weakest parts of your reasoning? What did you assume without stating? What would a smart, sceptical reader push back on? What should I verify before I act on this?

This routinely surfaces a caveat that changes the picture. It takes 30 seconds and can save you from acting on an answer that was 90% right but wrong in exactly the part that mattered.
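
If you call Claude programmatically, the same audit can be chained onto an existing conversation. A minimal sketch using the official anthropic Python package; the default model name is an assumption (check the current model list before use), and audit_messages / self_audit are illustrative helper names:

```python
# The self-audit prompt, verbatim from the section above.
AUDIT_PROMPT = (
    "Review the answer you just gave me. Be critical. Where are the weakest "
    "parts of your reasoning? What did you assume without stating? What "
    "would a smart, sceptical reader push back on? What should I verify "
    "before I act on this?"
)


def audit_messages(history):
    """Append the self-audit prompt to a conversation as a new user turn."""
    return history + [{"role": "user", "content": AUDIT_PROMPT}]


def self_audit(history, model="claude-sonnet-4-20250514"):
    """Round-trip the audited conversation through the Anthropic Messages API.

    Requires `pip install anthropic` and ANTHROPIC_API_KEY in the
    environment. The default model name is an assumption and may be stale.
    """
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=audit_messages(history),
    )
    return response.content[0].text
```

The key design point is that `history` must contain both your original question and Claude's answer: the audit only works when the model can see what it actually said.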

The right relationship with Claude output: Trust it as a well-informed starting point. Verify anything with real stakes. Always ask what it left out. The goal is not to catch Claude being wrong — it's to make sure the answer you act on is the right one for your actual situation.