Self-Criticism & Self-Refinement
Make the model critique and revise its own output — but cap the loop at one to three passes before quality erodes.
Why this matters
A model's first answer is written in one left-to-right pass: it commits to each token before it has seen the whole. That is exactly the situation where a human would reread a draft and fix the obvious mistakes. Self-criticism techniques give the model that second look. You take the first output, ask the model to find what is wrong with it, then ask it to produce a corrected version conditioned on that critique.
The reason this works at all is that verification is often easier than generation. Spotting that a function lacks input validation, or that a summary dropped a key caveat, is a narrower task than writing the thing correctly the first time, and narrower tasks are where models are more reliable. The Prompt Report (Schulhoff et al.) treats self-criticism as one of the core families of techniques, and the empirical prompt-engineering literature often reports gains on reasoning, math, and code tasks when a critique step is added. The gains are real but task-dependent, and they are bounded: there is a point past which the loop stops helping and starts hurting.
How to do it
Self-Refine: critique, then revise
Self-Refine is the canonical loop. Three steps, same model throughout:
- Generate an initial answer.
- Feedback: ask the model to critique that answer against explicit criteria.
- Refine: ask it to rewrite the answer using the feedback.
The feedback and refine steps can repeat. The single most important design choice is the stopping condition. Either fix the iteration count (one or two passes is the sweet spot for most tasks) or let the model emit a stop signal: for example, instruct it to reply only with NO_ISSUES when the critique step finds nothing material to change, and break the loop on that token.
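The loop and both stopping conditions can be sketched in a few lines. This is a minimal sketch, not a reference implementation: `llm` is a hypothetical callable (prompt string in, completion string out) standing in for whatever client you use, and the prompt wording and the `self_refine` name are assumptions for illustration.

```python
from typing import Callable

STOP_TOKEN = "NO_ISSUES"  # the stop signal named in the text


def self_refine(
    task: str,
    criteria: str,
    llm: Callable[[str], str],  # hypothetical: prompt in, completion out
    max_passes: int = 2,
) -> str:
    """Generate once, then critique and revise at most `max_passes` times."""
    draft = llm(f"Task:\n{task}")
    for _ in range(max_passes):
        # Feedback step: critique the current draft against explicit criteria.
        critique = llm(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
            f"Critique the draft against these criteria: {criteria}\n"
            f"If nothing material needs to change, reply only with {STOP_TOKEN}."
        )
        if critique.strip() == STOP_TOKEN:
            break  # the critic found nothing to fix, so stop early
        # Refine step: rewrite the draft using the feedback.
        draft = llm(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nFeedback:\n{critique}\n\n"
            "Rewrite the draft so that it addresses every point in the feedback."
        )
    return draft
```

The hard cap (`max_passes`) and the stop token work together: the token ends the loop early when the draft is already fine, and the cap bounds it when the critic never declares itself satisfied.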
Concrete example. Suppose the model writes a Python function to parse a date range from a string. The first draft handles "2024-01-01 to 2024-03-01" but silently returns garbage on reversed ranges and malformed input. A Self-Refine pass with the criterion "list every input that produces a wrong or undefined result" surfaces both the reversed-range case and the missing ValueError on bad input. The refine step then adds those guards. One pass; two genuine bugs caught.
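For illustration, the refined function might look something like this. The name `parse_date_range`, the `" to "` separator, and the choice to raise ValueError on a reversed range (rather than, say, swapping the endpoints) are assumptions; the text only says that guards were added.

```python
from datetime import date


def parse_date_range(text: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD to YYYY-MM-DD' into a (start, end) pair."""
    parts = text.split(" to ")
    if len(parts) != 2:
        # Guard 1: malformed input no longer slips through silently.
        raise ValueError(f"expected '<start> to <end>', got {text!r}")
    # date.fromisoformat itself raises ValueError on an unparseable date.
    start, end = (date.fromisoformat(part.strip()) for part in parts)
    if start > end:
        # Guard 2: reversed ranges are rejected instead of returned as-is.
        raise ValueError(f"start {start} is after end {end}")
    return start, end
```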
Chain-of-Verification: turn the check into questions
Chain-of-Verification (CoVe) is a structured variant aimed at factual accuracy. Instead of an open-ended "what's wrong here," you have the model:
- Produce a baseline answer.
- Generate a short list of verification questions that, if answered independently, would expose errors in the baseline.
- Answer those questions, ideally in a context that does not include the baseline, so the model does not just rationalize what it already said.
- Produce a final answer reconciled with the verification answers.
For a prompt like "list five papers that introduced techniques for reducing hallucination," the baseline will often include a plausible-but-fake citation. The verification questions ("Does paper X exist? Who are its authors?") answered in isolation tend to flush out the fabrication, and the final answer drops or corrects it. CoVe is most useful precisely where models are weakest: closed-book facts, citations, and entity attributes.
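The four steps can be sketched as separate calls, which is one way to keep the baseline out of sight while the verification questions are answered. As before, `llm` is a hypothetical prompt-in, completion-out callable, and the prompt wording and one-question-per-line parsing are assumptions made for this sketch.

```python
from typing import Callable


def chain_of_verification(question: str, llm: Callable[[str], str]) -> str:
    """Baseline, plan verification questions, answer them in isolation, reconcile."""
    # Step 1: baseline answer.
    baseline = llm(f"Answer the question.\n\nQuestion: {question}")

    # Step 2: verification questions, one per line (assumed output format).
    plan = llm(
        f"Question: {question}\nBaseline answer: {baseline}\n\n"
        "Write short verification questions, one per line, that would "
        "expose errors in the baseline answer."
    )
    checks = [line.lstrip("- ").strip() for line in plan.splitlines() if line.strip()]

    # Step 3: answer each check in its own call. The baseline is deliberately
    # absent from these prompts so the model cannot lean on it.
    verified = [(q, llm(f"Answer concisely and factually: {q}")) for q in checks]

    # Step 4: reconcile the baseline with the independent answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in verified)
    return llm(
        f"Question: {question}\nBaseline answer: {baseline}\n\n"
        f"Independent verification:\n{evidence}\n\n"
        "Write a final answer consistent with the verification. Where the two "
        "disagree, prefer the verification answers."
    )
```

The cost is one call per verification question on top of the baseline, planning, and final calls, so keep the question list short.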
Make the critique step do real work
A vague "review your answer and improve it" mostly produces cosmetic edits and false confidence. Force the critique to be specific and adversarial:
- Give it explicit criteria to check against (correctness, edge cases, the actual constraints from the task).
- Ask it to quote the specific span it thinks is wrong and say why, not just assert that something is off.
- When you can, separate roles: one turn generates, a fresh turn critiques with the original requirements re-stated. Reducing the critic's attachment to the draft yields sharper feedback.
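One way to apply all three points at once is to build the critique prompt from a template and send it in a fresh turn. The function name and the exact wording below are illustrative assumptions, not a tested prompt.

```python
def build_critique_prompt(requirements: str, draft: str, criteria: list[str]) -> str:
    """Assemble a critique prompt for a fresh turn with no generation history."""
    checklist = "\n".join(f"- {criterion}" for criterion in criteria)
    return (
        # Framing the draft as someone else's work is meant to reduce attachment.
        "You are reviewing a draft written by someone else.\n\n"
        # Requirements are re-stated rather than inherited from earlier context.
        f"Original requirements:\n{requirements}\n\n"
        f"Draft:\n{draft}\n\n"
        f"Check the draft against each criterion:\n{checklist}\n\n"
        "For every problem, quote the exact span that is wrong and explain why. "
        "Do not report a problem you cannot tie to a quoted span."
    )
```

For example, `build_critique_prompt("Merge two sorted lists", draft_code, ["correctness", "edge cases"])` returns a single string you can send as the only message of a new conversation.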
Pitfalls
More iterations is not better. This is the headline caveat, and it is well documented. Beyond roughly one to three passes, models tend to "fix" things that were already correct, soften decisive answers into mush, drift away from the original instructions, and occasionally talk themselves out of a right answer. Cap the loop. If two passes have not converged, the problem is usually the prompt or the task framing, not insufficient self-criticism.
Self-correction has a known weakness on reasoning. The evidence here is genuinely mixed and you should treat it as such. Several studies found that unaided self-correction — "check your reasoning and try again" with no external feedback — can leave accuracy flat or even lower it on math and logic benchmarks, because the model has no independent signal about whether its first answer was right. Self-criticism shines when verification is cheap and grounded (code you can run, facts you can check, criteria you can name) and is far less reliable as a substitute for an actual oracle. When a test suite, compiler, or retrieval source is available, feed that back instead of, or alongside, the model's own opinion.
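When the artifact is code, the grounded signal can be as simple as running the candidate against known cases and handing the failures to the refine step. A minimal sketch, assuming test cases are given as `(args, expected)` pairs; `collect_failures` is a hypothetical helper, not part of any library.

```python
from typing import Any, Callable


def collect_failures(
    func: Callable[..., Any],
    cases: list[tuple[tuple, Any]],
) -> list[str]:
    """Run func on each (args, expected) case and describe every failure."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # report the crash instead of aborting the run
            failures.append(f"{func.__name__}{args!r} raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            failures.append(
                f"{func.__name__}{args!r} returned {actual!r}, expected {expected!r}"
            )
    return failures
```

An empty list is a grounded stop signal; a non-empty one can be joined with newlines and used as the feedback text in place of, or alongside, the model's own critique.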
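Watch the cost. Each pass adds a critique generation and, unless the critic stops the loop, a revision generation, and each of those re-reads the task and the current draft. A three-pass loop is therefore at least three times the tokens and latency of a single call, and as many as seven model calls when critique and revision run as separate requests. For high-volume or latency-sensitive paths, reserve self-criticism for the cases that warrant it rather than wrapping every request in it.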
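As a back-of-the-envelope check, assume critique and revision are separate model calls and the loop never stops early. The helper name is hypothetical, and the count ignores the fact that prompts also grow as drafts and feedback are appended.

```python
def self_refine_calls(passes: int) -> int:
    """Model calls for one initial draft plus `passes` critique-and-revise rounds."""
    return 1 + 2 * passes  # 1 generation, then (critique + revise) per pass


for passes in range(4):
    print(passes, "passes ->", self_refine_calls(passes), "calls")
```

Under those assumptions one pass costs three calls and three passes cost seven, which is the worst case; an early stop signal cuts the count.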
Don't let the critic grade its own homework leniently. If the same context that produced the error is fully visible during critique, the model often defends rather than fixes. Re-stating the requirements independently, or hiding the draft during the verification-answering step (as CoVe does), measurably helps.
Self-Refine on a code task
✕ Weaker
Write a Python function to merge two sorted lists. Then review it and make it better.
✓ Stronger
Write a Python function `merge_sorted(a, b)` that merges two ascending-sorted lists into one ascending-sorted list. Then critique your own implementation. List every case where it could produce a wrong result or raise an unexpected error — empty inputs, duplicate values, lists of different lengths, non-comparable elements. Quote the exact line each issue refers to. Finally, output a revised version that fixes every issue you listed. Do exactly one critique-and-revise pass.
Why it's better: The weak prompt invites cosmetic 'improvements' with no criteria and no stopping rule, so the model rationalizes its draft. The strong prompt names concrete failure modes to check, forces the critique to cite specific lines, and caps the loop at a single pass — turning a vague review into a targeted edit.
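For reference, here is one plausible shape for the revised function the stronger prompt is aiming at. It is a sketch of a reasonable end state, not actual model output, and letting Python's own TypeError surface for non-comparable elements is an assumed design choice.

```python
def merge_sorted(a: list, b: list) -> list:
    """Merge two ascending-sorted lists into one ascending-sorted list."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        # `<=` keeps duplicates and takes from `a` first on ties.
        # Non-comparable elements raise TypeError here rather than being hidden.
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these tails is non-empty; this also covers empty inputs
    # and lists of different lengths.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged
```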
Chain-of-Verification against fabricated facts
✕ Weaker
List the founding year and headquarters city for these five companies, then double-check your answer.
✓ Stronger
For each company below, give its founding year and headquarters city. Step 1 — Baseline: produce your answers. Step 2 — Verification questions: for each company, write the two questions you would ask to independently confirm the founding year and the HQ city. Step 3 — Answer those verification questions WITHOUT referring back to your Step 1 answers. Step 4 — Final answer: reconcile Steps 1 and 3. Where they disagree, prefer Step 3 and mark any value you are not confident about as UNVERIFIED. Companies: [...]
Why it's better: 'Double-check your answer' lets the model re-read and defend its own draft, which rarely catches a confidently stated fabrication. The CoVe structure forces independent verification questions, tells the model to answer them without leaning on the baseline, and adds an explicit UNVERIFIED escape hatch. This is the pattern Dhuliawala et al. report as reducing hallucinated facts. In a single prompt the isolation is only requested, not enforced; running Step 3 as separate calls enforces it.
Key takeaways
- Verification is easier than generation — a focused critique pass catches real bugs and fabricated facts a first draft misses.
- Cap the loop at one to three passes; beyond that, models over-edit, drift from instructions, and degrade correct answers.
- Make the critique adversarial and specific: explicit criteria, quoted spans, and a stated reason — not 'review and improve.'
- Self-criticism works best with a grounded signal (runnable code, checkable facts). Unaided self-correction on pure reasoning can flatten or reduce accuracy — the evidence is mixed, so don't treat it as an oracle.
- Chain-of-Verification answers its own verification questions in isolation, which is what makes it effective against hallucinated citations and entity facts.
Further reading
- Schulhoff et al., "The Prompt Report: A Systematic Survey of Prompting Techniques"
- Madaan et al., "Self-Refine: Iterative Refinement with Self-Feedback"
- Dhuliawala et al., "Chain-of-Verification Reduces Hallucination in Large Language Models"
- Huang et al., "Large Language Models Cannot Self-Correct Reasoning Yet"
- Learn Prompting — Self-Criticism techniques (learnprompting.org)