Self-Criticism & Self-Refinement

Why this matters

A model's first answer is a single forward pass: it commits to each token before it has seen the whole. That is exactly the situation where a human would reread a draft and fix the obvious mistakes. Self-criticism techniques give the model that second look. You take the first output, ask the model to find what is wrong with it, then ask it to produce a corrected version conditioned on that critique.

The reason this works at all is that verification is often easier than generation. Spotting that a function lacks input validation, or that a summary dropped a key caveat, is a narrower task than writing the thing correctly the first time — and narrower tasks are where models are more reliable. Schulhoff's Prompt Report treats self-criticism as one of the core families of techniques, and the empirical prompt-engineering literature consistently finds gains on reasoning, math, and code tasks when a critique step is added. The gains are real but task-dependent, and they are bounded: there is a point past which the loop stops helping and starts hurting.

How to do it

Self-Refine: critique, then revise

Self-Refine is the canonical loop. Three steps, same model throughout:

Generate an initial answer.
Feedback: ask the model to critique that answer against explicit criteria.
Refine: ask it to rewrite the answer using the feedback.

Steps 2 and 3 can repeat. The single most important design choice is the stopping condition. Either fix the iteration count (one or two passes is the sweet spot for most tasks) or let the model emit a stop signal — e.g. instruct it to reply only with NO_ISSUES when the critique step finds nothing material to change, and break the loop on that token.

Concrete example. Suppose the model writes a Python function to parse a date range from a string. The first draft handles "2024-01-01 to 2024-03-01" but silently returns garbage on reversed ranges and malformed input. A Self-Refine pass with the criteria "list every input that produces a wrong or undefined result" surfaces both the reversed-range case and the missing ValueError on bad input. The refine step then adds those guards. One pass; two genuine bugs caught.

Chain-of-Verification: turn the check into questions

Chain-of-Verification (CoVe) is a structured variant aimed at factual accuracy. Instead of an open-ended "what's wrong here," you have the model:

Produce a baseline answer.
Generate a short list of verification questions that, if answered independently, would expose errors in the baseline.
Answer those questions — ideally without looking at the baseline, so it does not just rationalize what it already said.
Produce a final answer reconciled with the verification answers.

For a prompt like "list five papers that introduced techniques for reducing hallucination," the baseline will often include a plausible-but-fake citation. The verification questions ("Does paper X exist? Who are its authors?") answered in isolation tend to flush out the fabrication, and the final answer drops or corrects it. CoVe is most useful precisely where models are weakest: closed-book facts, citations, and entity attributes.

Make the critique step do real work

A vague "review your answer and improve it" mostly produces cosmetic edits and false confidence. Force the critique to be specific and adversarial:

Give it explicit criteria to check against (correctness, edge cases, the actual constraints from the task).
Ask it to quote the specific span it thinks is wrong and say why, not just assert that something is off.
When you can, separate roles: one turn generates, a fresh turn critiques with the original requirements re-stated. Reducing the critic's attachment to the draft yields sharper feedback.

Pitfalls

More iterations is not better. This is the headline caveat, and it is well documented. Beyond roughly one to three passes, models tend to "fix" things that were already correct, soften decisive answers into mush, drift away from the original instructions, and occasionally talk themselves out of a right answer. Cap the loop. If two passes have not converged, the problem is usually the prompt or the task framing, not insufficient self-criticism.

Self-correction has a known weakness on reasoning. The evidence here is genuinely mixed and you should treat it as such. Several studies found that unaided self-correction — "check your reasoning and try again" with no external feedback — can leave accuracy flat or even lower it on math and logic benchmarks, because the model has no independent signal about whether its first answer was right. Self-criticism shines when verification is cheap and grounded (code you can run, facts you can check, criteria you can name) and is far less reliable as a substitute for an actual oracle. When a test suite, compiler, or retrieval source is available, feed that back instead of, or alongside, the model's own opinion.

Watch the cost. Each pass is another full generation. A three-pass loop is roughly three-plus times the tokens and latency of a single call. For high-volume or latency-sensitive paths, reserve self-criticism for the cases that warrant it rather than wrapping every request in it.

Don't let the critic grade its own homework leniently. If the same context that produced the error is fully visible during critique, the model often defends rather than fixes. Re-stating the requirements independently, or hiding the draft during the verification-answering step (as CoVe does), measurably helps.

Self-Refine on a code task

Why it's better: The weak prompt invites cosmetic 'improvements' with no criteria and no stopping rule, so the model rationalizes its draft. The strong prompt names concrete failure modes to check, forces the critique to cite specific lines, and caps the loop at a single pass — turning a vague review into a targeted edit.

Chain-of-Verification against fabricated facts

Why it's better: 'Double-check your answer' lets the model re-read and defend its own draft, which rarely catches a confidently-stated fabrication. The CoVe structure forces independent verification questions, answers them in isolation from the baseline, and adds an explicit UNVERIFIED escape hatch — the pattern shown to reduce hallucinated facts.