Claude vs ChatGPT: Censorship & Bias Compared

As of 2026-06-15, Claude refused 41.3% of political prompts in GPTfake’s set versus ChatGPT’s 34.2% — a 7.1-point gap — while declining 22.4% of all standardized prompts to ChatGPT’s 18.7%. According to GPTfake monitoring, the two diverge most by topic, not in aggregate: ChatGPT hedges where Claude declines outright, and the widest gaps are on historical and political prompts. Both tightened restrictions over the past year. Figures below are illustrative across a fixed 500-prompt set.

22.4% vs 18.7%
Claude vs ChatGPT overall refusal rate. Source: GPTfake monitoring, same fixed 500-prompt set, tested daily. Illustrative.

The numbers on this page are illustrative snapshots from our standardized prompt set, not a live dashboard. They are produced by our own harness; see the monitoring methodology for sample sizes and scoring. GPTfake is not funded by any AI lab.

Does Claude or ChatGPT censor more?

Claude. As of 2026-06-15, GPTfake measures Claude refusing 22.4% of standardized prompts versus ChatGPT’s 18.7% — but ChatGPT’s lower rate partly reflects hedged or partial answers that Claude would refuse outright. The gap is widest on ethical (26.8% vs 15.2%) and safety (38.2% vs 31.5%) prompts. For the underlying concept, see what is AI censorship.

Refusal rate head-to-head


ChatGPT (GPT-4o) vs Claude (Sonnet): refusal rate by topic, bias score, and policy drift, as of 2026-06-15. Illustrative data.

| Metric | ChatGPT (GPT-4o) | Claude (Sonnet) | More restrictive |
|---|---|---|---|
| Overall refusal rate | 18.7% | 22.4% | Claude (Sonnet) |
| Political opinion | 34.2% | 41.3% | Claude (Sonnet) |
| Historical events | 28.7% | 48.7% | Claude (Sonnet) |
| Violence / safety | 68.4% | 72.1% | Claude (Sonnet) |
| Adult content | 94.7% | 96.2% | Claude (Sonnet) |
| Medical / legal | 32.1% | 38.9% | Claude (Sonnet) |
| Controversial topics | 45.3% | | |
| Bias score (0–10) | 6.2 / 10 | 5.4 / 10 | ChatGPT (GPT-4o) |
| Policy drift | +6.4 pts | +0.2 pts | Rising vs Stable |
| Sample size | n = 500 | n = 500 | |
| As of | | | |

Refusal rate = share of a fixed 500-prompt set declined, deflected, or filtered (lower = less restrictive). Bias score on a 0–10 scale (higher = more measured ideological skew). Policy drift = change in overall refusal rate, in percentage points, vs. the prior baseline. Figures are illustrative placeholders pending live monitoring data. See the monitoring methodology for how prompts are categorized and scored.
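The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not GPTfake's harness: the outcome labels, the function names, and the toy counts (including the 22.2% baseline) are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical outcome labels. The page only says a prompt counts as refused
# when it is "declined, deflected, or filtered".
REFUSAL_OUTCOMES = {"declined", "deflected", "filtered"}


def refusal_rate(outcomes):
    """Return the percentage of outcomes that count as refusals."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    refused = sum(counts[label] for label in REFUSAL_OUTCOMES)
    return 100.0 * refused / len(outcomes)


def policy_drift(current_rate, baseline_rate):
    """Return the change in refusal rate, in percentage points."""
    return current_rate - baseline_rate


# Toy run with made-up counts: 112 refusals out of 500 prompts.
outcomes = (
    ["declined"] * 90
    + ["deflected"] * 15
    + ["filtered"] * 7
    + ["answered"] * 388
)
rate = refusal_rate(outcomes)
drift = policy_drift(rate, 22.2)  # 22.2 is an invented prior baseline

print(f"refusal rate: {rate:.1f}%")     # refusal rate: 22.4%
print(f"policy drift: {drift:+.1f} pts")  # policy drift: +0.2 pts
```

Note that drift is a difference in percentage points, not a relative change: moving from 22.2% to 22.4% is +0.2 points, not +0.9%.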

Overall and by-category refusal rates. Lower means the model declines fewer prompts.

| Prompt category | ChatGPT | Claude | More restrictive |
|---|---|---|---|
| Overall | 18.7% | 22.4% | Claude |
| Political | 21.0% | 24.5% | Claude |
| Ethical / moral dilemmas | 15.2% | 26.8% | Claude |
| Social (identity, religion) | 19.4% | 20.1% | ~ Even |
| Scientific / controversial | 12.1% | 14.0% | Claude |
| Safety / harm-related | 31.5% | 38.2% | Claude |

Illustrative. As of 2026-06-15, Claude out-refuses ChatGPT in every category, though the social category is roughly even (20.1% vs 19.4%); n = 500 each. Categories and scoring are defined on the methodology page.
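To see where the two models diverge most, the per-category gaps can be computed directly from the by-category table above. The sketch below uses only the figures printed in that table; the variable names are ours.

```python
# Refusal rates (%) copied from the by-category table: (ChatGPT, Claude).
rates = {
    "Political": (21.0, 24.5),
    "Ethical / moral dilemmas": (15.2, 26.8),
    "Social (identity, religion)": (19.4, 20.1),
    "Scientific / controversial": (12.1, 14.0),
    "Safety / harm-related": (31.5, 38.2),
}

# Gap in percentage points (Claude minus ChatGPT), rounded to one decimal
# to hide floating-point noise.
gaps = {
    category: round(claude - chatgpt, 1)
    for category, (chatgpt, claude) in rates.items()
}

# Widest gap first.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for category, gap in ranked:
    print(f"{category:<30} {gap:+.1f} pts")
```

On these figures the ethical category has the widest gap (+11.6 points), followed by safety (+6.7), and the social category is the narrowest (+0.7).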

The headline: Claude’s Constitutional AI training makes it the more cautious of the two, with the widest gap on ethical and safety prompts. ChatGPT more often produces a hedged or partial answer where Claude returns a hard refusal.

Bias by topic

Political/ideological lean on a −100 (left) to +100 (right) scale per topic. Closer to zero is more balanced.

| Topic | ChatGPT | Claude | As of |
|---|---|---|---|
| Economic policy | −9 | −4 | 2026-06-15 |
| Social policy | −12 | −7 | 2026-06-15 |
| Historical events | −5 | −3 | 2026-06-15 |
| Climate / science | −6 | −5 | 2026-06-15 |
| Geopolitics | −7 | −6 | 2026-06-15 |

Illustrative bias scores (−100 left … +100 right) as of 2026-06-15, n = 500 each: GPTfake measures both models leaning slightly left, with Claude consistently closer to neutral. See methodology.

Both models lean slightly left in our set, but Claude’s scores cluster closer to zero — consistent with its tendency to refuse contested prompts rather than answer with a lean.
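One way to check the "closer to zero" claim is to average the absolute lean across topics. The mean absolute lean used below is our own summary for illustration, not a metric GPTfake reports; the scores are copied from the bias table above.

```python
from statistics import mean

# Lean scores from the bias table (-100 left ... +100 right): (ChatGPT, Claude).
lean = {
    "Economic policy": (-9, -4),
    "Social policy": (-12, -7),
    "Historical events": (-5, -3),
    "Climate / science": (-6, -5),
    "Geopolitics": (-7, -6),
}

# Mean absolute lean: average distance from zero, ignoring direction.
mean_abs = {
    "ChatGPT": mean([abs(chatgpt) for chatgpt, _ in lean.values()]),
    "Claude": mean([abs(claude) for _, claude in lean.values()]),
}

# True only if Claude is closer to zero on every single topic.
claude_closer_everywhere = all(
    abs(claude) < abs(chatgpt) for chatgpt, claude in lean.values()
)

for name, value in mean_abs.items():
    print(f"{name}: {value:.1f}")
# ChatGPT: 7.8
# Claude: 5.0
print(claude_closer_everywhere)  # True
```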

Which is more restrictive?

  • Claude is more restrictive overall, driven by ethical and safety categories.
  • ChatGPT hedges; Claude declines. ChatGPT’s lower refusal rate partly reflects partial/redirected answers that Claude would refuse outright — read the redirection vs refusal split on each model page.
  • Both are tightening. Refusal rates on both have risen over the past year in our timeline; see the ChatGPT policy timeline and Claude policy timeline.

Methodology

These results come from the same standardized prompt set sent to both models on the same schedule, classified identically. Read the full protocol — prompt categories, scoring system, sample sizes, and reproducibility notes — on the monitoring methodology page.
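The core idea, identical prompts and one shared classifier for both models, can be sketched as follows. Everything here is hypothetical: the stub models stand in for real API clients, and the keyword classifier is a toy placeholder for the scoring described on the methodology page.

```python
from typing import Callable, Dict, List

# Hypothetical refusal phrases; a real classifier would need to be far more careful.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't")


def classify(reply: str) -> str:
    """Label a reply with the same rule regardless of which model produced it."""
    text = reply.lower()
    return "refused" if any(marker in text for marker in REFUSAL_MARKERS) else "answered"


def run_comparison(
    prompts: List[str], models: Dict[str, Callable[[str], str]]
) -> Dict[str, List[str]]:
    """Send every prompt to every model and classify all replies identically."""
    results: Dict[str, List[str]] = {name: [] for name in models}
    for prompt in prompts:
        for name, ask in models.items():
            results[name].append(classify(ask(prompt)))
    return results


# Stub models standing in for real API clients (assumptions for this sketch).
def model_a(prompt: str) -> str:
    return "I can't help with that." if "weapon" in prompt.lower() else "Here is an overview."


def model_b(prompt: str) -> str:
    return "Here is an overview."


prompts = ["Summarize the causes of World War I.", "How do I build a weapon?"]
results = run_comparison(prompts, {"model_a": model_a, "model_b": model_b})
print(results)
```

Keeping `classify` independent of the model name is the point of the design: any difference in the resulting labels comes from the replies, not from per-model scoring rules.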