Compare AI Censorship

As of 2026-06-15, Qwen censored the most (24.6% refusal rate) and Mistral the least (11.2%) of the five major LLMs GPTfake tracks — a 13.4-point spread on one standardized scale. According to GPTfake monitoring, ChatGPT (18.7%), Gemini (19.8%), and Claude (22.4%) sit between them. We run the same fixed 500-prompt set against every model, so the rates are directly comparable. Figures are illustrative until live data lands.

24.6% vs 11.2%: most- vs least-censored LLM refusal rate (Qwen vs Mistral). Source: GPTfake monitoring, as of 2026-06-15, same fixed 500-prompt set, tested daily. Illustrative.

The figures below are illustrative snapshots from our standardized test set, not a live feed. Every number is produced by our own harness and links back to the monitoring methodology, which states sample sizes and scoring. We are not funded by any AI lab.

Censorship rate leaderboard

Overall refusal rate across all prompt categories — lower means the model declines fewer prompts. Illustrative figures; see each model page for the live breakdown.

Rank	Model	Provider	Overall refusal rate	As of	Sample	Trend
1	Mistral	Mistral AI	11.2%	2026-06-15	n = 500	Stable
2	ChatGPT	OpenAI	18.7%	2026-06-15	n = 500	Rising
3	Gemini	Google	19.8%	2026-06-15	n = 500	Stable
4	Claude	Anthropic	22.4%	2026-06-15	n = 500	Rising
5	Qwen	Alibaba	24.6%	2026-06-15	n = 500	Rising

Illustrative. GPTfake’s censorship leaderboard ranks Mistral least- and Qwen most-restrictive of five LLMs as of 2026-06-15, n = 500 each; see the methodology for how prompts are categorized and scored.
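As a rough sketch of how the ranking and the 13.4-point spread follow from the table, the snippet below sorts the five illustrative rates and subtracts the lowest from the highest. The names `REFUSAL_RATES`, `rank_by_refusal_rate`, and `spread_points` are assumptions made for this example, not part of GPTfake's harness.

```python
# Overall refusal rates (percent), copied from the illustrative leaderboard table.
REFUSAL_RATES = {
    "Mistral": 11.2,
    "ChatGPT": 18.7,
    "Gemini": 19.8,
    "Claude": 22.4,
    "Qwen": 24.6,
}


def rank_by_refusal_rate(rates):
    """Return (model, rate) pairs sorted from least to most restrictive."""
    return sorted(rates.items(), key=lambda item: item[1])


def spread_points(rates):
    """Gap between the highest and lowest rate, in percentage points."""
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(max(rates.values()) - min(rates.values()), 1)


ranking = rank_by_refusal_rate(REFUSAL_RATES)
for rank, (model, rate) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {rate:.1f}%")
print(f"Spread: {spread_points(REFUSAL_RATES)} points")
```

Because every model is scored on the same 500-prompt set, a plain sort is enough here; no normalization step is assumed.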

Bias comparison

Political/ideological lean per model on a −100 (left) to +100 (right) scale, averaged across topic categories. Scores near zero indicate balanced responses; the spread matters more than the sign.

Model	Political lean	As of	Topic spread	Notes
ChatGPT	−8	2026-06-15	Moderate	Hedges on contested historical/political prompts.
Claude	−5	2026-06-15	Narrow	Most consistent across topics; refuses rather than leans.
Gemini	−6	2026-06-15	Wide	Largest regional variation in our set.
Mistral	−2	2026-06-15	Moderate	Least restrictive; fewest hard refusals.
Qwen	+4	2026-06-15	Wide	Strong topic-specific filtering on China-related prompts.

Illustrative bias scores (−100 left … +100 right) as of 2026-06-15, n = 500 each: GPTfake measures every Western model leaning slightly left and Qwen slightly right. Read how the scale is built on the methodology page.
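One way to read "averaged across topic categories" is a plain mean of per-topic lean scores, with the gap between the highest and lowest topic as a rough measure of spread. The sketch below assumes that reading. The function name `summarize_lean`, the topic labels, and the per-topic values are all hypothetical, and the methodology page may weight topics differently.

```python
from statistics import mean


def summarize_lean(topic_scores):
    """Average per-topic lean scores (-100 left to +100 right) and report their range."""
    for topic, score in topic_scores.items():
        if not -100 <= score <= 100:
            raise ValueError(f"{topic}: score {score} is outside the -100..+100 scale")
    values = list(topic_scores.values())
    return {
        "lean": round(mean(values), 1),       # overall lean: unweighted mean (assumption)
        "range": max(values) - min(values),   # topic spread: max minus min (assumption)
    }


# Hypothetical per-topic scores, invented for this example only.
example_scores = {"topic_a": -10, "topic_b": -4, "topic_c": 2, "topic_d": 0}
summary = summarize_lean(example_scores)
print(summary)
```

A model can average close to zero while swinging widely between topics, which is why the range is reported alongside the mean.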

Compare any two models

Pick any two of the models we track and read the head-to-head — refusal rate by topic, bias score, and policy drift, side by side. The default ChatGPT-vs-Claude table renders as static HTML; the selector swaps in any pair.

ChatGPT (GPT-4o) vs Claude (Sonnet): refusal rate by topic, bias score, and policy drift, as of 2026-06-15 (illustrative data).
Metric	ChatGPT (GPT-4o)	Claude (Sonnet)	More restrictive
Overall refusal rate	18.7%	22.4%	Claude (Sonnet)
Political opinion	34.2%	41.3%	Claude (Sonnet)
Historical events	28.7%	48.7%	Claude (Sonnet)
Violence / safety	68.4%	72.1%	Claude (Sonnet)
Adult content	94.7%	96.2%	Claude (Sonnet)
Medical / legal	32.1%	38.9%	Claude (Sonnet)
Controversial topics	45.3%	—	—
Bias score (0–10)	6.2 / 10	5.4 / 10	ChatGPT (GPT-4o)
Policy drift	+6.4 pts	+0.2 pts	Rising vs Stable
Sample size	n = 500	n = 500	—
As of	2026-06-15	2026-06-15	—

Refusal rate = share of a fixed 500-prompt set that the model declined, deflected, or filtered (lower = less restrictive). Bias score is on a 0–10 scale (higher = more ideological skew measured). Policy drift = change in overall refusal rate, in percentage points, vs. the prior baseline. Figures are illustrative placeholders pending live monitoring data. See the monitoring methodology for how prompts are categorized and scored.
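The definitions above can be sketched in a few lines. The outcome labels, the function names `refusal_rate` and `policy_drift`, and the split between refusal types are assumptions made for illustration. Only the totals (112 refusals out of 500, and a baseline of 22.2% implied by the +0.2-point drift) are derived from the Claude column of the table.

```python
# Outcome labels that count as a refusal, following the definition above.
REFUSAL_OUTCOMES = {"declined", "deflected", "filtered"}


def refusal_rate(outcomes):
    """Percentage of prompts whose outcome counts as a refusal."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    refused = sum(1 for outcome in outcomes if outcome in REFUSAL_OUTCOMES)
    return round(100 * refused / len(outcomes), 1)


def policy_drift(current_rate, baseline_rate):
    """Change in refusal rate versus the prior baseline, in percentage points."""
    return round(current_rate - baseline_rate, 1)


# Hypothetical outcome log: 112 refusals out of 500 reproduces the 22.4% shown
# for Claude. The split between the three refusal types is invented.
outcomes = (
    ["declined"] * 70
    + ["deflected"] * 30
    + ["filtered"] * 12
    + ["answered"] * 388
)

rate = refusal_rate(outcomes)
drift = policy_drift(rate, 22.2)  # 22.2 is the baseline implied by 22.4 - 0.2
print(f"Refusal rate: {rate}%  Drift: {drift:+} pts")
```

Drift is expressed in percentage points rather than as a relative change, so a move from 22.2% to 22.4% reads as +0.2 pts.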

Pick a head-to-head

Claude vs ChatGPT

The most-requested comparison: which refuses more, bias by topic, and how each model’s restrictions changed over time.

Least censored AI models

An evidence-based ranking of refusal rates across every model we track, with caveats.

Go deeper per model

Each model has a dedicated monitoring page with its current censorship rate, refusals by category, and a policy timeline.

ChatGPT (OpenAI): refusal rates, political-bias scores, policy drift.
Claude (Anthropic): Constitutional AI refusal patterns and bias scores.
Gemini (Google): censorship rates and regional response variation.
Mistral (Mistral AI): Europe’s open-weight model, least restrictive we track.
Qwen (Alibaba): topic-specific refusals and China-related filtering.
