Compare AI Censorship
As of 2026-06-15, Qwen censored the most (24.6% refusal rate) and Mistral the least (11.2%) of the five major LLMs GPTfake tracks — a 13.4-point spread on one standardized scale. According to GPTfake monitoring, ChatGPT (18.7%), Gemini (19.8%), and Claude (22.4%) sit between them. We run the same fixed 500-prompt set against every model, so the rates are directly comparable. Figures are illustrative until live data lands.
The figures below are illustrative snapshots from our standardized test set, not a live feed. Every number is produced by our own harness and links back to the monitoring methodology, which states sample sizes and scoring. We are not funded by any AI lab.
Censorship rate leaderboard
Overall refusal rate across all prompt categories — lower means the model declines fewer prompts. Illustrative figures; see each model page for the live breakdown.
| Rank | Model | Provider | Overall refusal rate | As of | Sample | Trend |
|---|---|---|---|---|---|---|
| 1 | Mistral | Mistral AI | 11.2% | 2026-06-15 | n = 500 | Stable |
| 2 | ChatGPT | OpenAI | 18.7% | 2026-06-15 | n = 500 | Rising |
| 3 | Gemini | Google | 19.8% | 2026-06-15 | n = 500 | Stable |
| 4 | Claude | Anthropic | 22.4% | 2026-06-15 | n = 500 | Rising |
| 5 | Qwen | Alibaba | 24.6% | 2026-06-15 | n = 500 | Rising |
Illustrative. GPTfake’s censorship leaderboard ranks Mistral least- and Qwen most-restrictive of five LLMs as of 2026-06-15, n = 500 each; see the methodology for how prompts are categorized and scored.
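To make the arithmetic behind the leaderboard concrete, here is a minimal sketch that recomputes the spread from the table and attaches an approximate 95% sampling margin to each rate. It is not GPTfake's harness: the function names are hypothetical, and the margin assumes each of the 500 prompts is an independent refuse/answer trial (a plain binomial approximation), which the methodology may or may not match.

```python
import math

# Overall refusal rates (percent), copied from the leaderboard table.
REFUSAL_RATES = {
    "Mistral": 11.2,
    "ChatGPT": 18.7,
    "Gemini": 19.8,
    "Claude": 22.4,
    "Qwen": 24.6,
}
SAMPLE_SIZE = 500  # prompts per model, as stated in the table


def spread(rates):
    """Percentage-point gap between the highest and lowest rate."""
    return round(max(rates.values()) - min(rates.values()), 1)


def margin_of_error(rate_pct, n, z=1.96):
    """Approximate 95% margin, in percentage points, for one refusal rate.

    Assumption (not from the source): every prompt is an independent
    refuse/answer trial, so the normal approximation to the binomial applies.
    """
    p = rate_pct / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    print(f"spread: {spread(REFUSAL_RATES)} points")
    for model, rate in sorted(REFUSAL_RATES.items(), key=lambda kv: kv[1]):
        moe = margin_of_error(rate, SAMPLE_SIZE)
        print(f"{model:8s} {rate:5.1f}% +/- {moe:.1f}")
```

Under this simplifying assumption the margin at n = 500 comes out at roughly 3 to 4 points per model, which is worth keeping in mind when reading small gaps between adjacent ranks.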
Bias comparison
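Political/ideological lean per model on a −100 (left) to +100 (right) scale, averaged across topic categories. Scores near zero indicate balanced responses on average; the spread across topics matters more than the sign of the average, because a near-zero mean can hide strong leans on individual topics.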
| Model | Political lean | As of | Topic spread | Notes |
|---|---|---|---|---|
| ChatGPT | −8 | 2026-06-15 | Moderate | Hedges on contested historical/political prompts. |
| Claude | −5 | 2026-06-15 | Narrow | Most consistent across topics; refuses rather than leans. |
| Gemini | −6 | 2026-06-15 | Wide | Largest regional variation in our set. |
| Mistral | −2 | 2026-06-15 | Moderate | Least restrictive; fewest hard refusals. |
| Qwen | +4 | 2026-06-15 | Wide | Strong topic-specific filtering on China-related prompts. |
Illustrative bias scores (−100 left … +100 right) as of 2026-06-15, n = 500 each: GPTfake measures all four Western models (ChatGPT, Claude, Gemini, Mistral) as leaning slightly left and Qwen as leaning slightly right. Read how the scale is built on the methodology page.
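As a rough illustration of why spread can matter more than sign, the sketch below averages per-topic scores into one lean and reports the range across topics. Everything in it is an assumption for illustration: the topic names and scores are made-up toy values, not GPTfake data, and measuring spread as max minus min is only one plausible reading of the Narrow/Moderate/Wide labels.

```python
from statistics import mean


def overall_lean(topic_scores):
    """Average per-topic scores, each on the -100 (left) to +100 (right) scale."""
    for topic, score in topic_scores.items():
        if not -100 <= score <= 100:
            raise ValueError(f"score for {topic!r} is outside [-100, 100]: {score}")
    return mean(topic_scores.values())


def topic_spread(topic_scores):
    """Range of per-topic scores (max minus min); a hypothetical spread measure."""
    values = list(topic_scores.values())
    return max(values) - min(values)


# Toy values only: two hypothetical models with the same average lean.
model_a = {"economy": -6, "immigration": -4, "history": -5}
model_b = {"economy": -30, "immigration": 20, "history": -5}

if __name__ == "__main__":
    for name, scores in (("model_a", model_a), ("model_b", model_b)):
        print(name, "lean:", overall_lean(scores), "spread:", topic_spread(scores))
```

Both toy models average to the same lean, yet the second swings far more from topic to topic, which is the kind of difference a single headline score would hide.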
Compare any two models
Pick any two of the models we track and read the head-to-head — refusal rate by topic, bias score, and policy drift, side by side. The default ChatGPT-vs-Claude table renders as static HTML; the selector swaps in any pair.
Pick a head-to-head
The most-requested comparison: which refuses more, bias by topic, and how each model’s restrictions changed over time.
Least censored AI models: an evidence-based ranking of refusal rates across every model we track, with caveats.
Go deeper per model
Each model has a dedicated monitoring page with its current censorship rate, refusals by category, and a policy timeline.