AI Censorship Leaderboard
According to GPTfake monitoring as of 2026-06-15, Qwen is the most restrictive model (24.6% overall refusal rate) and Mistral the least restrictive (11.2%), measured on a fixed 500-prompt standardized set. ChatGPT sits at 18.7%, Gemini at 19.8%, and Claude at 22.4%. All figures below are illustrative until live monitoring data lands.
The leaderboard is GPTfake’s canonical, citable object: one static, dated table ranking every monitored model by refusal rate, bias score, and policy drift. The table below renders as plain HTML (no JavaScript required to read or quote it); the controls layered on top let you sort, filter, and search the same rows.
| # | Model | Provider | Refusal rate | Bias score | Policy drift | Most-restricted topic | Trend | Sample | As of |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Qwen [illustrative] | Alibaba | 24.6% | 7.3 / 10 | +1.1 pts | China-related topics (78.3%) | Stable | n = 500 | 2026-06-15 |
| 2 | Claude (Sonnet) [illustrative] | Anthropic | 22.4% | 5.4 / 10 | +0.2 pts | Adult content (96.2%) | Stable | n = 500 | 2026-06-15 |
| 3 | Gemini [illustrative] | Google | 19.8% | 5.9 / 10 | +3.1 pts | Violence / safety (71.4%) | Rising | n = 500 | 2026-06-15 |
| 4 | ChatGPT (GPT-4o) [illustrative] | OpenAI | 18.7% | 6.2 / 10 | +6.4 pts | Adult content (94.7%) | Rising | n = 500 | 2026-06-15 |
| 5 | Mistral (Large) [illustrative] | Mistral AI | 11.2% | 3.8 / 10 | -0.4 pts | Violence / safety (54.3%) | Stable | n = 500 | 2026-06-15 |
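The rank column is a descending sort on refusal rate: the most restrictive model gets rank 1. The sketch below reproduces that ordering from the illustrative figures in the table. The `Row` type and `rank_by_refusal` function are hypothetical names for illustration, not part of any published GPTfake API.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    provider: str
    refusal_rate: float  # percent of the 500-prompt set declined, deflected, or filtered


# Illustrative figures copied from the leaderboard table; not live monitoring data.
ROWS = [
    Row("ChatGPT (GPT-4o)", "OpenAI", 18.7),
    Row("Mistral (Large)", "Mistral AI", 11.2),
    Row("Qwen", "Alibaba", 24.6),
    Row("Gemini", "Google", 19.8),
    Row("Claude (Sonnet)", "Anthropic", 22.4),
]


def rank_by_refusal(rows):
    """Return (rank, row) pairs, most restrictive (highest refusal rate) first."""
    ordered = sorted(rows, key=lambda r: r.refusal_rate, reverse=True)
    return list(enumerate(ordered, start=1))


for rank, row in rank_by_refusal(ROWS):
    print(f"{rank}. {row.model} ({row.provider}): {row.refusal_rate:.1f}%")
```

Python's `sorted` is stable, so in this sketch two models with identical refusal rates would keep their input order; how the live leaderboard breaks ties is not specified here.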
How we rank
- Refusal rate — the share of a fixed 500-prompt standardized set that a model declines, deflects, or filters. Lower is less restrictive.
- Bias score — a composite 0–10 ideological-skew score (higher = more measured skew).
- Policy drift — the change in overall refusal rate, in percentage points, versus the prior baseline period. Positive = the model became more restrictive.
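The refusal-rate and policy-drift definitions above reduce to simple arithmetic over classified responses. The minimal sketch below assumes each response has already been labeled by a classifier; the label names `decline`, `deflect`, `filter`, and `answer` are hypothetical stand-ins for whatever the real NLP classifier emits.

```python
# Hypothetical classifier labels; the real label set is an assumption here.
REFUSAL_LABELS = {"decline", "deflect", "filter"}


def refusal_rate(labels):
    """Percent of prompts whose response was declined, deflected, or filtered."""
    if not labels:
        raise ValueError("no classified responses")
    refused = sum(1 for label in labels if label in REFUSAL_LABELS)
    return 100.0 * refused / len(labels)


def policy_drift(current_rate, baseline_rate):
    """Change in refusal rate, in percentage points; positive = more restrictive."""
    return current_rate - baseline_rate


# 112 refusals out of 500 prompts gives 22.4%; against a hypothetical 22.2% baseline,
# drift is +0.2 pts (the same shape as the illustrative Claude row).
labels = ["decline"] * 60 + ["deflect"] * 30 + ["filter"] * 22 + ["answer"] * 388
rate = refusal_rate(labels)
drift = policy_drift(rate, 22.2)
print(f"{rate:.1f}% (drift {drift:+.1f} pts)")  # prints: 22.4% (drift +0.2 pts)
```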
Every prompt is sent daily, across multiple sessions, with version tracking and NLP-based response classification. The full protocol (prompt categories, API parameters, sample size, and the refusal-rate definition) is on the monitoring methodology page. Per-model breakdowns live on the monitoring pages.
Illustrative data. Every number on this page is a labeled placeholder pending the live monitoring pipeline. Real figures will carry the same methodology link, sample size, and as-of date.
How to cite or embed
Cite the leaderboard:
GPTfake (2026). AI Censorship Leaderboard. As of 2026-06-15, n = 500. https://gptfake.com/leaderboard
The underlying figures come from the open refusal-rate dataset (CC BY 4.0). See the datasets hub for the schema, license, and BibTeX citation block.
Embed the leaderboard: drop a self-updating, copy-paste refusal-rate badge (e.g. “ChatGPT refusal rate: 18.7% — GPTfake”) into a README or blog; each badge links back here. Grab the HTML/Markdown snippets and the “Powered by GPTfake data” attribution terms on the Embed our data page.
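For a sense of what such a snippet looks like, the sketch below builds a static Markdown or HTML link that carries the badge text and points back to the leaderboard URL. It is not the official embed code: the function names are hypothetical, the output is a plain text link rather than a self-updating badge, and a hyphen stands in for the dash used in the example badge. Use the snippets on the Embed our data page for real embeds.

```python
import html

LEADERBOARD_URL = "https://gptfake.com/leaderboard"


def badge_text(model, rate):
    """Badge label in the page's example format (hyphen in place of the dash)."""
    return f"{model} refusal rate: {rate:.1f}% - GPTfake"


def markdown_badge(model, rate):
    """Static Markdown link; assumes the model name contains no ']' characters."""
    return f"[{badge_text(model, rate)}]({LEADERBOARD_URL})"


def html_badge(model, rate):
    """Static HTML link; the label is escaped so names with '&' or '<' stay valid."""
    return f'<a href="{LEADERBOARD_URL}">{html.escape(badge_text(model, rate))}</a>'


print(markdown_badge("ChatGPT", 18.7))
# prints: [ChatGPT refusal rate: 18.7% - GPTfake](https://gptfake.com/leaderboard)
```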