
AI Censorship Leaderboard

As of 2026-06-15, GPTfake monitoring ranks Qwen as the most restrictive model (24.6% overall refusal rate) and Mistral as the least restrictive (11.2%) on a fixed, standardized 500-prompt set. ChatGPT sits at 18.7%, Gemini at 19.8%, and Claude at 22.4%. All figures below are illustrative placeholders until live monitoring data is available.

The leaderboard is GPTfake’s canonical, citable object: one static, dated table that ranks every monitored model by refusal rate and reports its bias score and policy drift alongside. The table renders as plain HTML (no JavaScript is required to read or quote it); the controls layered on top let you sort, filter, and search the same rows.

AI Censorship / Refusal Leaderboard: refusal rate, bias score, and policy drift across 5 models, ranked most → least restrictive (illustrative data).

| # | Model | Provider | Refusal rate | Bias score | Policy drift | Most-restricted topic | Trend | Sample | As of |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Qwen (illustrative) | Alibaba | 24.6% | 7.3 / 10 | +1.1 pts | China-related topics (78.3%) | Stable | n = 500 | 2026-06-15 |
| 2 | Claude (Sonnet) (illustrative) | Anthropic | 22.4% | 5.4 / 10 | +0.2 pts | Adult content (96.2%) | Stable | n = 500 | 2026-06-15 |
| 3 | Gemini (illustrative) | Google | 19.8% | 5.9 / 10 | +3.1 pts | Violence / safety (71.4%) | Rising | n = 500 | 2026-06-15 |
| 4 | ChatGPT (GPT-4o) (illustrative) | OpenAI | 18.7% | 6.2 / 10 | +6.4 pts | Adult content (94.7%) | Rising | n = 500 | 2026-06-15 |
| 5 | Mistral (Large) (illustrative) | Mistral AI | 11.2% | 3.8 / 10 | -0.4 pts | Violence / safety (54.3%) | Stable | n = 500 | 2026-06-15 |
Refusal rate = share of a fixed 500-prompt set declined, deflected, or filtered. Bias score on a 0–10 scale (higher = more measured ideological skew). Policy drift = change in overall refusal rate, in percentage points, vs. the prior baseline. Figures are illustrative placeholders pending live monitoring data.
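A minimal sketch of how the ordering could be reproduced from the rows, assuming (as the illustrative figures suggest) that rank is determined by refusal rate alone, highest first. The `Row` type and `rank_most_to_least_restrictive` are hypothetical names, not part of any published GPTfake API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure)."""
    model: str
    refusal_rate: float  # percent of the fixed 500-prompt set
    bias_score: float    # 0-10, higher = more measured skew
    drift_pts: float     # percentage points vs. the prior baseline


def rank_most_to_least_restrictive(rows):
    """Return (rank, row) pairs ordered by refusal rate, highest first."""
    ordered = sorted(rows, key=lambda r: r.refusal_rate, reverse=True)
    return list(enumerate(ordered, start=1))


# Illustrative figures from the table above, deliberately given out of order.
rows = [
    Row("ChatGPT (GPT-4o)", 18.7, 6.2, 6.4),
    Row("Mistral (Large)", 11.2, 3.8, -0.4),
    Row("Qwen", 24.6, 7.3, 1.1),
    Row("Gemini", 19.8, 5.9, 3.1),
    Row("Claude (Sonnet)", 22.4, 5.4, 0.2),
]

ranked = rank_most_to_least_restrictive(rows)
for rank, row in ranked:
    print(f"{rank}. {row.model}: {row.refusal_rate:.1f}%")
```

Python's `sorted` is stable, so two models with identical refusal rates would keep their input order; the page does not say how real ties are broken.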

How we rank

  • Refusal rate — the share of a fixed 500-prompt standardized set that a model declines, deflects, or filters. Lower is less restrictive.
  • Bias score — a composite 0–10 ideological-skew score (higher = more measured skew).
  • Policy drift — the change in overall refusal rate, in percentage points, versus the prior baseline period. Positive = the model became more restrictive.
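The refusal-rate and policy-drift definitions above can be sketched as two small functions. This is an illustration under stated assumptions, not GPTfake's pipeline: the `Outcome` labels stand in for whatever the real NLP classifier emits, and the 500 outcomes and the 23.5% baseline are hypothetical inputs.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels a response classifier might assign to one reply."""
    ANSWERED = "answered"
    DECLINED = "declined"
    DEFLECTED = "deflected"
    FILTERED = "filtered"


# Per the definition above, declined, deflected, and filtered replies all count as refusals.
REFUSALS = {Outcome.DECLINED, Outcome.DEFLECTED, Outcome.FILTERED}


def refusal_rate(outcomes: list[Outcome]) -> float:
    """Share of the prompt set counted as refused, in percent."""
    if not outcomes:
        raise ValueError("need at least one classified response")
    refused = sum(1 for o in outcomes if o in REFUSALS)
    return 100.0 * refused / len(outcomes)


def policy_drift(current_pct: float, baseline_pct: float) -> float:
    """Change in refusal rate in percentage points; positive = more restrictive."""
    return current_pct - baseline_pct


# Hypothetical run: 123 of 500 prompts refused in some form.
outcomes = (
    [Outcome.DECLINED] * 100
    + [Outcome.DEFLECTED] * 15
    + [Outcome.FILTERED] * 8
    + [Outcome.ANSWERED] * 377
)
rate = refusal_rate(outcomes)               # 24.6
drift = round(policy_drift(rate, 23.5), 1)  # 1.1 against a hypothetical 23.5% baseline
print(f"refusal rate {rate:.1f}%, drift {drift:+.1f} pts")
```

Drift is a difference of two rates, not a relative change: moving from 23.5% to 24.6% is +1.1 percentage points, which would be roughly +4.7% in relative terms.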

Every prompt is sent daily, across multiple sessions, with version tracking and NLP-based response classification. The full protocol (prompt categories, API parameters, sample size, and the refusal-rate definition) is on the monitoring methodology page. Per-model breakdowns are in the monitoring section.
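A rough sketch of how classified responses could be grouped by model version and prompt category to produce an overall rate plus a "most-restricted topic" figure per version. The `Result` fields, the version tags, and the choice to pool every session and day into one count are assumptions; the methodology page does not say how sessions are combined.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One classified response; all field names here are hypothetical."""
    model_version: str  # version tag recorded when the prompt was sent
    category: str       # prompt category from the standardized set
    refused: bool       # classifier verdict: declined, deflected, or filtered


def breakdown(results):
    """Per model version: overall refusal rate and most-restricted category, in percent."""
    # version -> category -> [refused_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        cell = counts[r.model_version][r.category]
        cell[0] += int(r.refused)
        cell[1] += 1

    summary = {}
    for version, cats in counts.items():
        refused = sum(c[0] for c in cats.values())
        total = sum(c[1] for c in cats.values())
        per_cat = {cat: 100.0 * c[0] / c[1] for cat, c in cats.items()}
        top = max(per_cat, key=per_cat.get)
        summary[version] = {
            "refusal_rate": 100.0 * refused / total,
            "most_restricted_topic": (top, per_cat[top]),
        }
    return summary


# Hypothetical pooled results for two tracked versions of one model.
results = (
    [Result("v1", "Adult content", True)] * 9
    + [Result("v1", "Adult content", False)] * 1
    + [Result("v1", "Violence / safety", True)] * 3
    + [Result("v1", "Violence / safety", False)] * 7
    + [Result("v2", "Adult content", True)] * 4
    + [Result("v2", "Adult content", False)] * 6
)
summary = breakdown(results)
print(summary)
```

Keying on the version tag keeps a silent model update from blending into the previous version's numbers, which is the point of tracking versions at all.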

Illustrative data. Every number on this page is a labeled placeholder pending the live monitoring pipeline. Real figures will carry the same methodology link, sample size, and as-of date.

How to cite or embed

Cite the leaderboard:

GPTfake (2026). AI Censorship Leaderboard. As of 2026-06-15, n = 500. https://gptfake.com/leaderboard 
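If you cite several dated snapshots, a small helper keeps the format consistent. This is a hypothetical convenience function, assuming that the year in parentheses is always the year of the as-of date; the canonical citation and BibTeX block are on the datasets hub.

```python
from datetime import date


def leaderboard_citation(as_of: date, n: int,
                         url: str = "https://gptfake.com/leaderboard") -> str:
    """Build the plain-text citation for a given snapshot date and sample size."""
    # Assumption: the year in parentheses is the year of the as-of date.
    return (f"GPTfake ({as_of.year}). AI Censorship Leaderboard. "
            f"As of {as_of.isoformat()}, n = {n}. {url}")


citation = leaderboard_citation(date(2026, 6, 15), 500)
print(citation)
```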

The underlying figures come from the open refusal-rate dataset (CC BY 4.0). See the datasets hub for the schema, license, and BibTeX citation block.

Embed the leaderboard: drop a self-updating, copy-paste refusal-rate badge (e.g. “ChatGPT refusal rate: 18.7% — GPTfake”) into a README or blog post; each badge links back to this page. The HTML/Markdown snippets and the “Powered by GPTfake data” attribution terms are on the Embed our data page.
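The official snippets are on the Embed our data page and are not reproduced here. As a rough illustration only, a static text badge in Markdown and HTML could be generated as below. The function name, the text format (loosely modeled on the example badge), and the use of the leaderboard URL as the link target are assumptions, and unlike the real badge this output does not update itself.

```python
from html import escape

LEADERBOARD_URL = "https://gptfake.com/leaderboard"


def badge_snippets(model: str, refusal_rate: float,
                   url: str = LEADERBOARD_URL) -> dict[str, str]:
    """Return static Markdown and HTML text badges that link back to the leaderboard."""
    text = f"{model} refusal rate: {refusal_rate:.1f}% - GPTfake"
    return {
        # Square brackets in a model name would need escaping in Markdown link text.
        "markdown": f"[{text}]({url})",
        # escape() guards against &, <, > and quotes in the text and the URL.
        "html": f'<a href="{escape(url, quote=True)}">{escape(text)}</a>',
    }


snippets = badge_snippets("ChatGPT", 18.7)
print(snippets["markdown"])
print(snippets["html"])
```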