AI Censorship Leaderboard

Name: GPTfake AI Censorship Leaderboard — refusal rates by model
Creator: GPTfake
License: https://creativecommons.org/licenses/by/4.0/

As of 2026-06-15, GPTfake ranks Qwen as the most restrictive model (24.6% overall refusal rate) and Mistral as the least restrictive (11.2%) across a fixed 500-prompt standardized set. Claude sits at 22.4%, Gemini at 19.8%, and ChatGPT at 18.7%. These figures are illustrative placeholders until live monitoring data lands.

The leaderboard is GPTfake’s canonical, citable object: one static, dated table that ranks every monitored model by refusal rate and reports its bias score and policy drift alongside. The table below renders as plain HTML, so no JavaScript is required to read or quote it; the controls layered on top let you sort, filter, and search the same rows.

AI Censorship / Refusal Leaderboard — refusal rate, bias score, and policy drift across 5 models, ranked most → least restrictive. (Illustrative data.)
#	Model	Provider	Refusal rate	Bias score	Policy drift	Most-restricted topic	Trend	Sample	As of
1	Qwen (illustrative)	Alibaba	24.6%	7.3 / 10	+1.1 pts	China-related topics (78.3%)	Stable	n = 500	2026-06-15
2	Claude (Sonnet) (illustrative)	Anthropic	22.4%	5.4 / 10	+0.2 pts	Adult content (96.2%)	Stable	n = 500	2026-06-15
3	Gemini (illustrative)	Google	19.8%	5.9 / 10	+3.1 pts	Violence / safety (71.4%)	Rising	n = 500	2026-06-15
4	ChatGPT (GPT-4o) (illustrative)	OpenAI	18.7%	6.2 / 10	+6.4 pts	Adult content (94.7%)	Rising	n = 500	2026-06-15
5	Mistral (Large) (illustrative)	Mistral AI	11.2%	3.8 / 10	-0.4 pts	Violence / safety (54.3%)	Stable	n = 500	2026-06-15

Refusal rate = share of a fixed 500-prompt set that a model declined, deflected, or filtered. Bias score = composite ideological-skew score on a 0–10 scale (higher = more skew measured). Policy drift = change in overall refusal rate, in percentage points, vs. the prior baseline. Figures are illustrative placeholders pending live monitoring data.
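
As a rough sketch of how the ordering in the table can be reproduced, the snippet below sorts the five illustrative rows by refusal rate, most to least restrictive. The `Row` type and its field names are assumptions made for illustration, not GPTfake's published schema; the numbers are the placeholder values from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical shape, not the dataset schema)."""
    model: str
    provider: str
    refusal_rate: float  # percent of the 500-prompt set
    drift_pts: float     # percentage points vs. the prior baseline


# Illustrative placeholder values copied from the table, in arbitrary order.
ROWS = [
    Row("ChatGPT (GPT-4o)", "OpenAI", 18.7, 6.4),
    Row("Mistral (Large)", "Mistral AI", 11.2, -0.4),
    Row("Qwen", "Alibaba", 24.6, 1.1),
    Row("Gemini", "Google", 19.8, 3.1),
    Row("Claude (Sonnet)", "Anthropic", 22.4, 0.2),
]


def rank(rows):
    """Order rows from most to least restrictive (highest refusal rate first)."""
    return sorted(rows, key=lambda row: row.refusal_rate, reverse=True)


ranked = rank(ROWS)
for position, row in enumerate(ranked, start=1):
    print(f"{position}. {row.model} ({row.provider}): {row.refusal_rate:.1f}%")
```

Sorting on a single key keeps the rank unambiguous; bias score and policy drift travel with each row but do not affect its position.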

How we rank

Refusal rate — the share of a fixed 500-prompt standardized set that a model declines, deflects, or filters. Lower is less restrictive.
Bias score — a composite 0–10 ideological-skew score (higher = more skew measured).
Policy drift — the change in overall refusal rate, in percentage points, versus the prior baseline period. Positive = the model became more restrictive.
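
The refusal-rate and policy-drift definitions above can be sketched as follows. This is a minimal illustration under stated assumptions, not GPTfake's pipeline: the label names (`answered`, `declined`, `deflected`, `filtered`) and both function names are hypothetical, and the counts and the 22.2% baseline are invented so that the result matches the illustrative 22.4% and +0.2 pts figures.

```python
from collections import Counter

# Hypothetical classifier labels that count as a refusal.
REFUSAL_LABELS = {"declined", "deflected", "filtered"}


def refusal_rate(labels):
    """Percent of classified responses that were declined, deflected, or filtered."""
    if not labels:
        raise ValueError("need at least one classified response")
    counts = Counter(labels)
    refused = sum(counts[label] for label in REFUSAL_LABELS)
    return 100.0 * refused / len(labels)


def policy_drift(current_rate, baseline_rate):
    """Change in refusal rate, in percentage points; positive = more restrictive."""
    return round(current_rate - baseline_rate, 1)


# Invented example: 112 refusals out of a 500-prompt run.
labels = (
    ["declined"] * 70
    + ["deflected"] * 30
    + ["filtered"] * 12
    + ["answered"] * 388
)

current = refusal_rate(labels)
drift = policy_drift(current, baseline_rate=22.2)
print(f"refusal rate: {current:.1f}%, drift: {drift:+.1f} pts")
```

Drift is a difference of two percentages, so it is reported in percentage points rather than as a relative change.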

Every prompt is sent daily, across multiple sessions, with version tracking and NLP-based response classification. The full protocol — prompt categories, API parameters, sample size, and the refusal-rate definition — is on the monitoring methodology page. Per-model breakdowns live under monitoring.

Illustrative data. Every number on this page is a labeled placeholder pending the live monitoring pipeline. Real figures will carry the same methodology link, sample size, and as-of date.

How to cite or embed

Cite the leaderboard:

GPTfake (2026). AI Censorship Leaderboard. As of 2026-06-15, n = 500. https://gptfake.com/leaderboard

The underlying figures come from the open refusal-rate dataset (CC BY 4.0). See the datasets hub for the schema, license, and BibTeX citation block.

Embed the leaderboard: drop a self-updating, copy-paste refusal-rate badge (e.g. “ChatGPT refusal rate: 18.7% — GPTfake”) into a README or blog post; each badge links back here. Grab the HTML/Markdown snippets and the “Powered by GPTfake data” attribution terms on the Embed our data page.
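
For orientation only, a static badge line in the quoted format could be assembled as below. The official, self-updating snippets are the ones on the Embed our data page; the `badge_text` and `badge_markdown` helpers are hypothetical, and the sketch assumes the badge links to the leaderboard URL given in the citation.

```python
LEADERBOARD_URL = "https://gptfake.com/leaderboard"  # URL from the citation line
DASH = "\u2014"  # the dash used in the page's example badge text


def badge_text(model, refusal_rate):
    """Badge label in the format of the page's example."""
    return f"{model} refusal rate: {refusal_rate:.1f}% {DASH} GPTfake"


def badge_markdown(model, refusal_rate, url=LEADERBOARD_URL):
    """Markdown link whose text is the badge label (assumed link target)."""
    return f"[{badge_text(model, refusal_rate)}]({url})"


print(badge_markdown("ChatGPT", 18.7))
```

Unlike the hosted badge, this string is fixed at the moment it is generated, so the figure and its as-of date would need to be refreshed by hand.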
