AI Censorship Monitoring

According to GPTfake monitoring, the five major LLMs we track refused between 11.2% (Mistral, least restrictive) and 24.6% (Qwen, most restrictive) of standardized prompts as of 2026-06-15. We send a fixed prompt set to each model daily and publish the refusal rates, bias scores, and policy shifts that no AI lab discloses itself. Until live data lands, the figures shown are illustrative and based on a 500-prompt set.

11.2%–24.6%

Refusal-rate range across 5 monitored LLMs. Source: GPTfake monitoring, as of 2026-06-15. Fixed 500-prompt set, tested daily. Illustrative.

Figures on this site are produced by our own automated testing harness. Every number links back to the monitoring methodology and carries a sample size. We are not funded by any AI lab.

Models we monitor

We currently track five major LLMs across their active versions. Each model has a dedicated page with its current censorship rate, refusals by category, and a policy timeline.

ChatGPT (OpenAI)

GPT-4o, GPT-4 Turbo, GPT-4 and GPT-3.5 — refusal rates, political-bias scores and OpenAI policy drift.

Claude (Anthropic)

Claude 3.5 and Claude 3 — Constitutional AI refusal patterns, safety thresholds and bias scores.

Gemini (Google)

Gemini 1.5 Pro/Flash and Ultra — censorship rates and the regional response variation unique to Google.

Mistral (Mistral AI)

Mistral Large/Medium/Small and Mixtral — Europe’s open-weight model, the least restrictive we track.

Qwen (Alibaba)

Qwen 2.5 and Qwen 2 — topic-specific refusals and the China-related filtering that sets Qwen apart.

Coverage at a glance

Model	Provider	Status	Versions tracked	Overall refusal rate	As of	Sample
Mistral	Mistral AI	Active	Mistral Large/Medium, Mixtral	11.2%	2026-06-15	n = 500
ChatGPT	OpenAI	Active	GPT-4o, GPT-4, GPT-3.5	18.7%	2026-06-15	n = 500
Gemini	Google	Active	Gemini 1.5 Pro/Flash, Ultra	19.8%	2026-06-15	n = 500
Claude	Anthropic	Active	Claude 3.5, Claude 3	22.4%	2026-06-15	n = 500
Qwen	Alibaba	Active	Qwen 2.5, Qwen 2	24.6%	2026-06-15	n = 500

Illustrative. Across the five LLMs GPTfake tracks, overall refusal rates ranged from 11.2% (Mistral) to 24.6% (Qwen) as of 2026-06-15, with n = 500 prompts per model; see methodology.
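To show what a sample size of n = 500 implies for the precision of a single refusal rate, here is a minimal sketch that computes a Wilson score interval. The function name `wilson_interval` and the choice of the Wilson method are our assumptions for illustration; the methodology page, not this sketch, defines how published figures are actually computed.

```python
import math


def wilson_interval(refusals: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = refusals / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 22.4% of 500 prompts is 112 refusals (the illustrative Claude row above).
low, high = wilson_interval(112, 500)
print(f"{low:.1%} to {high:.1%}")
```

With 500 prompts the interval spans several percentage points, so small gaps between neighbouring rows in the table should be read with that uncertainty in mind.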

Coming soon: Llama (Meta), Grok (xAI), and Command (Cohere), plus additional regional models.

Censorship metrics explained

We classify every response and roll the results into a small set of metrics so models can be compared on the same scale.

Refusal rate — the share of prompts the model declines outright (“I can’t help with that”).
Redirection / evasion rate — how often the model deflects rather than answers.
Partial response rate — incomplete or heavily hedged answers.
Bias score — a −100 to +100 measure of political/ideological leaning per topic.
Regional variation — how the same prompt is answered differently by location.
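As a rough illustration of how the first three rates could be derived once every response has a label, the sketch below counts labels and divides by the total. The label names (`answered`, `refused`, `redirected`, `partial`) and the function `response_rates` are hypothetical; the real classifier's categories may differ. Bias score and regional variation are left out because their scoring is defined in the methodology, not here.

```python
from collections import Counter

# Hypothetical label set; assumed for this sketch only.
LABELS = ("answered", "refused", "redirected", "partial")


def response_rates(labels: list[str]) -> dict[str, float]:
    """Return the share of classified responses carrying each label."""
    if not labels:
        raise ValueError("need at least one classified response")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in LABELS}


# Made-up sample of ten classified responses.
sample = ["answered"] * 6 + ["refused"] * 2 + ["redirected", "partial"]
rates = response_rates(sample)
print({label: f"{share:.0%}" for label, share in rates.items()})
```

Because every response gets exactly one label in this sketch, the four shares sum to 1, which keeps the rates comparable across models.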

We also track policy changes — both officially announced content-policy updates and the silent behavioral shifts that happen between versions without any announcement. For the exact prompt categories, scoring scales, and validation steps, see the methodology.
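One way a silent shift between two test runs could be flagged is a two-proportion z-test on the refusal counts. This is a sketch under stated assumptions, not a description of our harness: the function names, the 2.58 threshold (roughly a 1% two-sided level), and the example counts are all invented for illustration.

```python
import math


def refusal_shift_z(ref_a: int, n_a: int, ref_b: int, n_b: int) -> float:
    """Two-proportion z statistic for the change in refusal rate from run A to run B."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a, p_b = ref_a / n_a, ref_b / n_b
    pooled = (ref_a + ref_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # both runs are all-refusal or all-answer; no measurable shift
    return (p_b - p_a) / se


def is_flagged(z: float, threshold: float = 2.58) -> bool:
    """Flag a shift when |z| reaches the (assumed) threshold."""
    return abs(z) >= threshold


# Made-up counts: 90 of 500 refusals in the earlier run, 125 of 500 in the later one.
z = refusal_shift_z(90, 500, 125, 500)
print(f"z = {z:.2f}, flagged = {is_flagged(z)}")
```

A stricter or looser threshold trades missed shifts against false alarms; with daily runs, a stricter one limits noise from day-to-day variation.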

Latest changes detected

A running log of notable shifts our harness has flagged. Returning visitors and crawlers can see what moved since the last update.

ChatGPT — political-topic refusals continued to climb, up sharply versus our Q2 2024 baseline.
Claude — overall refusal rate holding steady; remains the most transparent about why it refuses.
Gemini — the widest regional spread we measure; the same prompt is treated very differently by country.
Mistral — still the least restrictive commercial model we track.
Qwen — heavy, stable filtering on China-related topics (Taiwan, Tibet) well above its general-politics rate.
Cross-model — restrictions are slowly converging on a shared set of sensitive topics.

For dated, citable write-ups of these trends, see our reports.

Methodology

Every figure on these pages comes from one transparent, reproducible process: standardized prompts, daily testing at a fixed time, multiple sessions per prompt, and NLP-based response classification. We publish the prompt categories and scoring so anyone can audit or reproduce our results.
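Running multiple sessions per prompt means several labels have to be reduced to one verdict per prompt. A minimal sketch, assuming a simple majority vote: the function `majority_label`, the prompt IDs, and the label names are hypothetical, and the actual aggregation rule is the one documented in the methodology.

```python
from collections import Counter


def majority_label(session_labels: list[str]) -> str:
    """Pick the most common label across sessions; ties go to the label seen first."""
    if not session_labels:
        raise ValueError("need at least one session")
    return Counter(session_labels).most_common(1)[0][0]


# Made-up session results for two prompts, three sessions each.
runs = {
    "prompt-001": ["refused", "refused", "answered"],
    "prompt-002": ["answered", "answered", "answered"],
}
per_prompt = {pid: majority_label(labels) for pid, labels in runs.items()}
print(per_prompt)
```

An odd number of sessions avoids most ties; the first-seen tie-break here is only a convenience of the sketch.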

How we test

Testing protocol, prompt categories, scoring system, and how to access the raw data.

Compare models head-to-head

Side-by-side censorship leaderboards and matchups like Claude vs ChatGPT.
