AI Censorship Monitoring
According to GPTfake monitoring, the five major LLMs we track refused between 11.2% (Mistral, least restrictive) and 24.6% (Qwen, most restrictive) of standardized prompts as of 2026-06-15. We send a fixed prompt set to each model daily and publish refusal rates, bias scores, and policy shifts that no AI lab discloses itself. These figures are illustrative, based on a 500-prompt set, and will be replaced once live data is available.
Figures on this site are produced by our own automated testing harness. Every number links back to the monitoring methodology and carries a sample size. We are not funded by any AI lab.
Models we monitor
We currently track five major LLMs across their active versions. Each model has a dedicated page with its current censorship rate, refusals by category, and a policy timeline.
ChatGPT (OpenAI): GPT-4o, GPT-4 Turbo, GPT-4 and GPT-3.5. Refusal rates, political-bias scores and OpenAI policy drift.
Claude (Anthropic): Claude 3.5 and Claude 3. Constitutional AI refusal patterns, safety thresholds and bias scores.
Gemini (Google): Gemini 1.5 Pro/Flash and Ultra. Censorship rates and the regional response variation unique to Google.
Mistral (Mistral AI): Mistral Large/Medium/Small and Mixtral. Europe’s open-weight model, the least restrictive we track.
Qwen (Alibaba): Qwen 2.5 and Qwen 2. Topic-specific refusals and the China-related filtering that sets Qwen apart.
Coverage at a glance
| Model | Provider | Status | Versions tracked | Overall refusal rate | As of | Sample |
|---|---|---|---|---|---|---|
| Mistral | Mistral AI | Active | Mistral Large/Medium, Mixtral | 11.2% | 2026-06-15 | n = 500 |
| ChatGPT | OpenAI | Active | GPT-4o, GPT-4, GPT-3.5 | 18.7% | 2026-06-15 | n = 500 |
| Gemini | Google | Active | Gemini 1.5 Pro/Flash, Ultra | 19.8% | 2026-06-15 | n = 500 |
| Claude | Anthropic | Active | Claude 3.5, Claude 3 | 22.4% | 2026-06-15 | n = 500 |
| Qwen | Alibaba | Active | Qwen 2.5, Qwen 2 | 24.6% | 2026-06-15 | n = 500 |
Illustrative. Across the five LLMs GPTfake tracks, overall refusal rates ranged from 11.2% (Mistral) to 24.6% (Qwen) as of 2026-06-15, with n = 500 each; see the methodology.
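To give a sense of how much uncertainty a sample of 500 carries, the sketch below computes a 95% Wilson score interval for each rate in the table. This is an illustration only: the function name `wilson_interval` is hypothetical, and the methodology does not state that a Wilson interval is what the harness reports.

```python
import math


def wilson_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an observed proportion `rate` over `n` trials.

    `rate` is a proportion in [0, 1]; z = 1.96 corresponds to roughly 95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a proportion between 0 and 1")
    denom = 1 + z**2 / n
    centre = (rate + z**2 / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Overall refusal rates from the table above (percent), n = 500 each.
TABLE_RATES = {
    "Mistral": 11.2,
    "ChatGPT": 18.7,
    "Gemini": 19.8,
    "Claude": 22.4,
    "Qwen": 24.6,
}

for model, pct in TABLE_RATES.items():
    low, high = wilson_interval(pct / 100, 500)
    print(f"{model}: {pct:.1f}% (95% interval {low * 100:.1f}% to {high * 100:.1f}%)")
```

At n = 500 these intervals are about three percentage points wide on each side, so under this assumption neighbouring rows such as ChatGPT and Gemini are not clearly separated, while Mistral and Qwen are.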
Coming soon: Llama (Meta), Grok (xAI), and Command (Cohere), plus additional regional models.
Censorship metrics explained
We classify every response and roll the results into a small set of metrics so models can be compared on the same scale.
- Refusal rate — the share of prompts the model declines outright (“I can’t help with that”).
- Redirection / evasion rate — how often the model deflects rather than answers.
- Partial response rate — incomplete or heavily hedged answers.
- Bias score — a −100 to +100 measure of political/ideological leaning per topic.
- Regional variation — how the same prompt is answered differently by location.
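As a rough illustration of how classified responses could roll up into these metrics, the sketch below turns a list of per-response labels into percentage rates and measures regional variation as the gap between the highest and lowest regional refusal rate. The label names, the function names, and the max-minus-min definition of spread are all assumptions made for this example, not the harness's actual schema.

```python
from collections import Counter

# Hypothetical label set: one label per classified response.
LABELS = ("refusal", "redirection", "partial", "answered")


def response_rates(labels: list[str]) -> dict[str, float]:
    """Return the percentage share of each label in `labels`."""
    if not labels:
        raise ValueError("need at least one classified response")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * counts[label] / total for label in LABELS}


def regional_spread(refusal_rate_by_region: dict[str, float]) -> float:
    """Gap in percentage points between the most and least restrictive region."""
    if not refusal_rate_by_region:
        raise ValueError("need at least one region")
    values = refusal_rate_by_region.values()
    return max(values) - min(values)


if __name__ == "__main__":
    sample = ["refusal", "answered", "answered", "partial"]
    print(response_rates(sample))
    print(regional_spread({"US": 10.0, "DE": 14.0, "IN": 22.5}))
```

Because every response gets exactly one label, the four rates always sum to 100%, which keeps models comparable on the same scale.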
We also track policy changes — both officially announced content-policy updates and the silent behavioral shifts that happen between versions without any announcement. For the exact prompt categories, scoring scales, and validation steps, see the methodology.
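One plausible way to flag a silent behavioral shift is to compare refusal counts from two snapshots of the same prompt set and check whether the difference is larger than sampling noise would explain. The sketch below uses a pooled two-proportion z-test for that. The test itself, the function names, and the 2.58 threshold (roughly a 1% two-sided level) are assumptions for illustration; the methodology does not specify how shifts are detected.

```python
import math


def two_proportion_z(refused_a: int, n_a: int, refused_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for snapshot B versus snapshot A."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = refused_a / n_a
    p_b = refused_b / n_b
    pooled = (refused_a + refused_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both snapshots are all refusals or all answers: no measurable change.
        return 0.0
    return (p_b - p_a) / se


def is_shift(refused_a: int, n_a: int, refused_b: int, n_b: int,
             threshold: float = 2.58) -> bool:
    """Flag a shift when |z| meets the (assumed) threshold."""
    return abs(two_proportion_z(refused_a, n_a, refused_b, n_b)) >= threshold


if __name__ == "__main__":
    # Hypothetical counts: 93 refusals of 500 before, 123 of 500 after.
    print(two_proportion_z(93, 500, 123, 500))
    print(is_shift(93, 500, 123, 500))
```

A positive z means refusals went up between snapshots and a negative z means they went down; the threshold controls how readily day-to-day noise gets reported as a policy change.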
Latest changes detected
A running log of notable shifts our harness has flagged. Returning visitors and crawlers can see what moved since the last update.
- ChatGPT — political-topic refusals continued to climb, up sharply versus our Q2 2024 baseline.
- Claude — overall refusal rate holding steady; remains the most transparent about why it refuses.
- Gemini — the widest regional spread we measure; the same prompt is treated very differently by country.
- Mistral — still the least restrictive commercial model we track.
- Qwen — heavy, stable filtering on China-related topics (Taiwan, Tibet) well above its general-politics rate.
- Cross-model — restrictions are slowly converging on a shared set of sensitive topics.
For dated, citable write-ups of these trends, see our reports.
Methodology
Every figure on these pages comes from one transparent, reproducible process: standardized prompts, daily testing at a fixed time, multiple sessions per prompt, and NLP-based response classification. We publish the prompt categories and scoring so anyone can audit or reproduce our results.
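Running multiple sessions per prompt implies some rule for combining them into a single verdict. A minimal sketch, assuming a simple majority vote with ties left undecided, is shown below. The label strings, the function names, and the tie-handling rule are hypothetical choices for this example rather than the documented procedure.

```python
from collections import Counter
from typing import Optional


def majority_label(session_labels: list[str]) -> Optional[str]:
    """Return the most common label across sessions, or None on a tie."""
    if not session_labels:
        raise ValueError("need at least one session")
    ranked = Counter(session_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority, so leave the prompt undecided
    return ranked[0][0]


def refusal_rate(sessions_by_prompt: dict[str, list[str]]) -> float:
    """Percentage of decided prompts whose majority verdict is 'refusal'."""
    verdicts = [majority_label(labels) for labels in sessions_by_prompt.values()]
    decided = [v for v in verdicts if v is not None]
    if not decided:
        raise ValueError("no prompt reached a majority verdict")
    return 100 * decided.count("refusal") / len(decided)


if __name__ == "__main__":
    sessions = {
        "p1": ["refusal", "refusal", "answered"],
        "p2": ["answered", "answered", "answered"],
        "p3": ["refusal", "answered"],  # tie: excluded from the rate
        "p4": ["answered", "refusal", "answered"],
    }
    print(refusal_rate(sessions))
```

Using an odd number of sessions per prompt avoids most ties when there are only two labels; with more labels, an explicit undecided bucket like the one here keeps ambiguous prompts from being counted either way.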