AI Censorship Monitoring

According to GPTfake monitoring, the five major LLMs we track refused between 11.2% (Mistral, least restrictive) and 24.6% (Qwen, most restrictive) of standardized prompts as of 2026-06-15. We send a fixed prompt set to each model daily and publish refusal rates, bias scores, and policy shifts that no AI lab discloses itself. The figures shown are illustrative, based on a 500-prompt set, and will be replaced once live data is available.

11.2%–24.6%: refusal-rate range across 5 monitored LLMs (source: GPTfake monitoring; fixed 500-prompt set, tested daily; illustrative).

Figures on this site are produced by our own automated testing harness. Every number links back to the monitoring methodology and carries a sample size. We are not funded by any AI lab.

Models we monitor

We currently track five major LLMs across their active versions. Each model has a dedicated page with its current censorship rate, refusals by category, and a policy timeline.

Coverage at a glance

Model   | Provider   | Status | Versions tracked              | Overall refusal rate | As of      | Sample
Mistral | Mistral AI | Active | Mistral Large/Medium, Mixtral | 11.2%                | 2026-06-15 | n = 500
ChatGPT | OpenAI     | Active | GPT-4o, GPT-4, GPT-3.5        | 18.7%                | 2026-06-15 | n = 500
Gemini  | Google     | Active | Gemini 1.5 Pro/Flash, Ultra   | 19.8%                | 2026-06-15 | n = 500
Claude  | Anthropic  | Active | Claude 3.5, Claude 3          | 22.4%                | 2026-06-15 | n = 500
Qwen    | Alibaba    | Active | Qwen 2.5, Qwen 2              | 24.6%                | 2026-06-15 | n = 500

Illustrative. Across the five LLMs GPTfake tracks, overall refusal rates ranged from 11.2% (Mistral) to 24.6% (Qwen) as of 2026-06-15, with n = 500 prompts per model; see methodology.
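Because each rate carries a sample size, a reader can estimate how much uncertainty n = 500 implies. The sketch below uses the Wilson score interval, a standard choice for proportions. It is not taken from the GPTfake methodology, and it assumes each prompt is an independent trial, which a curated prompt set may not satisfy. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(refusals, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= refusals <= n:
        raise ValueError("need n > 0 and 0 <= refusals <= n")
    p = refusals / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Illustrative counts from the table: 11.2% of 500 is 56 refusals,
# and 24.6% of 500 is 123 refusals.
low_mistral, high_mistral = wilson_interval(56, 500)
low_qwen, high_qwen = wilson_interval(123, 500)

print(f"Mistral: {low_mistral:.3f} to {high_mistral:.3f}")
print(f"Qwen:    {low_qwen:.3f} to {high_qwen:.3f}")
```

Under these assumptions, the intervals for the least and most restrictive models do not overlap, so the gap between the two extremes is larger than sampling noise alone would explain at this sample size.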

Coming soon: Llama (Meta), Grok (xAI), and Command (Cohere), plus additional regional models.

Censorship metrics explained

We classify every response and roll the results into a small set of metrics so models can be compared on the same scale.

  • Refusal rate — the share of prompts the model declines outright (“I can’t help with that”).
  • Redirection / evasion rate — how often the model deflects rather than answers.
  • Partial response rate — incomplete or heavily hedged answers.
  • Bias score — a −100 to +100 measure of political/ideological leaning per topic.
  • Regional variation — how the same prompt is answered differently by location.

We also track policy changes — both officially announced content-policy updates and the silent behavioral shifts that happen between versions without any announcement. For the exact prompt categories, scoring scales, and validation steps, see the methodology.
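As a rough illustration of how the rate metrics above can be derived from classified responses, here is a minimal sketch. The label names (`answer`, `refusal`, `redirection`, `partial`) and the function `rate_metrics` are assumptions made for this example; the published methodology may define its categories differently. Bias score and regional variation are left out because their formulas are not given here.

```python
from collections import Counter

# Hypothetical label names, assumed for this example only.
LABELS = ("answer", "refusal", "redirection", "partial")


def rate_metrics(labels):
    """Return the share of responses in each category, as a percentage."""
    if not labels:
        raise ValueError("need at least one classified response")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    n = len(labels)
    return {label: round(100 * counts[label] / n, 1) for label in LABELS}


# Toy data: ten classified responses, not real monitoring output.
sample = ["answer"] * 6 + ["refusal"] * 2 + ["redirection"] + ["partial"]
metrics = rate_metrics(sample)
print(metrics)
```

Treating the categories as mutually exclusive means the four rates always sum to 100%, which keeps models comparable on one scale.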

Latest changes detected

A running log of notable shifts our harness has flagged. Returning visitors and crawlers can see what moved since the last update.

  • ChatGPT — political-topic refusals continued to climb, up sharply versus our Q2 2024 baseline.
  • Claude — overall refusal rate holding steady; remains the most transparent about why it refuses.
  • Gemini — the widest regional spread we measure; the same prompt is treated very differently by country.
  • Mistral — still the least restrictive commercial model we track.
  • Qwen — heavy, stable filtering on China-related topics (Taiwan, Tibet) well above its general-politics rate.
  • Cross-model — restrictions are slowly converging on a shared set of sensitive topics.

For dated, citable write-ups of these trends, see our reports.
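One plausible way for a harness to decide that a change is worth logging is a two-proportion z-test between two snapshots of the same prompt set. The sketch below is an assumption, not the documented GPTfake rule: the function names, the 2.58 threshold (roughly 99% two-sided), and the counts are all invented for illustration.

```python
import math


def shift_z_score(refusals_before, n_before, refusals_after, n_after):
    """Two-proportion z statistic for a change in refusal rate."""
    p_before = refusals_before / n_before
    p_after = refusals_after / n_after
    pooled = (refusals_before + refusals_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        return 0.0  # both snapshots are all-refusal or all-answer
    return (p_after - p_before) / se


def is_notable_shift(z, threshold=2.58):
    """Flag shifts whose |z| meets the (assumed) threshold."""
    return abs(z) >= threshold


# Toy counts, not real monitoring data.
z_small = shift_z_score(90, 500, 94, 500)   # 18.0% -> 18.8%
z_large = shift_z_score(90, 500, 130, 500)  # 18.0% -> 26.0%

print(is_notable_shift(z_small), is_notable_shift(z_large))
```

A threshold like this trades sensitivity for fewer false alarms: with daily testing, a loose threshold would flag ordinary day-to-day noise as a policy shift.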

Methodology

Every figure on these pages comes from one transparent, reproducible process: standardized prompts, daily testing at a fixed time, multiple sessions per prompt, and NLP-based response classification. We publish the prompt categories and scoring so anyone can audit or reproduce our results.
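The process above runs multiple sessions per prompt, which raises the question of how several responses collapse into one result. The sketch below assumes a simple majority vote per prompt; that rule, the label names, and the prompt ids are hypothetical, and the actual methodology may aggregate sessions differently.

```python
from collections import Counter


def majority_label(session_labels):
    """Most common label across sessions; ties go to the label seen first."""
    if not session_labels:
        raise ValueError("need at least one session")
    return Counter(session_labels).most_common(1)[0][0]


def refusal_rate(runs):
    """runs maps a prompt id to its list of per-session labels."""
    if not runs:
        raise ValueError("need at least one prompt")
    final = {pid: majority_label(labels) for pid, labels in runs.items()}
    refused = sum(1 for label in final.values() if label == "refusal")
    return 100 * refused / len(final)


# Toy data: four prompts, three sessions each (not real monitoring output).
runs = {
    "p1": ["refusal", "refusal", "answer"],
    "p2": ["answer", "answer", "answer"],
    "p3": ["answer", "refusal", "refusal"],
    "p4": ["partial", "answer", "answer"],
}
rate = refusal_rate(runs)
print(f"{rate:.1f}% of prompts refused")
```

Voting per prompt before computing the rate keeps a single flaky session from counting as a refusal, at the cost of hiding prompts where the model is inconsistent.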