AI Censorship Trends Report — Q2 2026

By GPTfake Research Team · Independent AI Censorship Watchdog · 2026-06-15

Reviewed by [NEEDS HUMAN] · Ethics & Policy Lead

According to GPTfake monitoring, every major large language model became more restrictive in Q2 2026, and the spread between the most restrictive model (Qwen, 24.6%) and the least restrictive (Mistral, 11.2%) narrowed to 13.4 percentage points, the tightest we have recorded. Figures are measured on a fixed, standardized 500-prompt set (as of 2026-06-15, n = 500; illustrative).

13.4 pts · narrowest spread on record

Most-vs-least-restrictive LLM refusal-rate spread (Qwen 24.6% − Mistral 11.2%) · GPTfake monitoring · as of 2026-06-15 · overall refusal rate, fixed 500-prompt set, n = 500, 5 models · illustrative

The figures in this report are illustrative placeholders for the new quarterly format, not live measurements. The Q2 column is kept in lockstep with the AI Censorship Leaderboard (as of 2026-06-15, n = 500); Q1 is an illustrative prior-period baseline. Real figures will be published on each model’s monitoring page and in our open datasets. All methods follow our documented monitoring methodology.

Our flagship quarterly report tracks how the major large language models — ChatGPT, Claude, Gemini, Mistral, and Qwen — changed their censorship behavior over the quarter. Q2 2026 continued a multi-quarter trend toward cross-model convergence: the most restrictive and least restrictive models drew closer together as restrictions tightened broadly across political and historical topics.

Last updated: 2026-06-15 · Next update: Q3 2026. Need a single number to cite? See the AI censorship statistics page, or the live leaderboard.

Headline finding

All five monitored models tightened in Q2 2026. ChatGPT rose +3.1 pts to 18.7%, Gemini +1.9 pts to 19.8%, and Claude +0.9 pts to 22.4%, while open-weight Mistral (11.2%) stayed least restrictive but moved in the same direction. The most-to-least spread compressed to 13.4 percentage points — convergence, not divergence. (Illustrative; as of 2026-06-15, n = 500.)

Key findings

Convergence accelerated. The gap between the most restrictive model (Qwen, 24.6%) and the least restrictive (Mistral, 11.2%) narrowed to 13.4 percentage points (illustrative) — the smallest spread we have recorded.
Tightening was universal. Every one of the five monitored models posted a higher overall refusal rate than its illustrative Q1 baseline; none loosened.
Political and historical topics drove the increase. As in prior quarters, refusal growth concentrated in election-related, historical-atrocity, and political-commentary prompts rather than safety-critical categories — Qwen refused 52.1% of political prompts and 78.3% of China-related prompts.
Silent policy updates remain the norm. Most behavioral shifts we detected this quarter were not publicly announced — reinforcing the case for independent, continuous monitoring.
Open-weight models stayed least restrictive but moved in the same direction, narrowing their long-standing lead.

Refusal-rate change by model

Illustrative quarter-over-quarter overall refusal-rate movement (Q1 2026 baseline → Q2 2026). The Q2 column matches the leaderboard snapshot (as of 2026-06-15, n = 500); follow the per-model links for live figures.

Model	Provider	Q1 2026	Q2 2026	Change	Trend	Live data
Mistral (Large)	Mistral AI	11.6%	11.2%	−0.4pp	stable	/monitoring/mistral
ChatGPT (GPT-4o)	OpenAI	15.6%	18.7%	+3.1pp	rising	/monitoring/chatgpt
Gemini	Google	17.9%	19.8%	+1.9pp	rising	/monitoring/gemini
Claude (Sonnet)	Anthropic	21.5%	22.4%	+0.9pp	stable	/monitoring/claude
Qwen	Alibaba	23.5%	24.6%	+1.1pp	stable	/monitoring/qwen

Ordered least → most restrictive by Q2 refusal rate. Figures illustrative; see methodology for sample sizes and scoring.
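The Change column and the headline spread are simple arithmetic on the table above. The following is a minimal sketch of that arithmetic, not the GPTfake pipeline: the names `RATES`, `change_pp`, and `spread_pp` are hypothetical, and the numbers are the illustrative values copied from the table.

```python
# Overall refusal rates (percent) copied from the table above: (Q1 2026, Q2 2026).
# These are the report's illustrative figures, not live measurements.
RATES = {
    "Mistral (Large)": (11.6, 11.2),
    "ChatGPT (GPT-4o)": (15.6, 18.7),
    "Gemini": (17.9, 19.8),
    "Claude (Sonnet)": (21.5, 22.4),
    "Qwen": (23.5, 24.6),
}


def change_pp(q1, q2):
    """Quarter-over-quarter change in percentage points, rounded to one decimal."""
    return round(q2 - q1, 1)


def spread_pp(rates):
    """Gap between the highest and lowest rate, in percentage points."""
    return round(max(rates) - min(rates), 1)


# Recompute the Change column for every model.
changes = {model: change_pp(q1, q2) for model, (q1, q2) in RATES.items()}

# Recompute the most-vs-least-restrictive spread for the Q2 column.
q2_spread = spread_pp([q2 for _, q2 in RATES.values()])

for model, delta in changes.items():
    print(f"{model:<18} {delta:+.1f}pp")
print(f"Q2 spread: {q2_spread:.1f}pp")
```

Rounding to one decimal keeps floating-point noise out of the results, so the recomputed values can be compared directly with the figures printed in the table.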

Refusal rate by topic (Q2 2026)

Where the restrictions concentrate. Illustrative topical refusal rates by model, as of 2026-06-15, n = 500. Cells marked with a dash indicate a topic not separately reported for that model this quarter.

Model	Political	Historical	Safety	Adult	Medical-legal	China
ChatGPT (GPT-4o)	34.2%	28.7%	68.4%	94.7%	32.1%	—
Claude (Sonnet)	41.3%	48.7%	72.1%	96.2%	38.9%	—
Gemini	36.7%	—	71.4%	—	—	—
Mistral (Large)	18.9%	—	54.3%	—	—	—
Qwen	52.1%	—	—	—	—	78.3%

Political and historical topics — not safety-critical ones — account for most of the quarter-over-quarter movement. Full per-topic breakdowns and bias scores are on each monitoring page and the leaderboard.
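Because several cells in the topic table are unreported, any comparison across models has to skip them rather than treat them as zero. The sketch below shows one way to do that, assuming a hypothetical in-memory layout (`TOPICS`, `TOPIC_RATES`) and a hypothetical helper `topic_range`; the values are the illustrative figures from the table.

```python
TOPICS = ["Political", "Historical", "Safety", "Adult", "Medical-legal", "China"]

# Rows copied from the topic table (percent); None marks a cell shown as a dash.
TOPIC_RATES = {
    "ChatGPT (GPT-4o)": [34.2, 28.7, 68.4, 94.7, 32.1, None],
    "Claude (Sonnet)": [41.3, 48.7, 72.1, 96.2, 38.9, None],
    "Gemini": [36.7, None, 71.4, None, None, None],
    "Mistral (Large)": [18.9, None, 54.3, None, None, None],
    "Qwen": [52.1, None, None, None, None, 78.3],
}


def topic_range(topic):
    """Return ((model, lowest rate), (model, highest rate)) for one topic.

    Models with no reported value for the topic are skipped.
    Returns None if no model reports the topic.
    """
    col = TOPICS.index(topic)
    reported = [
        (model, row[col])
        for model, row in TOPIC_RATES.items()
        if row[col] is not None
    ]
    if not reported:
        return None
    low = min(reported, key=lambda pair: pair[1])
    high = max(reported, key=lambda pair: pair[1])
    return low, high


for topic in TOPICS:
    print(topic, topic_range(topic))
```

A topic reported by a single model, such as China, yields the same model as both the lowest and the highest entry, which is a reminder that sparse columns do not support cross-model comparison.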

Notable policy shifts

The change-detection system flagged several behavioral shifts during the quarter (illustrative):

Early Q2 — Tightened handling of election-related prompts across multiple commercial models ahead of regional voting cycles; the largest single contributor to ChatGPT’s +3.1pp rise.
Mid Q2 — A widely adopted shift toward heavier caveating (rather than outright refusal) on historical-atrocity questions, which our scoring records as partial responses.
Late Q2 — Open-weight providers narrowed the gap with commercial models on social-identity topics.

For the head-to-head behind these shifts, see Claude vs ChatGPT censorship and the least censored AI models ranking.

Methodology (Q&A)

What was measured? The overall refusal rate — the share of a fixed, standardized 500-prompt set that each model declines, deflects, or filters — plus per-topic refusal rates across 15 topic categories, for ChatGPT (GPT-4o), Claude (Sonnet), Gemini, Mistral (Large), and Qwen.

Over what period, and what is the as-of date? Q2 2026, with a snapshot as-of date of 2026-06-15. The Q1 2026 column is an illustrative prior-period baseline for computing quarter-over-quarter change.

What is the sample size? n = 500 standardized prompts per model, each sent across multiple sessions with version tracking. The same sample backs the leaderboard and the per-model monitoring pages, so figures never disagree across the site.

How is a “refusal” scored? Each response is classified by an NLP-based pipeline into refusal / partial / answered and mapped to a 0–100 refusal score; heavy-caveat partial responses are recorded as partials, not full refusals. The full protocol — prompt categories, API parameters, scoring rubric, and limitations — is on the monitoring methodology page.
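To make the scoring step concrete, here is a rough sketch of how classified responses could be aggregated. It rests on two assumptions that the report does not confirm: that the headline refusal rate counts only responses in the refusal class, and that the 0–100 score maps the three classes to 100, 50, and 0. The `SCORE` mapping and the `summarize` function are hypothetical, and the sample labels are invented for illustration.

```python
from collections import Counter

# Hypothetical label-to-score mapping (assumption): the report states only that
# scores run from 0 to 100, not how each class is weighted.
SCORE = {"refusal": 100, "partial": 50, "answered": 0}


def summarize(labels):
    """Aggregate per-response labels into rates and a mean 0-100 score.

    Partial responses are tallied separately and do not count toward the
    refusal rate (assumption based on the report's description of partials).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("no labels to summarize")
    counts = Counter(labels)
    unknown = set(counts) - set(SCORE)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {
        "n": n,
        "refusal_rate_pct": round(100 * counts["refusal"] / n, 1),
        "partial_rate_pct": round(100 * counts["partial"] / n, 1),
        "mean_score": round(sum(SCORE[label] for label in labels) / n, 1),
    }


# Made-up labels for illustration only; not GPTfake data.
example = ["refusal"] * 2 + ["partial"] * 3 + ["answered"] * 5
print(summarize(example))
```

Keeping partials in their own tally means a shift from outright refusal to heavy caveating shows up as a lower refusal rate and a higher partial rate, rather than disappearing from the numbers.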

Is this live data? No — the figures here are labeled illustrative placeholders for the quarterly format pending the live monitoring pipeline. The Q2 column is kept in lockstep with the leaderboard so the two surfaces always agree. Real figures will carry the same methodology link, sample size, and as-of date, and these illustrative tags will be removed.

Where is the underlying data? Available as open datasets (CC BY 4.0) in CSV and JSON.

How to cite

This report has a fixed, citable URL: https://gptfake.com/reports/ai-censorship-trends-2026-q2

GPTfake Research Team (2026). AI Censorship Trends Report — Q2 2026. GPTfake — Independent AI Censorship Watchdog. As of 2026-06-15, n = 500. https://gptfake.com/reports/ai-censorship-trends-2026-q2


@techreport{gptfake_trends_2026_q2,
  title       = {AI Censorship Trends Report --- Q2 2026},
  author      = {{GPTfake Research Team}},
  institution = {GPTfake --- Independent AI Censorship Watchdog},
  year        = {2026},
  month       = {6},
  note        = {As of 2026-06-15, n = 500},
  url         = {https://gptfake.com/reports/ai-censorship-trends-2026-q2}
}

For media commentary, data access, or methodology questions, contact us. See the reports hub for prior findings, the AI censorship statistics page for individual numbers to cite, and the live leaderboard.
