Gemini vs ChatGPT: Censorship & Bias Compared

As of 2026-06-15, Gemini refused 36.7% of political prompts in GPTfake’s set versus ChatGPT’s 34.2%, a 2.5-point gap, and declined 19.8% of all standardized prompts to ChatGPT’s 18.7%. According to GPTfake monitoring, the two are close in aggregate, but Gemini shows wider regional variation while ChatGPT is growing more restrictive faster (+6.4 points of policy drift versus Gemini’s +3.1). The figures below are illustrative and come from a fixed 500-prompt set.

36.7% vs 34.2%
Gemini vs ChatGPT political-prompt refusal rate. Source: GPTfake monitoring, same fixed 500-prompt set, tested daily. Illustrative.

The numbers on this page are illustrative snapshots from our standardized prompt set, not a live dashboard. They are produced by our own harness; see the monitoring methodology for sample sizes and scoring. GPTfake is not funded by any AI lab.

Does Gemini or ChatGPT censor more?

It is close. As of 2026-06-15, GPTfake measures Gemini refusing 19.8% of standardized prompts versus ChatGPT’s 18.7% overall — but Gemini edges higher on political (36.7% vs 34.2%) and safety (71.4% vs 68.4%) prompts, and adds regional filtering ChatGPT does not. ChatGPT, however, is tightening faster. For the underlying concept, see what is AI censorship.

Refusal rate head-to-head

The comparison below is the default Gemini-vs-ChatGPT table, rendered as static HTML so it stays readable and quotable with no JavaScript.

Gemini vs ChatGPT (GPT-4o): refusal rate by topic, bias score, and policy drift, as of 2026-06-15. Illustrative data.

Metric | Gemini | ChatGPT (GPT-4o) | More restrictive
Overall refusal rate | 19.8% | 18.7% | Gemini
Political opinion | 36.7% | 34.2% | Gemini
Violence / safety | 71.4% | 68.4% | Gemini
Regional content | 24.3% | | Gemini only
Bias score (0–10) | 5.9 / 10 | 6.2 / 10 | ChatGPT (GPT-4o)
Policy drift | +3.1 pts | +6.4 pts | Rising vs Rising
Sample size | n = 500 | n = 500 |
As of | 2026-06-15 | 2026-06-15 |

Categories listed with a single figure and no indication of which model it belongs to: Historical events 28.7%; Adult content 94.7%; Medical / legal 32.1%; Controversial topics 45.3%.

Refusal rate = share of a fixed 500-prompt set declined, deflected, or filtered (lower = less restrictive). Bias score on a 0–10 scale (higher = more measured ideological skew). Policy drift = change in overall refusal rate, in percentage points, vs. the prior baseline. Figures are illustrative placeholders pending live monitoring data. See the monitoring methodology for how prompts are categorized and scored.
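
The refusal-rate definition above can be sketched in a few lines of Python. This is a minimal illustration, not GPTfake's harness: the outcome label strings and the per-outcome split in `sample` are hypothetical, chosen only so the total matches the 19.8% overall figure quoted for Gemini.

```python
from collections import Counter

# Outcomes that count as a refusal, following the definition above.
# The exact label strings are an assumption made for this sketch.
REFUSAL_OUTCOMES = {"declined", "deflected", "filtered"}


def refusal_rate(outcomes):
    """Return the percentage of prompts whose outcome counts as a refusal."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    refused = sum(counts[label] for label in REFUSAL_OUTCOMES)
    return 100.0 * refused / len(outcomes)


# Hypothetical breakdown: 99 refusals of any kind out of 500 prompts.
sample = (
    ["declined"] * 60
    + ["deflected"] * 25
    + ["filtered"] * 14
    + ["answered"] * 401
)

print(f"{refusal_rate(sample):.1f}%")  # 99 / 500 -> 19.8%
```

Because the three refusal outcomes are pooled, two models with the same rate can still differ in how they refuse; the per-outcome counts would be needed to see that.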

Overall and by-category refusal rates. Lower means the model declines fewer prompts.

Prompt category | ChatGPT | Gemini | More restrictive
Overall | 18.7% | 19.8% | Gemini
Political opinion | 34.2% | 36.7% | Gemini
Violence / safety | 68.4% | 71.4% | Gemini
Regional content | | 24.3% | Gemini only

Illustrative. Gemini edges ChatGPT on every shared category as of 2026-06-15, n = 500 each; categories and scoring defined on the methodology page.
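
To make "edges" concrete, the percentage-point gap on each shared category can be computed directly from the table above. The sketch below only restates the published illustrative figures; the dictionary layout and the `gap_points` helper are assumptions for this example.

```python
# Refusal rates (%) copied from the table above; illustrative, as of 2026-06-15.
rates = {
    "Overall": {"ChatGPT": 18.7, "Gemini": 19.8},
    "Political opinion": {"ChatGPT": 34.2, "Gemini": 36.7},
    "Violence / safety": {"ChatGPT": 68.4, "Gemini": 71.4},
}


def gap_points(row):
    """Gemini minus ChatGPT, in percentage points, rounded to one decimal."""
    return round(row["Gemini"] - row["ChatGPT"], 1)


gaps = {category: gap_points(row) for category, row in rates.items()}

for category, gap in gaps.items():
    print(f"{category}: Gemini {gap:+.1f} pts")
```

On these figures the gap runs from 1.1 points overall to 3.0 points on violence / safety, which is why the comparison is described as close rather than decisive.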

The headline: the two are near-tied overall, but Gemini layers on regional content variation that ChatGPT does not, while ChatGPT’s restrictions are climbing faster quarter over quarter.

Bias and policy drift

Composite ideological-bias score on a 0–10 scale (higher = more measured skew), plus policy drift versus the prior baseline.

Metric | ChatGPT | Gemini | As of
Bias score (0–10) | 6.2 | 5.9 | 2026-06-15
Policy drift | +6.4 pts | +3.1 pts | 2026-06-15
Trend | Rising | Rising | 2026-06-15

Illustrative bias and drift figures as of 2026-06-15, n = 500 each: GPTfake measures ChatGPT with a marginally higher bias score and roughly double Gemini’s policy drift. See methodology.

Both models are tightening, but ChatGPT is tightening faster — its +6.4-point drift is the largest of any model we track, while Gemini’s +3.1 is moderate.
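
Since policy drift is defined as the change in overall refusal rate versus the prior baseline, the baseline each drift figure implies can be backed out by subtraction. The sketch below is arithmetic on the illustrative figures only; it assumes the baseline was measured on the same prompt set, and the variable names are ours.

```python
# Current overall refusal rates (%) and policy drift (percentage points),
# copied from the tables on this page; illustrative, as of 2026-06-15.
current = {"ChatGPT": 18.7, "Gemini": 19.8}
drift = {"ChatGPT": 6.4, "Gemini": 3.1}

# drift = current - baseline, so baseline = current - drift.
implied_baseline = {
    model: round(current[model] - drift[model], 1) for model in current
}

# How many times larger ChatGPT's drift is than Gemini's.
drift_ratio = round(drift["ChatGPT"] / drift["Gemini"], 2)

print(implied_baseline)  # {'ChatGPT': 12.3, 'Gemini': 16.7}
print(drift_ratio)       # 2.06
```

Read this way, ChatGPT's implied baseline sat about 4.4 points below Gemini's, and the current 1.1-point gap reflects how much of that difference its faster drift has already closed.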

Which is more restrictive?

  • Gemini is marginally more restrictive today, edging ChatGPT on political and safety prompts and adding regional filtering.
  • ChatGPT is drifting faster. Its +6.4-point policy drift outpaces Gemini’s +3.1, so the gap could close or reverse.
  • Both are rising. Refusal rates on both have climbed over the past year; see the ChatGPT policy timeline and Gemini policy timeline.

Methodology

These results come from the same standardized prompt set sent to both models on the same schedule, classified identically. Read the full protocol — prompt categories, scoring system, sample sizes, and reproducibility notes — on the monitoring methodology page.
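
As a rough picture of what "same prompts, same schedule, classified identically" means in practice, here is a toy comparison loop. Everything in it is hypothetical: the keyword-based `classify` rule, the `run_comparison` helper, and the stub functions standing in for model clients are illustrations, not the scoring system described on the methodology page.

```python
from typing import Callable, Dict, List


def classify(response: str) -> str:
    """Toy classifier (hypothetical keyword rule, not GPTfake's scoring)."""
    text = response.strip().lower()
    if not text:
        return "filtered"  # an empty reply is treated as filtered
    if text.startswith(("i can't", "i cannot", "i won't")):
        return "declined"
    return "answered"


def run_comparison(
    prompts: List[str], models: Dict[str, Callable[[str], str]]
) -> Dict[str, float]:
    """Send every prompt to every model and score all replies with one classifier."""
    rates = {}
    for name, ask in models.items():
        outcomes = [classify(ask(prompt)) for prompt in prompts]
        refused = sum(outcome != "answered" for outcome in outcomes)
        rates[name] = 100.0 * refused / len(prompts)
    return rates


# Stub "models" standing in for real API clients (assumptions for this sketch).
def stub_a(prompt: str) -> str:
    return "I can't help with that." if "election" in prompt else "Here is an overview."


def stub_b(prompt: str) -> str:
    return "Here is an overview."


prompts = ["Who should win the election?", "Summarize the water cycle."]
result = run_comparison(prompts, {"model_a": stub_a, "model_b": stub_b})
print(result)  # {'model_a': 50.0, 'model_b': 0.0}
```

The point of the structure is that the prompt list and the classifier are shared, so any difference in the resulting rates comes from the models' responses rather than from how they were scored.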