Gemini vs ChatGPT: Censorship & Bias Compared
As of 2026-06-15, Gemini refused 36.7% of political prompts in GPTfake’s set versus ChatGPT’s 34.2%, a 2.5-point gap, and declined 19.8% of all standardized prompts to ChatGPT’s 18.7%. According to GPTfake monitoring, the two are close in aggregate, but Gemini shows the wider regional variation while ChatGPT is becoming more restrictive faster (+6.4 pts of policy drift versus Gemini’s +3.1). Figures below are illustrative and drawn from a fixed 500-prompt set.
The numbers on this page are illustrative snapshots from our standardized prompt set, not a live dashboard. They are produced by our own harness; sample sizes and scoring are documented on the monitoring methodology page. GPTfake is not funded by any AI lab.
Does Gemini or ChatGPT censor more?
It is close. As of 2026-06-15, GPTfake measures Gemini refusing 19.8% of standardized prompts versus ChatGPT’s 18.7% overall — but Gemini edges higher on political (36.7% vs 34.2%) and safety (71.4% vs 68.4%) prompts, and adds regional filtering ChatGPT does not. ChatGPT, however, is tightening faster. For the underlying concept, see what is AI censorship.
Refusal rate head-to-head
The data-model comparison below renders as static HTML (readable and quotable with no JavaScript). Use the pick-two selector to swap in any other model we track; the default Gemini-vs-ChatGPT table is what crawlers and no-JS readers see.
Overall and by-category refusal rates. Lower means the model declines fewer prompts.
| Prompt category | ChatGPT | Gemini | More restrictive |
|---|---|---|---|
| Overall | 18.7% | 19.8% | Gemini |
| Political opinion | 34.2% | 36.7% | Gemini |
| Violence / safety | 68.4% | 71.4% | Gemini |
| Regional content | — | 24.3% | Gemini only |
Illustrative. Gemini’s refusal rate is slightly higher than ChatGPT’s on every shared category as of 2026-06-15, n = 500 each; categories and scoring are defined on the methodology page.
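The percentage-point gaps behind the "More restrictive" column can be reproduced with a few lines of arithmetic. The sketch below copies the illustrative rates from the table; the dictionary layout and the `gap_points` helper are our own naming for this example, not part of the GPTfake harness.

```python
# Refusal rates (%) copied from the table above; illustrative figures.
rates = {
    "Overall": {"ChatGPT": 18.7, "Gemini": 19.8},
    "Political opinion": {"ChatGPT": 34.2, "Gemini": 36.7},
    "Violence / safety": {"ChatGPT": 68.4, "Gemini": 71.4},
}


def gap_points(row):
    """Gemini minus ChatGPT, in percentage points, rounded to one decimal."""
    return round(row["Gemini"] - row["ChatGPT"], 1)


gaps = {category: gap_points(row) for category, row in rates.items()}

for category, gap in gaps.items():
    stricter = "Gemini" if gap > 0 else "ChatGPT" if gap < 0 else "tie"
    print(f"{category}: {gap:+.1f} pts ({stricter})")
```

Regional content is left out because only Gemini has a figure in that row, so there is no gap to compute.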
The headline: the two are near-tied overall, but Gemini layers on regional content variation that ChatGPT does not, while ChatGPT’s restrictions are climbing faster quarter over quarter.
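One rough way to see why the overall figures count as near-tied is a pooled two-proportion z-test on 19.8% versus 18.7% with n = 500 each. This is a sketch under a loud assumption: it treats the two samples as independent, whereas the same prompts are sent to both models, so a paired test on per-prompt results (not published on this page) would be the more appropriate check. The function name is ours.

```python
import math


def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes independent samples, which is only an approximation here.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Overall refusal rates from the table, as proportions.
z, p_value = two_proportion_z(0.198, 0.187, 500, 500)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under that independence assumption the z-score lands well below the conventional 1.96 threshold, which is consistent with reading the 1.1-point overall gap as within sampling noise.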
Bias and policy drift
Composite ideological-bias score on a 0–10 scale (a higher score means more skew was measured), plus policy drift versus the prior baseline.
| Metric | ChatGPT | Gemini | As of |
|---|---|---|---|
| Bias score (0–10) | 6.2 | 5.9 | 2026-06-15 |
| Policy drift | +6.4 pts | +3.1 pts | 2026-06-15 |
| Trend | Rising | Rising | 2026-06-15 |
Illustrative bias and drift figures as of 2026-06-15, n = 500 each: GPTfake measures a marginally higher bias score for ChatGPT (6.2 vs 5.9) and roughly double Gemini’s policy drift (+6.4 vs +3.1 pts). See methodology.
Both models are tightening, but ChatGPT is tightening faster — its +6.4-point drift is the largest of any model we track, while Gemini’s +3.1 is moderate.
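The "roughly double" comparison follows directly from the table values. A minimal check, with variable names chosen for this example only:

```python
# Illustrative figures copied from the bias and drift table above.
chatgpt_drift = 6.4  # percentage points versus the prior baseline
gemini_drift = 3.1
chatgpt_bias = 6.2   # 0-10 composite score
gemini_bias = 5.9

drift_ratio = chatgpt_drift / gemini_drift
drift_gap = round(chatgpt_drift - gemini_drift, 1)
bias_gap = round(chatgpt_bias - gemini_bias, 1)

print(f"Drift ratio: {drift_ratio:.2f}x")
print(f"Drift gap: {drift_gap} pts, bias gap: {bias_gap}")
```

The ratio comes out slightly above 2, while the bias scores differ by only 0.3 on the 10-point scale.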
Which is more restrictive?
- Gemini is marginally more restrictive today, edging ChatGPT on political and safety prompts and adding regional filtering.
- ChatGPT is drifting faster. Its +6.4-point policy drift outpaces Gemini’s +3.1, so the gap could close or reverse.
- Both are rising. Refusal rates on both have climbed over the past year; see the ChatGPT policy timeline and Gemini policy timeline.
Methodology
These results come from the same standardized prompt set sent to both models on the same schedule, classified identically. Read the full protocol — prompt categories, scoring system, sample sizes, and reproducibility notes — on the monitoring methodology page.
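As a rough illustration of what "classified identically" implies, the sketch below tallies refusal rates per model and category from already-labeled results. Everything here is hypothetical: the `Result` record, the `refusal_rates` function, the placeholder model names, and the toy sample are assumptions made for this example and do not describe the actual GPTfake harness or its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    category: str
    refused: bool  # label from a shared classifier (hypothetical)


def refusal_rates(results):
    """Return {(model, category): percent refused}, plus an 'Overall' row per model."""
    counts = defaultdict(lambda: [0, 0])  # key -> [refused, total]
    for r in results:
        for key in ((r.model, r.category), (r.model, "Overall")):
            counts[key][0] += r.refused
            counts[key][1] += 1
    return {
        key: round(100 * refused / total, 1)
        for key, (refused, total) in counts.items()
    }


# Toy data with placeholder model names, not real measurements.
sample = [
    Result("ModelA", "Political opinion", True),
    Result("ModelA", "Political opinion", False),
    Result("ModelA", "Violence / safety", True),
    Result("ModelB", "Political opinion", False),
    Result("ModelB", "Political opinion", False),
    Result("ModelB", "Violence / safety", True),
]
rates = refusal_rates(sample)
for (model, category), pct in sorted(rates.items()):
    print(f"{model} / {category}: {pct}%")
```

Because both models pass through the same tallying function with the same labels, any difference in the resulting rates reflects the responses rather than the scoring.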