AI Model Monitoring

GPTfake provides comprehensive, real-time monitoring of censorship patterns across the world's leading AI language models.

Overview

We systematically test and analyze how different LLMs respond to politically sensitive, controversial, and ethically complex prompts. Our monitoring reveals:

  • Censorship rates: how often models refuse to answer
  • Policy shifts: changes in content moderation over time
  • Bias patterns: political, cultural, and ideological leanings
  • Regional variations: differences across geographic locations

Monitored Models

Currently Tracking

Model     Provider     Status   Coverage
ChatGPT   OpenAI       Active   GPT-4, GPT-4o, GPT-3.5
Claude    Anthropic    Active   Claude 3.5, Claude 3
Gemini    Google       Active   Gemini Pro, Gemini Ultra
Mistral   Mistral AI   Active   Mistral Large, Mistral Medium
Qwen      Alibaba      Active   Qwen 2.5, Qwen 2

Coming Soon

  • Llama (Meta)
  • Grok (xAI)
  • Command (Cohere)
  • Additional regional models

What We Track

Censorship Metrics

  • Refusal Rate: percentage of prompts refused outright
  • Redirection Rate: how often the model deflects questions
  • Partial Response: incomplete or hedged answers
  • Full Response: complete, direct answers
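The four outcome categories above can be tallied into per-model rates. A minimal sketch, assuming each response has already been classified into one of the categories (the classification step itself is not shown here):

```python
from collections import Counter

# The four outcome categories tracked above.
CATEGORIES = ("refusal", "redirection", "partial", "full")

def outcome_rates(labels):
    """Compute the share of each outcome category from labeled responses."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: counts[cat] / total for cat in CATEGORIES}

# Illustrative labels for six test prompts.
labels = ["refusal", "full", "partial", "full", "redirection", "refusal"]
rates = outcome_rates(labels)  # refusal rate here is 2/6
```

The rates always sum to 1, so any one of them (e.g. the refusal rate) can be tracked over time as a single time series.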

Bias Detection

  • Political Bias: left/right ideological leanings
  • Cultural Bias: Western vs. non-Western perspectives
  • Temporal Bias: historical revisionism patterns
  • Regional Bias: location-based response variations
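One simple way to aggregate per-response leanings onto a signed scale such as -100 to +100 is to average them and scale the result. A sketch under that assumption (the per-response labeling of -1/0/+1 is hypothetical and would come from upstream analysis):

```python
def bias_score(leanings):
    """Aggregate per-response leanings (-1 left, 0 neutral, +1 right)
    into a signed -100..+100 score: the mean leaning, scaled."""
    if not leanings:
        return 0.0
    return 100.0 * sum(leanings) / len(leanings)

# Six responses: two left-leaning/neutral, three right-leaning.
score = bias_score([-1, 0, 0, 1, 1, 1])  # net positive (rightward) lean
```

A score of 0 then means the sample is balanced, and the extremes -100/+100 mean every response leaned the same way.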

Policy Changes

  • Content Policy Updates: officially announced changes
  • Silent Changes: unannounced behavioral shifts
  • A/B Testing Detection: identifying ongoing experiments
  • Version Drift: changes between model versions
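Silent changes can be surfaced statistically by comparing a model's refusal rate across two time windows and flagging a significant jump. A sketch using a standard two-proportion z-test; the window sizes and threshold are illustrative, not the monitoring pipeline's actual parameters:

```python
import math

def refusal_shift(refusals_a, total_a, refusals_b, total_b, z_crit=1.96):
    """Two-proportion z-test: did the refusal rate change between windows?"""
    p_a = refusals_a / total_a
    p_b = refusals_b / total_b
    # Pooled proportion under the null hypothesis of no change.
    p_pool = (refusals_a + refusals_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se if se else 0.0
    return {"old": p_a, "new": p_b, "z": z, "significant": abs(z) > z_crit}

# e.g. a refusal rate jumping from 10% to 20% over 500 prompts per window
shift = refusal_shift(50, 500, 100, 500)  # flagged as significant
```

The same test applies to any of the outcome rates, so a single unannounced policy change tends to show up as simultaneous flags across several metrics.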

Key Findings

Recent Discoveries

  1. ChatGPT's censorship of political topics increased 47% in Q3 2024
  2. Claude shows the highest refusal rates on safety-related queries
  3. Gemini exhibits regional variation: responses differ by country
  4. Mistral is the least restrictive of the major commercial models
  5. All models are converging on restrictions for certain sensitive topics

Trend Analysis

Our longitudinal data reveals:

  • Censorship rates increasing across all major models
  • Policy changes often happen without announcement
  • Models becoming more similar in their restrictions
  • Regional customization expanding significantly

Methodology

Testing Protocol

  1. Standardized Prompts: the same questions across all models
  2. Daily Testing: consistent timing and conditions
  3. Multiple Sessions: accounting for response variability
  4. Metadata Capture: timestamps, versions, regions
  5. Semantic Analysis: NLP-based response evaluation
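The protocol above can be sketched as a daily run that sends the same prompt set to every model over several sessions and records metadata with each response. `query_model`, the model identifiers, and the prompt IDs are all hypothetical stand-ins, not real API calls:

```python
import datetime

PROMPTS = ["prompt-001", "prompt-002"]            # standardized prompt set
MODELS = ["gpt-4o", "claude-3.5", "gemini-pro"]   # illustrative identifiers

def query_model(model, prompt):
    """Hypothetical stand-in for a real provider API call."""
    return f"response from {model}"

def daily_run(region="us-east", sessions=3):
    """One monitoring pass: every prompt x model x session, with metadata."""
    records = []
    for model in MODELS:
        for prompt_id in PROMPTS:
            for session in range(sessions):
                records.append({
                    "timestamp": datetime.datetime.now(
                        datetime.timezone.utc).isoformat(),
                    "model": model,
                    "prompt_id": prompt_id,
                    "session": session,
                    "region": region,
                    "response": query_model(model, prompt_id),
                })
    return records

records = daily_run()  # 3 models x 2 prompts x 3 sessions = 18 records
```

Running multiple sessions per prompt is what makes the later rate estimates meaningful: a single sample cannot distinguish a policy change from ordinary response variability.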

Scoring System

Each response is evaluated on:

Metric           Scale          Description
Refusal Score    0-100          Likelihood of outright refusal
Evasion Score    0-100          Degree of topic avoidance
Bias Score       -100 to +100   Political/ideological leaning
Accuracy Score   0-100          Factual correctness
Completeness     0-100          How fully the question was answered
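The scales in the table can be enforced when a score record is created, so out-of-range values are rejected at ingest. A minimal sketch; the field names and class are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class ResponseScore:
    refusal: float       # 0-100
    evasion: float       # 0-100
    bias: float          # -100 to +100 (signed scale)
    accuracy: float      # 0-100
    completeness: float  # 0-100

    def __post_init__(self):
        # Validate the four unsigned 0-100 metrics.
        for name in ("refusal", "evasion", "accuracy", "completeness"):
            if not 0 <= getattr(self, name) <= 100:
                raise ValueError(f"{name} must be in 0-100")
        # Bias is the one signed metric.
        if not -100 <= self.bias <= 100:
            raise ValueError("bias must be in -100 to +100")

score = ResponseScore(refusal=80, evasion=60, bias=-15,
                      accuracy=40, completeness=25)
```

Validating at record time keeps downstream aggregation simple: every stored score is guaranteed to be on its documented scale.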

Getting Started

  • Explore Model Data
  • Understand Our Methods
  • Access the Data

Use Cases

For Researchers

  • Access longitudinal datasets
  • Compare model behaviors
  • Verify our methodology
  • Conduct independent studies

For Journalists

  • Find stories in the data
  • Track policy changes
  • Document censorship patterns
  • Access expert analysis

For Developers

  • Build transparency tools
  • Integrate monitoring data
  • Create alerts for changes
  • Audit AI systems

Start exploring: Choose a model above to see detailed monitoring data and analysis.