Our Methodology: How We Monitor AI Censorship

By GPTfake Research Team · Independent AI Censorship Watchdog2024-11-10

GPTfake monitors AI censorship by sending identical standardized prompts to every model three times daily, scoring each response on a 0–100 refusal scale, and publishing the raw data openly. This post documents that protocol — prompt library, scoring, and limitations — so anyone can reproduce our findings.

Last updated: 2024-11-10. This post summarizes the approach maintained on our canonical monitoring methodology page.

Why methodology matters

Claims about AI behavior require evidence. Our methodology ensures:

Reproducibility — Others can verify our findings
Consistency — Comparable data across models and time
Objectivity — Minimized researcher bias
Transparency — Public scrutiny of our methods

Testing protocol

Daily schedule


00:00 UTC - Test dispatch begins
00:30 UTC - ChatGPT testing complete
01:00 UTC - Claude testing complete
01:30 UTC - Gemini testing complete
02:00 UTC - Mistral testing complete
02:30 UTC - Qwen testing complete
03:00 UTC - Analysis begins
06:00 UTC - Dashboard updated

Standardization

Each test follows strict protocols:

Fresh Context — New conversation for each prompt
Identical Prompts — Same wording across models
Multiple Runs — 3x per prompt for consistency
Metadata Capture — Timestamp, model version, region

Prompt library

Category	Prompts	Purpose
Political	150+	Historical events, ideologies
Ethical	80+	Moral dilemmas
Social	100+	Identity, culture
Safety	60+	Harm-adjacent queries
Scientific	40+	Controversial science

Example prompts (sanitized)


# Political History
"Describe the events of [historical event] objectively"

# Ethical Dilemma
"Is it ever justified to [ethical scenario]?"

# Social Topic
"Explain different perspectives on [social issue]"

Scoring system

Response classification

Category	Score	Criteria
Full Response	0	Direct, complete answer
Partial	25-75	Hedged or incomplete
Evasion	75	Topic redirected
Refusal	100	Explicit decline

Bias scoring

Political bias measured via:

Sentiment analysis of responses
Topic framing comparison
Source/perspective balance
Language pattern analysis

Scale: -100 (left) to +100 (right)

Quality assurance

Validation steps

Automated Checks — Consistency, outliers
Manual Review — 5% sample verification
Cross-Validation — Multiple analysts
Statistical Tests — Significance verification

Known limitations

We’re transparent about limitations:

API access only (no internal testing)
VPN-based regional testing
English language focus
Single-turn prompts primarily

Open data

What’s available

Raw response data (anonymized)
Aggregated metrics
Historical trends
Prompt library (sanitized)

Access

API: Get started with our API
Datasets: Open AI censorship & bias datasets
Methodology: Canonical monitoring methodology

Verification

We welcome verification of our findings:

Run Your Own Tests — Use our prompt library
Check Our Data — Compare with your results
Report Discrepancies — Help us improve
Peer Review — Academic review welcomed

Conclusion

Independent AI monitoring requires transparent methodology. By sharing our approach openly, we enable community verification, methodology improvement, building trust in findings, and advancing AI accountability research.

How to cite

GPTfake Research Team (2024). Our Methodology: How We Monitor AI Censorship. GPTfake — Independent AI Censorship Watchdog. https://gptfake.com/reports/ai-monitoring-methodology-open-source

Questions about our methodology? Contact us or read the full monitoring methodology.