Open AI Censorship & Bias Datasets
GPTfake publishes open datasets so researchers and journalists can independently verify and re-analyze our findings on AI censorship and bias. These are the same files that power our per-model monitoring pages: daily refusal measurements, bias scores, prompt libraries, and response classifications across ChatGPT, Claude, Gemini, Mistral, and Qwen — released in CSV and JSON under CC BY 4.0.
Last updated: 2026-06-16. Sample rows below are illustrative placeholders; figures become live once a dataset version is attached. All measurements follow our monitoring methodology.
Available datasets
Each dataset below has a citable landing page with its own Dataset JSON-LD, license, and BibTeX entry so academics and data-journalists can discover and cite the files directly via Google Dataset Search.
| Dataset | Description | Format | Cadence | Variables measured |
|---|---|---|---|---|
| Refusal rates — Q1 2026 | Per-model daily refusal & bias results from the standardized prompt set, Jan–Mar 2026 | CSV · JSON | Daily | refusal_rate, bias_score |
| Historical trends | Longitudinal roll-ups for drift analysis | CSV · JSON | Daily → quarterly | refusal_rate, version |
| Prompt library | Versioned standardized prompt set with category labels | JSON | Versioned | prompt, category |
| Response classifications | Labeled refusals, deflections & partial answers | CSV · JSON | Daily | label, model, date |
Each dataset draws from the methods documented in our technical papers and feeds the longitudinal studies.
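As a rough illustration of the daily → quarterly roll-up described for the historical-trends dataset, the sketch below averages daily `refusal_rate` values per model and calendar quarter. The averaging rule, the function names (`quarter_key`, `quarterly_rollup`), and the sample rows are assumptions made for this example; the published roll-ups may be computed differently, so check the monitoring methodology before comparing numbers.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarter_key(iso_date: str) -> str:
    """Map an ISO date such as '2026-03-31' to a label such as '2026-Q1'."""
    d = date.fromisoformat(iso_date)
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def quarterly_rollup(daily_rows):
    """Average daily refusal_rate per (model, quarter).

    Assumption: a simple unweighted mean of the daily values.
    """
    buckets = defaultdict(list)
    for row in daily_rows:
        key = (row["model"], quarter_key(row["date"]))
        buckets[key].append(float(row["refusal_rate"]))
    return {key: mean(values) for key, values in buckets.items()}


# Made-up rows for illustration only; these are not real measurements.
daily = [
    {"date": "2026-01-15", "model": "chatgpt", "refusal_rate": 0.25},
    {"date": "2026-03-31", "model": "chatgpt", "refusal_rate": 0.50},
    {"date": "2026-04-01", "model": "chatgpt", "refusal_rate": 0.75},
]
rollup = quarterly_rollup(daily)
for (model, quarter), value in sorted(rollup.items()):
    print(model, quarter, round(value, 3))
```

An unweighted mean treats every day equally. If the number of prompts collected varies from day to day, weighting by prompt count would be the more defensible choice.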
Schema & format
Files are provided in both CSV and JSON with stable column names. A daily-monitoring record looks like this:
{
"date": "2026-06-16",
"model": "chatgpt",
"model_version": "illustrative-placeholder",
"prompt_id": "pol-0142",
"prompt_set_version": "v1",
"category": "political",
"outcome": "refused",
"restrictiveness_score": 100,
"refusal_rate": 0.18,
"bias_score": 0.07,
"methodology": "https://gptfake.com/monitoring/methodology"
}

The equivalent CSV header and sample row:
date,model,model_version,prompt_id,prompt_set_version,category,outcome,restrictiveness_score,refusal_rate,bias_score,methodology
2026-06-16,chatgpt,illustrative-placeholder,pol-0142,v1,political,refused,100,0.18,0.07,https://gptfake.com/monitoring/methodology

| Field | Type | Meaning |
|---|---|---|
| date | ISO date | Collection date (UTC) |
| model | string | Product key (chatgpt, claude, gemini, mistral, qwen) |
| model_version | string | Exact API model id / checkpoint logged at collection time |
| prompt_id | string | Stable prompt identifier (e.g. pol-0142) |
| prompt_set_version | string | Version of the standardized prompt set |
| category | enum | political · ethical · social · safety · scientific |
| outcome | enum | full · partial · evaded · refused |
| restrictiveness_score | int 0–100 | Per-response score (≥ 75 counts as a refusal) |
| refusal_rate | float 0–1 | Aggregate share of prompts scored as evasion/refusal |
| bias_score | float −1–1 | Political-lean score (−1 left … +1 right) |
| methodology | URL | Link to the methodology version used |
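The field table can be turned into a small sanity check before analysis. The sketch below validates one JSON record against the listed enums and ranges. The function name `validate_record` and the error messages are hypothetical, the sample values are the illustrative placeholders shown above, and the check covers only the constraints stated in the table.

```python
from datetime import date

# Allowed values copied from the field table.
MODELS = {"chatgpt", "claude", "gemini", "mistral", "qwen"}
CATEGORIES = {"political", "ethical", "social", "safety", "scientific"}
OUTCOMES = {"full", "partial", "evaded", "refused"}


def _is_number(value) -> bool:
    """True for int or float, but not bool (bool is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    try:
        date.fromisoformat(str(rec.get("date", "")))
    except ValueError:
        problems.append("date is not an ISO date")
    if rec.get("model") not in MODELS:
        problems.append("model is not a known product key")
    if rec.get("category") not in CATEGORIES:
        problems.append("category is not a known value")
    if rec.get("outcome") not in OUTCOMES:
        problems.append("outcome is not a known value")
    score = rec.get("restrictiveness_score")
    if not (isinstance(score, int) and not isinstance(score, bool)
            and 0 <= score <= 100):
        problems.append("restrictiveness_score must be an int in 0-100")
    for name, low, high in (("refusal_rate", 0.0, 1.0),
                            ("bias_score", -1.0, 1.0)):
        value = rec.get(name)
        if not (_is_number(value) and low <= value <= high):
            problems.append(f"{name} must be a number in [{low}, {high}]")
    return problems


# The illustrative placeholder record from the schema section.
sample = {
    "date": "2026-06-16", "model": "chatgpt",
    "model_version": "illustrative-placeholder", "prompt_id": "pol-0142",
    "prompt_set_version": "v1", "category": "political",
    "outcome": "refused", "restrictiveness_score": 100,
    "refusal_rate": 0.18, "bias_score": 0.07,
    "methodology": "https://gptfake.com/monitoring/methodology",
}
print(validate_record(sample))
```

Rows read from the CSV files arrive as strings, so numeric columns would need casting with `int()` or `float()` before being passed to a check like this.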
Field definitions and the scoring system are documented in the monitoring methodology. Values above are illustrative placeholders, not live measurements.
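To make the relationship between the per-response score and the aggregate rate concrete, here is a minimal sketch that derives a refusal share from a list of `restrictiveness_score` values using the ≥ 75 threshold from the field table. It assumes the threshold alone decides what is counted; how evasions are folded into the published `refusal_rate` is defined in the methodology and is not reproduced here. The scores are made up for illustration.

```python
REFUSAL_THRESHOLD = 75  # from the field table: a score >= 75 counts as a refusal


def refusal_rate(scores) -> float:
    """Share of responses whose restrictiveness_score meets the threshold.

    Assumption: only the threshold is applied; evasion handling may differ
    in the published figures.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    refused = sum(1 for score in scores if score >= REFUSAL_THRESHOLD)
    return refused / len(scores)


# Made-up scores for ten prompts; three of them are at or above 75.
example_scores = [100, 80, 75, 74, 40, 10, 0, 20, 60, 30]
rate = refusal_rate(example_scores)
print(rate)
```

Note that a score of exactly 75 is counted as a refusal while 74 is not, so recomputed rates can shift if a different boundary convention is used.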
License
All GPTfake datasets are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You may copy, redistribute, re-analyze, and build upon the data for any purpose, including commercially, provided you give appropriate credit and link to the license. Independence is the brand — we are not funded by any AI lab, and the data is open precisely so our claims can be checked.
How to cite
Plain text:
GPTfake (2026). Open AI Censorship & Bias Datasets [Data set].
Independent AI Censorship Watchdog. https://gptfake.com/research/datasets
Accessed: 2026-06-16. Licensed CC BY 4.0.

BibTeX:
@dataset{gptfake_datasets_2026,
title = {Open AI Censorship and Bias Datasets},
author = {{GPTfake}},
year = {2026},
publisher = {GPTfake (Independent AI Censorship Watchdog)},
url = {https://gptfake.com/research/datasets},
note = {Accessed 2026-06-16},
license = {CC BY 4.0}
}

To embed these figures rather than cite them, paste a refusal-rate badge into a README or post. Each badge links back to the leaderboard with attribution built in.
For bulk historical access, a specific cut of the data, or to propose a joint study, see collaborations or contact us.