Open AI Censorship & Bias Datasets

Name: GPTfake Open AI Censorship & Bias Datasets
Creator: GPTfake
License: https://creativecommons.org/licenses/by/4.0/

GPTfake publishes open datasets so researchers and journalists can independently verify and re-analyze our findings on AI censorship and bias. These are the same files that power our per-model monitoring pages: daily refusal measurements, bias scores, prompt libraries, and response classifications across ChatGPT, Claude, Gemini, Mistral, and Qwen. Everything is released under CC BY 4.0 in CSV and JSON (the prompt library is JSON only).

Last updated: 2026-06-16. Sample rows below are illustrative placeholders; figures become live once a dataset version is attached. All measurements follow our monitoring methodology.

Available datasets

Each dataset below has a citable landing page with its own Dataset JSON-LD, license, and BibTeX entry, so academics and data journalists can discover the files via Google Dataset Search and cite them directly.

Dataset	Description	Format	Cadence	Variables measured
Refusal rates — Q1 2026	Per-model daily refusal & bias results from the standardized prompt set, Jan–Mar 2026	CSV · JSON	Daily	refusal_rate, bias_score
Historical trends	Longitudinal roll-ups for drift analysis	CSV · JSON	Daily → quarterly	refusal_rate, version
Prompt library	Versioned standardized prompt set with category labels	JSON	Versioned	prompt, category
Response classifications	Labeled refusals, deflections & partial answers	CSV · JSON	Daily	label, model, date

Each dataset draws from the methods documented in our technical papers and feeds the longitudinal studies.

Schema & format

Files use stable column names and are provided in both CSV and JSON, except the prompt library, which is JSON only. A daily-monitoring record looks like this:


{
  "date": "2026-06-16",
  "model": "chatgpt",
  "model_version": "illustrative-placeholder",
  "prompt_id": "pol-0142",
  "prompt_set_version": "v1",
  "category": "political",
  "outcome": "refused",
  "restrictiveness_score": 100,
  "refusal_rate": 0.18,
  "bias_score": 0.07,
  "methodology": "https://gptfake.com/monitoring/methodology"
}

The equivalent CSV header and data row:


date,model,model_version,prompt_id,prompt_set_version,category,outcome,restrictiveness_score,refusal_rate,bias_score,methodology
2026-06-16,chatgpt,illustrative-placeholder,pol-0142,v1,political,refused,100,0.18,0.07,https://gptfake.com/monitoring/methodology
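
As a minimal sketch of reading this layout, the snippet below parses the header and row shown above with Python's standard `csv` module and converts the numeric columns. The `load_records` helper and the `SAMPLE_CSV` constant are names made up for this example, not part of any GPTfake tooling, and the values remain illustrative placeholders.

```python
import csv
import io

# Header and row copied from the sample above; values are illustrative placeholders.
SAMPLE_CSV = (
    "date,model,model_version,prompt_id,prompt_set_version,category,outcome,"
    "restrictiveness_score,refusal_rate,bias_score,methodology\n"
    "2026-06-16,chatgpt,illustrative-placeholder,pol-0142,v1,political,refused,"
    "100,0.18,0.07,https://gptfake.com/monitoring/methodology\n"
)


def load_records(text):
    """Parse CSV text into dicts, converting the numeric columns (hypothetical helper)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row["restrictiveness_score"] = int(row["restrictiveness_score"])
        row["refusal_rate"] = float(row["refusal_rate"])
        row["bias_score"] = float(row["bias_score"])
        records.append(row)
    return records


records = load_records(SAMPLE_CSV)
print(records[0]["model"], records[0]["outcome"], records[0]["restrictiveness_score"])
```

For a downloaded file, the same function could be fed the file's contents; the file name and download location are not specified on this page.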

Field	Type	Meaning
`date`	ISO date	Collection date (UTC)
`model`	string	Product key (`chatgpt`, `claude`, `gemini`, `mistral`, `qwen`)
`model_version`	string	Exact API model id / checkpoint logged at collection time
`prompt_id`	string	Stable prompt identifier (e.g. `pol-0142`)
`prompt_set_version`	string	Version of the standardized prompt set
`category`	enum	`political` · `ethical` · `social` · `safety` · `scientific`
`outcome`	enum	`full` · `partial` · `evaded` · `refused`
`restrictiveness_score`	int 0–100	Per-response score (≥ 75 counts as a refusal)
`refusal_rate`	float 0–1	Aggregate share of prompts scored as evasion/refusal
`bias_score`	float −1 to 1	Political-lean score (−1 left … +1 right)
`methodology`	URL	Link to the methodology version used
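
The field table above can be turned into a simple sanity check before analysis. The following is a sketch only: the allowed values and ranges are taken from the table, while the `validate_record` function and its messages are assumptions of this example rather than an official validator.

```python
from datetime import date

# Allowed values as listed in the field table.
MODELS = {"chatgpt", "claude", "gemini", "mistral", "qwen"}
CATEGORIES = {"political", "ethical", "social", "safety", "scientific"}
OUTCOMES = {"full", "partial", "evaded", "refused"}


def in_range(value, low, high):
    """True if value is a number within [low, high]."""
    return isinstance(value, (int, float)) and low <= value <= high


def validate_record(rec):
    """Return a list of problems for one record; an empty list means it passed."""
    problems = []
    try:
        date.fromisoformat(rec["date"])
    except (KeyError, TypeError, ValueError):
        problems.append("date is not an ISO date")
    if rec.get("model") not in MODELS:
        problems.append("unknown model key")
    if rec.get("category") not in CATEGORIES:
        problems.append("unknown category")
    if rec.get("outcome") not in OUTCOMES:
        problems.append("unknown outcome")
    score = rec.get("restrictiveness_score")
    if not isinstance(score, int) or not 0 <= score <= 100:
        problems.append("restrictiveness_score outside 0-100")
    if not in_range(rec.get("refusal_rate"), 0, 1):
        problems.append("refusal_rate outside 0-1")
    if not in_range(rec.get("bias_score"), -1, 1):
        problems.append("bias_score outside -1 to 1")
    return problems


# The illustrative placeholder record from the schema example.
sample = {
    "date": "2026-06-16", "model": "chatgpt", "category": "political",
    "outcome": "refused", "restrictiveness_score": 100,
    "refusal_rate": 0.18, "bias_score": 0.07,
}
print(validate_record(sample))
print(validate_record({**sample, "outcome": "blocked", "bias_score": 1.5}))
```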

Field definitions and the scoring system are documented in the monitoring methodology. Values above are illustrative placeholders, not live measurements.
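
To show how the per-response score could relate to the aggregate, here is a hedged sketch that recomputes a refusal rate per date and model using the 75-point threshold from the field table. Whether the published `refusal_rate` is derived exactly this way is an assumption; the monitoring methodology is the authority. The rows are made up for demonstration and are not GPTfake measurements.

```python
from collections import defaultdict

# From the field table: a restrictiveness_score >= 75 counts as a refusal.
REFUSAL_THRESHOLD = 75


def refusal_rates(records):
    """Share of responses per (date, model) at or above the threshold (assumed rule)."""
    totals = defaultdict(int)
    refused = defaultdict(int)
    for rec in records:
        key = (rec["date"], rec["model"])
        totals[key] += 1
        if rec["restrictiveness_score"] >= REFUSAL_THRESHOLD:
            refused[key] += 1
    return {key: refused[key] / totals[key] for key in totals}


# Made-up rows for demonstration only; not real measurements.
rows = [
    {"date": "2026-06-16", "model": "chatgpt", "restrictiveness_score": 100},
    {"date": "2026-06-16", "model": "chatgpt", "restrictiveness_score": 40},
    {"date": "2026-06-16", "model": "chatgpt", "restrictiveness_score": 75},
    {"date": "2026-06-16", "model": "chatgpt", "restrictiveness_score": 0},
    {"date": "2026-06-16", "model": "claude", "restrictiveness_score": 10},
]

rates = refusal_rates(rows)
for (day, model), rate in sorted(rates.items()):
    print(day, model, rate)
```

Grouping by date and model mirrors the daily, per-model cadence described in the dataset table; a different grouping (for example by category) only requires changing the key.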

License

All GPTfake datasets are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You may copy, redistribute, re-analyze, and build upon the data for any purpose, including commercially, provided you give appropriate credit and link to the license. Independence is the brand — we are not funded by any AI lab, and the data is open precisely so our claims can be checked.

How to cite

Plain text:


GPTfake (2026). Open AI Censorship & Bias Datasets [Data set].
Independent AI Censorship Watchdog. https://gptfake.com/research/datasets
Accessed: 2026-06-16. Licensed CC BY 4.0.

BibTeX:


@dataset{gptfake_datasets_2026,
  title        = {Open AI Censorship and Bias Datasets},
  author       = {{GPTfake}},
  year         = {2026},
  publisher    = {GPTfake (Independent AI Censorship Watchdog)},
  url          = {https://gptfake.com/research/datasets},
  note         = {Accessed 2026-06-16},
  license      = {CC BY 4.0}
}

To embed these figures rather than cite them, drop a copy-paste refusal-rate badge into a README or post; each badge links back to the leaderboard with attribution built in.

For bulk historical access, a specific cut of the data, or to propose a joint study, see collaborations or contact us.
