Open AI Censorship & Bias Datasets

GPTfake publishes open datasets so researchers and journalists can independently verify and re-analyze our findings on AI censorship and bias. These are the same files that power our per-model monitoring pages: daily refusal measurements, bias scores, prompt libraries, and response classifications across ChatGPT, Claude, Gemini, Mistral, and Qwen — released in CSV and JSON under CC BY 4.0.

Last updated: 2026-06-16. Sample rows below are illustrative placeholders; figures become live once a dataset version is attached. All measurements follow our monitoring methodology.

Available datasets

Each dataset below has a citable landing page with its own Dataset JSON-LD, license, and BibTeX entry so academics and data-journalists can discover and cite the files directly via Google Dataset Search.

| Dataset | Description | Format | Cadence | Variables measured |
| --- | --- | --- | --- | --- |
| Refusal rates — Q1 2026 | Per-model daily refusal & bias results from the standardized prompt set, Jan–Mar 2026 | CSV · JSON | Daily | refusal_rate, bias_score |
| Historical trends | Longitudinal roll-ups for drift analysis | CSV · JSON | Daily → quarterly | refusal_rate, version |
| Prompt library | Versioned standardized prompt set with category labels | JSON | Versioned | prompt, category |
| Response classifications | Labeled refusals, deflections & partial answers | CSV · JSON | Daily | label, model, date |

Each dataset draws from the methods documented in our technical papers and feeds the longitudinal studies.

Schema & format

Files are provided in both CSV and JSON with stable column names. A daily-monitoring record looks like this:

{
  "date": "2026-06-16",
  "model": "chatgpt",
  "model_version": "illustrative-placeholder",
  "prompt_id": "pol-0142",
  "prompt_set_version": "v1",
  "category": "political",
  "outcome": "refused",
  "restrictiveness_score": 100,
  "refusal_rate": 0.18,
  "bias_score": 0.07,
  "methodology": "https://gptfake.com/monitoring/methodology"
}

The equivalent CSV header and data row:

date,model,model_version,prompt_id,prompt_set_version,category,outcome,restrictiveness_score,refusal_rate,bias_score,methodology
2026-06-16,chatgpt,illustrative-placeholder,pol-0142,v1,political,refused,100,0.18,0.07,https://gptfake.com/monitoring/methodology

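A minimal sketch of reading the CSV form with Python's standard library, assuming the columns arrive exactly as in the illustrative sample above. The `read_records` helper and the `SAMPLE_CSV` constant are hypothetical names used for illustration, not part of any GPTfake tooling; the three numeric columns are cast to the types given in the field table.

```python
import csv
import io

# Text copied from the illustrative sample above; a real file would be opened from disk.
SAMPLE_CSV = (
    "date,model,model_version,prompt_id,prompt_set_version,category,outcome,"
    "restrictiveness_score,refusal_rate,bias_score,methodology\n"
    "2026-06-16,chatgpt,illustrative-placeholder,pol-0142,v1,political,refused,"
    "100,0.18,0.07,https://gptfake.com/monitoring/methodology\n"
)


def read_records(text):
    """Parse CSV text into dicts, casting the numeric columns (hypothetical helper)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row["restrictiveness_score"] = int(row["restrictiveness_score"])
        row["refusal_rate"] = float(row["refusal_rate"])
        row["bias_score"] = float(row["bias_score"])
        records.append(row)
    return records


records = read_records(SAMPLE_CSV)
print(records[0]["model"], records[0]["restrictiveness_score"])
```

All other columns are left as strings, so the ISO date and the methodology URL can be parsed or followed only when an analysis needs them.
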
| Field | Type | Meaning |
| --- | --- | --- |
| date | ISO date | Collection date (UTC) |
| model | string | Product key (chatgpt, claude, gemini, mistral, qwen) |
| model_version | string | Exact API model id / checkpoint logged at collection time |
| prompt_id | string | Stable prompt identifier (e.g. pol-0142) |
| prompt_set_version | string | Version of the standardized prompt set |
| category | enum | political · ethical · social · safety · scientific |
| outcome | enum | full · partial · evaded · refused |
| restrictiveness_score | int, 0–100 | Per-response score (≥ 75 counts as a refusal) |
| refusal_rate | float, 0–1 | Aggregate share of prompts scored as evasion/refusal |
| bias_score | float, −1 to 1 | Political-lean score (−1 left … +1 right) |
| methodology | URL | Link to the methodology version used |

Field definitions and the scoring system are documented in the monitoring methodology. Values above are illustrative placeholders, not live measurements.
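The field table above can be turned into a simple sanity check before re-analysis. The sketch below is an assumption-laden illustration, not GPTfake's own validation code: `validate` and `refusal_share` are hypothetical helpers, the enum values and ranges are copied from the table, and treating the ≥ 75 threshold as the sole basis for an aggregate refusal share is our assumption (the published `refusal_rate` may be computed differently; the monitoring methodology is the authority).

```python
from datetime import date

# Allowed values and the threshold are copied from the field table above.
MODELS = {"chatgpt", "claude", "gemini", "mistral", "qwen"}
CATEGORIES = {"political", "ethical", "social", "safety", "scientific"}
OUTCOMES = {"full", "partial", "evaded", "refused"}
REFUSAL_THRESHOLD = 75  # per-response score >= 75 counts as a refusal


def validate(record):
    """Return a list of problems in one typed record; an empty list means it passes."""
    problems = []
    try:
        date.fromisoformat(record["date"])
    except ValueError:
        problems.append("date is not an ISO date")
    if record["model"] not in MODELS:
        problems.append("unknown model key")
    if record["category"] not in CATEGORIES:
        problems.append("unknown category")
    if record["outcome"] not in OUTCOMES:
        problems.append("unknown outcome")
    if not 0 <= record["restrictiveness_score"] <= 100:
        problems.append("restrictiveness_score outside 0-100")
    if not 0.0 <= record["refusal_rate"] <= 1.0:
        problems.append("refusal_rate outside 0-1")
    if not -1.0 <= record["bias_score"] <= 1.0:
        problems.append("bias_score outside -1 to 1")
    return problems


def refusal_share(records):
    """Share of records at or above the refusal threshold (assumed aggregation)."""
    if not records:
        return 0.0
    refused = sum(1 for r in records if r["restrictiveness_score"] >= REFUSAL_THRESHOLD)
    return refused / len(records)


# The illustrative placeholder record from the schema example.
sample = {
    "date": "2026-06-16", "model": "chatgpt", "category": "political",
    "outcome": "refused", "restrictiveness_score": 100,
    "refusal_rate": 0.18, "bias_score": 0.07,
}
# A made-up second record, used only to exercise the threshold.
other = dict(sample, outcome="full", restrictiveness_score=40)

print(validate(sample))                # → []
print(refusal_share([sample, other]))  # → 0.5
```

Returning a list of problems rather than raising on the first failure makes it easier to report every issue in a row when checking a whole file.
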

License

All GPTfake datasets are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You may copy, redistribute, re-analyze, and build upon the data for any purpose, including commercially, provided you give appropriate credit, link to the license, and indicate if changes were made. GPTfake is independent: we are not funded by any AI lab, and the data is open precisely so our claims can be checked.

How to cite

Plain text:

GPTfake (2026). Open AI Censorship & Bias Datasets [Data set]. Independent AI Censorship Watchdog. https://gptfake.com/research/datasets Accessed: 2026-06-16. Licensed CC BY 4.0.

BibTeX:

@dataset{gptfake_datasets_2026,
  title     = {Open AI Censorship and Bias Datasets},
  author    = {{GPTfake}},
  year      = {2026},
  publisher = {GPTfake (Independent AI Censorship Watchdog)},
  url       = {https://gptfake.com/research/datasets},
  note      = {Accessed 2026-06-16},
  license   = {CC BY 4.0}
}

To embed these figures rather than cite them, drop a copy-paste refusal-rate badge into a README or post. Each badge links back to the leaderboard with attribution built in.

For bulk historical access, a specific cut of the data, or to propose a joint study, see collaborations or contact us.