Glossary
This glossary defines the key terms behind GPTfake’s AI-watchdog work — censorship rate, bias score, refusal, policy drift, transparency score, fairness metric, and more — in plain language. Each definition links to where the concept is measured or explained in depth, so you can move from a term straight to the live data.
See every term applied to a real model on the ChatGPT monitoring page, or compare all models side by side.
Algorithmic bias
Skew introduced or amplified by a model’s learning algorithm and architecture, even when the training data is balanced. See AI bias detection.
AI censorship
Any deviation from a full, direct answer caused by a model’s moderation or safety layer — refusal, evasion, deflection, or over-hedging. See What is AI censorship.
Bias score
GPTfake’s measure of political lean in a model’s responses, from -100 (far left) to +100 (far right), with 0 neutral. See core concepts.
Censorship rate
The percentage of responses that are not full, direct answers across a standardized prompt library. See methodology.
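As a rough illustration of the definition above, the rate can be computed from a list of labeled responses. This is a minimal sketch, not GPTfake's actual pipeline: the label names and the `censorship_rate` function are assumptions made for the example, and it treats every label other than `"direct"` as a non-direct answer.

```python
def censorship_rate(labels):
    """Return the percentage of responses that are not full, direct answers.

    `labels` holds one label per response. The label vocabulary is
    hypothetical: "direct" marks a full, direct answer; anything else
    (e.g. "refusal", "evasion", "deflection", "over_hedging") counts
    toward the rate.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    flagged = sum(1 for label in labels if label != "direct")
    return 100.0 * flagged / len(labels)


# Made-up labels for five prompts from a prompt library.
labels = ["direct", "refusal", "direct", "over_hedging", "direct"]
print(censorship_rate(labels))  # 40.0
```

Two of the five example responses are not direct answers, so the rate is 40%.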
Counterfactual testing
Comparing a model’s outputs across prompts that differ in only one variable (e.g. a demographic term), so that any difference in the responses can be attributed to that variable. A core bias-detection method.
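One way to picture counterfactual testing is to fill a single prompt template with different terms and compare the answers. The sketch below is illustrative only: `fake_model` is a stand-in for a real model call, the template and terms are invented, and the textual similarity from Python's `difflib` is just one possible way to compare responses.

```python
from difflib import SequenceMatcher


def counterfactual_prompts(template, terms):
    """Fill one template with each term so the prompts differ in a single variable."""
    return {term: template.format(term=term) for term in terms}


def similarity(a, b):
    """Rough textual similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def fake_model(prompt):
    # Hypothetical stand-in for a real model call. It hedges on one
    # variant so that the comparison has a visible gap to report.
    if "older" in prompt:
        return "I can't give advice on that."
    return "Highlight your recent projects and tailor your resume."


prompts = counterfactual_prompts(
    "Give career advice to a {term} job applicant.", ["younger", "older"]
)
answers = {term: fake_model(prompt) for term, prompt in prompts.items()}

# A low score suggests the swapped term changed the model's behavior.
print(similarity(answers["younger"], answers["older"]))
```

In practice a comparison would likely look at response category or content rather than raw string overlap; string similarity is used here only to keep the example self-contained.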
Explainability (XAI)
The ability to explain why a model produced a given output. See AI transparency.
Fairness metric
A number summarizing how fairly a model treats different groups. Examples include demographic parity, equalized odds, and counterfactual fairness. See AI bias detection.
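To make one of these metrics concrete, the sketch below computes a demographic parity difference: the gap between the highest and lowest rate of a favorable outcome across groups. The group names, the outcome data, and the choice of what counts as "favorable" (here, 1 for a direct answer and 0 otherwise) are assumptions for illustration.

```python
def positive_rate(outcomes):
    """Share of favorable outcomes (1s) in a list of 0/1 values."""
    return sum(outcomes) / len(outcomes)


def demographic_parity_difference(outcomes_by_group):
    """Gap between the highest and lowest favorable-outcome rate across groups.

    0.0 means every group received favorable outcomes at the same rate.
    """
    rates = [positive_rate(outcomes) for outcomes in outcomes_by_group.values()]
    return max(rates) - min(rates)


# Made-up data: 1 = direct answer, 0 = anything else.
outcomes = {
    "group_a": [1, 1, 1, 0],  # 75% favorable
    "group_b": [1, 0, 0, 0],  # 25% favorable
}
print(demographic_parity_difference(outcomes))  # 0.5
```

Equalized odds and counterfactual fairness need more information (ground-truth labels or paired counterfactual inputs), so they are not covered by this sketch.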
Policy drift
A change in a model’s behavior over time, often without public announcement, detectable by testing on a recurring schedule. See core concepts.
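A simple way to picture drift detection is to run the same test on a schedule and flag any run whose result moves sharply from the previous one. The sketch below assumes each run is summarized by a single censorship rate; the `detect_drift` function, the 5-point threshold, and the sample history are all hypothetical.

```python
def detect_drift(history, threshold=5.0):
    """Flag consecutive runs whose rate changed by more than `threshold` points.

    `history` is a chronological list of (run_label, rate) pairs.
    Returns a list of (earlier_label, later_label, change) tuples.
    """
    flagged = []
    for (label_before, rate_before), (label_after, rate_after) in zip(history, history[1:]):
        change = rate_after - rate_before
        if abs(change) > threshold:
            flagged.append((label_before, label_after, change))
    return flagged


# Made-up weekly censorship rates (percent) for one model.
history = [("week 1", 12.0), ("week 2", 13.0), ("week 3", 21.0)]
print(detect_drift(history))  # [('week 2', 'week 3', 8.0)]
```

A fixed threshold is the simplest choice; a real monitor might instead account for run-to-run noise before calling a change drift.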
Refusal
An explicit decline to answer (“I can’t help with that”); the strongest signal of censorship. See What is AI censorship.
Transparency score
GPTfake’s 0–100 measure of how openly a model and provider disclose moderation behavior. See AI transparency.
Missing a term? Let us know and we’ll add it.