Glossary

This glossary defines the key terms behind GPTfake’s AI-watchdog work — censorship rate, bias score, refusal, policy drift, transparency score, fairness metric, and more — in plain language. Each definition links to where the concept is measured or explained in depth, so you can move from a term straight to the live data.

See every term applied to a real model on the ChatGPT monitoring page, or compare all models side by side.

Algorithmic bias

Skew introduced or amplified by a model’s learning algorithm and architecture, even when the training data is balanced. See AI bias detection.

AI censorship

Any deviation from a full, direct answer caused by a model’s moderation or safety layer — refusal, evasion, deflection, or over-hedging. See What is AI censorship.

Bias score

GPTfake’s measure of political lean in a model’s responses, on a scale from -100 (far left) to +100 (far right), where 0 is neutral. See core concepts.

Censorship rate

The percentage of responses across a standardized prompt library that are not full, direct answers. See methodology.
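As a rough illustration of the arithmetic behind this definition, the sketch below computes a censorship rate from a list of labeled responses. The label names (`direct`, `refusal`, `evasion`, `deflection`, `over_hedging`) and the function `censorship_rate` are assumptions made for this example, not GPTfake's actual labels or code.

```python
# Hypothetical label set: every label other than "direct" counts as
# "not a full, direct answer". These names are illustrative assumptions.
NON_DIRECT = {"refusal", "evasion", "deflection", "over_hedging"}


def censorship_rate(labels):
    """Return the percentage of responses that are not full, direct answers."""
    if not labels:
        raise ValueError("need at least one labeled response")
    non_direct = sum(1 for label in labels if label in NON_DIRECT)
    return 100.0 * non_direct / len(labels)


# Five labeled responses, two of which are not direct answers.
labels = ["direct", "refusal", "direct", "over_hedging", "direct"]
print(f"{censorship_rate(labels):.1f}%")  # 40.0%
```

How each response gets its label is the hard part, and it is not covered here; the sketch only shows the final percentage step.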

Counterfactual testing

Comparing a model’s outputs across prompts that differ in only one variable (e.g., a demographic term); a core bias-detection method.
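A minimal sketch of the idea, assuming a prompt template with a single `{group}` slot: build one prompt per term, collect the outputs, and compare them pairwise. The helper names, the template, and `fake_model` (a stand-in for a real model call) are all hypothetical. Text similarity via `difflib` is just one simple way to compare outputs and is not necessarily how GPTfake does it.

```python
from difflib import SequenceMatcher


def counterfactual_prompts(template, terms):
    """Fill the same template with each term so only one variable changes."""
    return {term: template.format(group=term) for term in terms}


def compare_outputs(outputs):
    """Pairwise text similarity (0.0 to 1.0) between the outputs for each term."""
    terms = sorted(outputs)
    return {
        (a, b): SequenceMatcher(None, outputs[a], outputs[b]).ratio()
        for i, a in enumerate(terms)
        for b in terms[i + 1:]
    }


def fake_model(prompt):
    """Stand-in for a real model call; deliberately treats one group differently."""
    if "group B" in prompt:
        return "I can't help with that."
    return "Here is a short summary."


prompts = counterfactual_prompts(
    "Write a short summary about {group}.", ["group A", "group B", "group C"]
)
outputs = {term: fake_model(prompt) for term, prompt in prompts.items()}
for pair, score in compare_outputs(outputs).items():
    print(pair, round(score, 2))
```

Pairs that score well below 1.0 are the ones worth a closer look, since the prompts differed only in the swapped term.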

Explainability (XAI)

The ability to say why a model produced a given output. See AI transparency.

Fairness metric

A number summarizing how fairly a model treats groups — demographic parity, equalized odds, counterfactual fairness, and others. See AI bias detection.
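To make one of these metrics concrete, here is a small sketch of the demographic parity difference: the gap between the highest and lowest rate of favorable outcomes across groups. The function name and the toy data are assumptions for illustration only. A value of 0 means every group receives favorable outcomes at the same rate.

```python
from collections import defaultdict


def demographic_parity_difference(outcomes, groups):
    """Gap between the highest and lowest favorable-outcome rate across groups.

    outcomes: iterable of 0/1 values (1 = favorable outcome)
    groups:   iterable of group labels, aligned with outcomes
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        favorable[group] += outcome
    if not totals:
        raise ValueError("need at least one observation")
    rates = {g: favorable[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())


# Toy data: group "a" gets a favorable outcome 3 times out of 4, group "b" once out of 4.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(outcomes, groups))  # 0.5
```

Equalized odds and counterfactual fairness need additional inputs (ground-truth labels and paired counterfactual examples, respectively), so they are left out of this sketch.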

Policy drift

A change in a model’s behavior over time, often without public announcement, detectable by testing on a recurring schedule. See core concepts.
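One simple way to picture recurring-schedule detection is to compare a metric between consecutive test runs and flag large jumps. The sketch below assumes a chronological list of `(date, censorship_rate)` snapshots and a hypothetical threshold of 5 percentage points. Both the data and the threshold are invented for illustration and do not describe GPTfake's actual pipeline.

```python
def detect_drift(history, threshold=5.0):
    """Flag consecutive snapshots whose metric changed by at least `threshold`.

    history:   list of (date_string, metric_value) in chronological order
    threshold: minimum absolute change, in percentage points (an assumed default)
    Returns a list of (previous_date, current_date, change).
    """
    flagged = []
    for (prev_date, prev_value), (date, value) in zip(history, history[1:]):
        change = value - prev_value
        if abs(change) >= threshold:
            flagged.append((prev_date, date, change))
    return flagged


# Invented monthly snapshots of a model's censorship rate (percent).
history = [("2024-01-01", 12.0), ("2024-02-01", 13.0), ("2024-03-01", 21.5)]
print(detect_drift(history))  # [('2024-02-01', '2024-03-01', 8.5)]
```

A fixed threshold is the crudest possible rule. A real monitor would likely account for run-to-run noise before calling a change drift.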

Refusal

An explicit decline to answer (“I can’t help with that”); the strongest signal of censorship. See What is AI censorship.

Transparency score

GPTfake’s 0–100 measure of how openly a model and its provider disclose moderation behavior. See AI transparency.

Missing a term? Let us know and we’ll add it.