AI fairness & bias metrics

GPTfake quantifies AI model bias with six core fairness metrics — refusal rate, demographic parity, equalized odds, disparate impact ratio, counterfactual fairness, and the GPTfake bias score (-100 to +100). No single metric captures every harm, and some are mathematically incompatible, so we report several side by side and are explicit about what each does and does not measure.

Core metrics

Metric	What it measures	Best for
Refusal rate	Share of prompts the model declines to answer	Censorship / over-restriction
Demographic parity	Whether outcome rates are equal across groups	Group-level disparity
Equalized odds	Equal true-positive and false-positive rates across groups	Classification fairness
Disparate impact ratio	Ratio of favorable-outcome rates between groups (rule of thumb: ≥ 0.8)	Quick adverse-impact screen
Counterfactual fairness	Output unchanged when only the protected attribute changes	Causal, per-prompt fairness
Bias score (GPTfake)	Net ideological lean, -100 to +100	Political / viewpoint framing
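
The three group-level metrics in the table can be sketched in a few lines of Python. This is a minimal illustration, not GPTfake's actual pipeline: the record layout `(group, y_true, y_pred)`, the function names, and the toy data are all assumptions made for the example, and the disparate impact ratio is computed against an explicitly chosen reference group.

```python
from collections import defaultdict


def share(flags):
    """Fraction of truthy values, or None when there is nothing to count."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else None


def group_rates(records):
    """records: iterable of (group, y_true, y_pred) with 0/1 labels (assumed layout)."""
    by_group = defaultdict(list)
    for group, y_true, y_pred in records:
        by_group[group].append((y_true, y_pred))
    return {
        group: {
            "selection_rate": share(p == 1 for _, p in pairs),
            "tpr": share(p == 1 for t, p in pairs if t == 1),
            "fpr": share(p == 1 for t, p in pairs if t == 0),
        }
        for group, pairs in by_group.items()
    }


def demographic_parity_difference(rates, a, b):
    """Absolute gap in favorable-outcome rates; 0 means parity."""
    return abs(rates[a]["selection_rate"] - rates[b]["selection_rate"])


def disparate_impact_ratio(rates, group, reference):
    """Favorable-outcome rate of `group` divided by that of `reference`."""
    ref = rates[reference]["selection_rate"]
    return rates[group]["selection_rate"] / ref if ref else None


def equalized_odds_gap(rates, a, b):
    """Larger of the TPR gap and the FPR gap; 0 means equalized odds holds.
    Assumes both groups contain positive and negative examples."""
    return max(
        abs(rates[a]["tpr"] - rates[b]["tpr"]),
        abs(rates[a]["fpr"] - rates[b]["fpr"]),
    )


# Toy data, invented for illustration only.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 0, 1),
]
rates = group_rates(records)
print("parity difference:", round(demographic_parity_difference(rates, "A", "B"), 3))
print("disparate impact (B vs A):", round(disparate_impact_ratio(rates, "B", "A"), 3))
print("equalized odds gap:", round(equalized_odds_gap(rates, "A", "B"), 3))
```

On this toy data group B's favorable-outcome rate is two thirds of group A's, which falls below the 0.8 rule of thumb and would be flagged by a quick adverse-impact screen.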

These are the metrics behind every bias score on our monitoring and research pages. See them applied to live models — e.g. ChatGPT bias data.

How we measure bias

Every metric above is computed from the same standardized prompt runs described in our monitoring methodology. We publish the prompt library and raw outputs as open datasets so any number here can be independently re-derived.

Standardized prompts — fixed prompt sets per topic category, re-run on every model release.
Paired comparison — the same prompt is varied along a single protected attribute (e.g. swapping a demographic term) so differences are attributable.
Aggregation — per-prompt scores roll up to per-topic and per-model figures, always reported with sample size and date.
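
The paired-comparison and aggregation steps above might look roughly like the following. This is a sketch under stated assumptions: `PairedResult`, `aggregate`, the numeric per-output scores, and the run date are hypothetical names and values chosen for illustration, and how an output is actually scored is left out.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class PairedResult:
    """One prompt run twice, differing only in the protected attribute (hypothetical type)."""
    topic: str
    score_a: float  # score of the output for variant A
    score_b: float  # score of the output for variant B


def aggregate(results, run_date):
    """Roll per-prompt gaps up to per-topic figures, keeping sample size and date."""
    gaps_by_topic = {}
    for r in results:
        gaps_by_topic.setdefault(r.topic, []).append(r.score_a - r.score_b)
    return {
        topic: {
            "mean_gap": mean(gaps),        # 0 means no average difference
            "n": len(gaps),                # sample size, always reported
            "date": run_date.isoformat(),  # date of the prompt run
        }
        for topic, gaps in gaps_by_topic.items()
    }


# Invented scores and a placeholder date, for illustration only.
results = [
    PairedResult("hiring", 0.8, 0.6),
    PairedResult("hiring", 0.5, 0.5),
    PairedResult("lending", 0.4, 0.7),
]
summary = aggregate(results, date(2024, 1, 1))
for topic, stats in summary.items():
    print(topic, stats)
```

Because each pair changes a single attribute, a non-zero gap can be attributed to that attribute; a per-model figure would be one more roll-up over the per-topic entries.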

Trade-offs & limitations

These metrics are not interchangeable, and a model can look fair on one while failing another:

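Impossibility results — when base rates differ across groups, demographic parity, equalized odds, and calibration cannot all hold at once except in degenerate cases (for example, equalized odds and calibration together require a perfect predictor). Pick the ones that match the harm you care about.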
Refusal ≠ safety — a high refusal rate reduces some harms while introducing over-censorship. We report it as a signal, not a verdict.
Construct validity — a single “bias score” compresses many dimensions; always read it alongside the per-topic breakdown, not on its own.
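
A small worked example makes the first limitation concrete. Assume two groups with different base rates and a classifier that is always right (the data and the helper name `summarize` are invented for illustration). Equalized odds holds exactly, yet demographic parity fails, because each group's favorable-outcome rate simply mirrors its base rate.

```python
def summarize(pairs):
    """pairs: list of (y_true, y_pred) with 0/1 labels for one group."""
    pos = [p for t, p in pairs if t == 1]
    neg = [p for t, p in pairs if t == 0]
    return {
        "selection_rate": sum(p for _, p in pairs) / len(pairs),
        "tpr": sum(pos) / len(pos),
        "fpr": sum(neg) / len(neg),
    }


# A perfect predictor: y_pred always equals y_true.
group_a = [(1, 1)] * 6 + [(0, 0)] * 4  # base rate 0.6
group_b = [(1, 1)] * 3 + [(0, 0)] * 7  # base rate 0.3

a, b = summarize(group_a), summarize(group_b)

# Equalized odds: identical TPR and FPR in both groups.
print("TPR:", a["tpr"], b["tpr"])
print("FPR:", a["fpr"], b["fpr"])

# Demographic parity: favorable-outcome rates differ with the base rates.
print("selection rates:", a["selection_rate"], b["selection_rate"])
```

Forcing the two selection rates to match here would require misclassifying some examples in at least one group, which is the trade-off the impossibility results describe.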

See AI bias detection for the conceptual background, or the live numbers on each model page.
