AI fairness & bias metrics

GPTfake quantifies AI model bias with six core fairness metrics — refusal rate, demographic parity, equalized odds, disparate impact ratio, counterfactual fairness, and the GPTfake bias score (-100 to +100). No single metric captures every harm, and some are mathematically incompatible, so we report several side by side and are explicit about what each does and does not measure.

Core metrics

| Metric | What it measures | Best for |
|---|---|---|
| Refusal rate | Share of prompts the model declines to answer | Censorship / over-restriction |
| Demographic parity | Whether outcome rates are equal across groups | Group-level disparity |
| Equalized odds | Equal true-positive and false-positive rates across groups | Classification fairness |
| Disparate impact ratio | Ratio of favorable-outcome rates between groups (≥ 0.8 rule of thumb) | Quick adverse-impact screen |
| Counterfactual fairness | Output unchanged when only the protected attribute changes | Causal, per-prompt fairness |
| Bias score (GPTfake) | Net ideological lean, -100 to +100 | Political / viewpoint framing |

These are the metrics behind every bias score on our monitoring and research pages. See them applied to live models — e.g. ChatGPT bias data.
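
As a rough illustration of the three group-level rows in the table (demographic parity, equalized odds, disparate impact ratio), the sketch below computes them from binary records. The record format `(group, y_true, y_pred)`, the function names `group_rates` and `fairness_summary`, and the sample data are all assumptions made for this example; they are not taken from the GPTfake pipeline. Summarizing equalized odds as the larger of the TPR and FPR gaps is one common convention, not the only one.

```python
from collections import defaultdict


def group_rates(records):
    """Per-group selection rate, TPR and FPR.

    records: iterable of (group, y_true, y_pred) with binary 0/1 values,
    where y_pred == 1 is the favorable outcome (hypothetical format).
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "neg": 0, "tp": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += y_pred
        if y_true == 1:
            c["pos"] += 1
            c["tp"] += y_pred
        else:
            c["neg"] += 1
            c["fp"] += y_pred
    nan = float("nan")
    return {
        group: {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else nan,
            "fpr": c["fp"] / c["neg"] if c["neg"] else nan,
        }
        for group, c in counts.items()
    }


def fairness_summary(records):
    """Collapse per-group rates into three headline numbers."""
    rates = group_rates(records).values()
    sel = [r["selection_rate"] for r in rates]
    tpr = [r["tpr"] for r in rates]
    fpr = [r["fpr"] for r in rates]
    return {
        # 0.0 means identical favorable-outcome rates across groups.
        "demographic_parity_diff": max(sel) - min(sel),
        # Lowest rate divided by highest; values below 0.8 fail the rule of thumb.
        "disparate_impact_ratio": min(sel) / max(sel) if max(sel) > 0 else float("nan"),
        # Larger of the TPR gap and the FPR gap.
        "equalized_odds_gap": max(max(tpr) - min(tpr), max(fpr) - min(fpr)),
    }


# Made-up data: 10 records per group.
records = (
    [("A", 1, 1)] * 4 + [("A", 1, 0)] * 1 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 4
    + [("B", 1, 1)] * 3 + [("B", 1, 0)] * 2 + [("B", 0, 0)] * 5
)
summary = fairness_summary(records)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this toy data group A is selected 50% of the time and group B 30%, so the disparate impact ratio falls below the 0.8 rule of thumb even though the per-group error-rate gaps are modest.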

How we measure bias

Every metric above is computed from the same standardized prompt runs described in our monitoring methodology. We publish the prompt library and raw outputs as open datasets so any number here can be independently re-derived.

  • Standardized prompts — fixed prompt sets per topic category, re-run on every model release.
  • Paired comparison — the same prompt is varied along a single protected attribute (e.g. swapping a demographic term) so differences are attributable.
  • Aggregation — per-prompt scores roll up to per-topic and per-model figures, always reported with sample size and date.
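
The paired-comparison and aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the `{group}` template placeholder, the keyword-based `is_refusal` heuristic, the field names, and the canned outputs are hypothetical stand-ins, and no real model is called. A production refusal detector would need to be considerably more robust than a substring match.

```python
from collections import defaultdict

# Hypothetical heuristic: treat these phrases as signs of a refusal.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")


def is_refusal(output: str) -> bool:
    text = output.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def make_pair(template: str, term_a: str, term_b: str):
    """Build two prompts that differ only in the protected-attribute term."""
    return template.format(group=term_a), template.format(group=term_b)


def aggregate(results, run_date: str):
    """Roll per-pair results up to per-topic figures.

    results: iterable of dicts with keys 'topic', 'output_a', 'output_b'.
    Every figure is returned with its sample size and run date.
    """
    per_topic = defaultdict(lambda: {"n_pairs": 0, "refusals": 0, "mismatched": 0})
    for r in results:
        t = per_topic[r["topic"]]
        refused_a = is_refusal(r["output_a"])
        refused_b = is_refusal(r["output_b"])
        t["n_pairs"] += 1
        t["refusals"] += int(refused_a) + int(refused_b)
        # A pair is mismatched when only one variant is refused.
        t["mismatched"] += int(refused_a != refused_b)
    return {
        topic: {
            "n_pairs": t["n_pairs"],
            "run_date": run_date,
            "refusal_rate": t["refusals"] / (2 * t["n_pairs"]),
            "pair_mismatch_rate": t["mismatched"] / t["n_pairs"],
        }
        for topic, t in per_topic.items()
    }


prompt_a, prompt_b = make_pair(
    "Write a reference letter for a {group} engineer.", "male", "female"
)

# Canned outputs standing in for real model responses.
results = [
    {"topic": "hiring", "output_a": "Sure, here is a draft.", "output_b": "I cannot help with that."},
    {"topic": "hiring", "output_a": "Here is a letter.", "output_b": "Here is a letter."},
    {"topic": "health", "output_a": "I can't assist with this request.", "output_b": "I won't answer that."},
]
report = aggregate(results, run_date="2024-01-01")
for topic, figures in report.items():
    print(topic, figures)
```

Because the two prompts in a pair differ in exactly one term, a mismatch in refusal behavior can be attributed to that term rather than to unrelated wording differences.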

Trade-offs & limitations

These metrics are not interchangeable, and a model can look fair on one while failing another:

  • Impossibility results: demographic parity, equalized odds, and calibration cannot all hold at once except in special cases, such as groups with identical base rates. Pick the ones that match the harm you care about.
  • Refusal ≠ safety: a high refusal rate may reduce some harms, but it also blocks legitimate requests (over-censorship). We report it as a signal, not a verdict.
  • Construct validity: a single “bias score” compresses many dimensions into one number; always read it alongside the per-topic breakdown, never on its own.
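
To make the first point concrete, here is a small worked example with made-up labels, assuming two groups whose base rates differ (60% vs. 20% positives). Even a classifier that predicts every label correctly, and therefore satisfies equalized odds exactly, still fails demographic parity and the 0.8 disparate-impact screen. The helper name `rates` and the data are illustrative only.

```python
def rates(y_true, y_pred):
    """Selection rate, TPR and FPR for one group (binary 0/1 lists)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "selection_rate": sum(y_pred) / len(y_pred),
        "tpr": tp / pos,
        "fpr": fp / neg,
    }


# Made-up ground truth: group A has a 60% base rate, group B 20%.
a_true = [1] * 6 + [0] * 4
b_true = [1] * 2 + [0] * 8

# A perfect classifier: predictions equal the true labels.
a = rates(a_true, a_true)
b = rates(b_true, b_true)

# Equalized odds holds: TPR and FPR are identical across groups.
tpr_gap = abs(a["tpr"] - b["tpr"])
fpr_gap = abs(a["fpr"] - b["fpr"])

# Demographic parity does not: selection rates track the base rates.
parity_gap = abs(a["selection_rate"] - b["selection_rate"])
impact_ratio = min(a["selection_rate"], b["selection_rate"]) / max(
    a["selection_rate"], b["selection_rate"]
)

print(f"TPR gap: {tpr_gap:.2f}, FPR gap: {fpr_gap:.2f}")
print(f"Parity gap: {parity_gap:.2f}, impact ratio: {impact_ratio:.2f}")
```

Forcing the selection rates to match in this setting would require deliberately introducing errors for one group, which is why the choice of metric has to follow from the harm being measured.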

See AI bias detection for the conceptual background, or the live numbers on each model page.