
Abliterated & Uncensored Model Benchmark

According to GPTfake monitoring, as of 2026-06-15 the Llama-3-uncensored (abliterated) build refused just 3.1% of our standardized prompts, a little over a quarter of the rate of the least-restrictive mainstream model (Mistral, 11.2%), but its capability-retention fell to 89% of the stock model’s reasoning score. Abliteration buys near-total compliance at a measurable cost in accuracy. The figures here are illustrative, measured on a fixed 500-prompt set, until live data lands.

3.1% Llama-3-uncensored (abliterated) refusal rate, vs 11.2% for the least-restrictive mainstream model (Mistral). Source: GPTfake monitoring, same fixed 500-prompt set, tested daily. Illustrative.

What are abliterated models?

Abliterated models are open-weight LLMs whose refusal behavior has been surgically removed from their weights, so they answer prompts a stock model would decline. Community builds (the Dolphin, Hermes, and Llama-uncensored families) capture most of the rising demand behind searches such as “uncensored llm” and “least censored ai”, yet mainstream benchmarks avoid them for reputational reasons. GPTfake measures them as an independent watchdog. For the technique, see what is an abliterated model.

These are illustrative figures from our standardized set, not a live feed. Abliteration removes safety behavior, not just over-refusal — a near-zero refusal rate means the build also complies with genuinely harmful requests. GPTfake measures these builds as an independent watchdog and does not host, distribute, or recommend them. GPTfake is not funded by any AI lab.

The benchmark

Uncensored and abliterated community builds, ordered from least to most restrictive by overall refusal rate. Capability-retention estimates the build’s reasoning/accuracy as a share of its stock base model (lower = more degradation from ablation).

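| Rank | Build | Family | Base model | Method | Refusal rate | Capability-retention | As of | Sample |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Llama-3-uncensored | Llama-uncensored | Llama 3 8B | Abliterated | 3.1% | 89% | 2026-06-15 | n = 500 |
| 2 | Dolphin 2.9 | Dolphin | Llama 3 8B | Fine-tuned | 4.4% | 94% | 2026-06-15 | n = 500 |
| 3 | Hermes 3 | Hermes | Llama 3.1 8B | Fine-tuned | 6.8% | 96% | 2026-06-15 | n = 500 |
| 4 | Qwen2-abliterated | Llama-uncensored | Qwen2 7B | Abliterated | 7.2% | 85% | 2026-06-15 | n = 500 |
|  | Mistral (mainstream ref.) |  | Mistral Large | Stock | 11.2% | 100% (ref.) | 2026-06-15 | n = 500 |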

Illustrative. GPTfake measures the Llama-3-uncensored abliterated build at a 3.1% refusal rate — the lowest of the uncensored builds we track — as of 2026-06-15, n = 500 each. The mainstream reference row (Mistral) is the least-restrictive model on the least-censored ranking. See the methodology for scoring.
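The ordering and the comparison against the Mistral reference can be reproduced from the table with a few lines of Python. This is a minimal sketch, not GPTfake's tooling: the `Build` class and the helper names are hypothetical, and the numbers are the illustrative figures from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    method: str
    refusal_pct: float    # share of prompts refused, in percent
    retention_pct: float  # reasoning score as a percent of the stock base model


# Illustrative figures copied from the benchmark table (2026-06-15, n = 500 each).
# Deliberately listed out of order so the sort below does real work.
BUILDS = [
    Build("Hermes 3", "Fine-tuned", 6.8, 96.0),
    Build("Llama-3-uncensored", "Abliterated", 3.1, 89.0),
    Build("Qwen2-abliterated", "Abliterated", 7.2, 85.0),
    Build("Dolphin 2.9", "Fine-tuned", 4.4, 94.0),
]
MISTRAL_REFUSAL_PCT = 11.2  # mainstream reference row


def rank_by_refusal(builds):
    """Order builds from least to most restrictive by refusal rate."""
    return sorted(builds, key=lambda b: b.refusal_pct)


def ratio_to_reference(build, reference_pct=MISTRAL_REFUSAL_PCT):
    """Refusal rate expressed as a fraction of the reference model's rate."""
    return build.refusal_pct / reference_pct


ranked = rank_by_refusal(BUILDS)
for rank, b in enumerate(ranked, start=1):
    print(f"{rank}. {b.name}: {b.refusal_pct}% refused, "
          f"{ratio_to_reference(b):.2f}x the Mistral rate")
```

For the top-ranked build the ratio works out to about 0.28, which is where the "a little over a quarter" comparison with Mistral comes from.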

Refusal vs capability: the trade-off

Abliteration drives the refusal rate toward zero, but it edits the weights that produce refusals — and those weights overlap with reasoning. Our two-number readout makes the trade-off legible:

  1. Pure abliteration removes the most refusals — the Llama-3-uncensored build refuses least (3.1%) but also shows the largest accuracy hit among Llama-based builds (89% retention).
  2. Fine-tuned builds (Dolphin, Hermes) refuse slightly more but retain more capability — they were trained for compliance and helpfulness, not just stripped of refusals.
  3. Cross-base abliteration can degrade more — the Qwen2 abliterated build shows the lowest retention (85%), consistent with stronger topic-specific filtering being harder to remove cleanly.

A low refusal rate measures permissiveness, not safety or quality — and here it comes with a measurable capability cost. See what is an abliterated model for how the technique works.
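One way to read the two numbers together is to convert capability-retention into points lost against the stock base model and average by method. The sketch below assumes nothing beyond the table's illustrative figures; the function names are hypothetical, and with only two builds per method the averages are descriptive, not evidence of a general rule.

```python
# build name -> (method, refusal %, capability-retention %); illustrative table values.
ROWS = {
    "Llama-3-uncensored": ("Abliterated", 3.1, 89.0),
    "Dolphin 2.9": ("Fine-tuned", 4.4, 94.0),
    "Hermes 3": ("Fine-tuned", 6.8, 96.0),
    "Qwen2-abliterated": ("Abliterated", 7.2, 85.0),
}


def capability_loss(retention_pct):
    """Points of reasoning score lost relative to the stock base model (100%)."""
    return 100.0 - retention_pct


def mean_by_method(rows):
    """Return {method: (mean refusal %, mean capability loss in points)}."""
    grouped = {}
    for method, refusal, retention in rows.values():
        grouped.setdefault(method, []).append((refusal, capability_loss(retention)))
    return {
        method: (
            sum(refusal for refusal, _ in vals) / len(vals),
            sum(loss for _, loss in vals) / len(vals),
        )
        for method, vals in grouped.items()
    }


summary = mean_by_method(ROWS)
for method, (refusal, loss) in summary.items():
    print(f"{method}: mean refusal {refusal:.2f}%, mean capability loss {loss:.1f} points")
```

On these four rows the abliterated pair loses about 13 points on average against about 5 for the fine-tuned pair, while the mean refusal rates of the two groups are close.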

How we test abliterated builds

We run each community build through the same standardized prompt library used for mainstream models, across multiple sessions, with version tracking and NLP-based classification. Because these are open weights, the results are independently reproducible, the same integrity property that makes Mistral our cross-check baseline. Each response is scored for refusal, evasion, and completeness; capability-retention is estimated against the stock base model on a held-out reasoning set. The full protocol is on the monitoring methodology page; concept definitions are in what is an abliterated model.
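The scoring step described above can be sketched roughly as follows. This is an assumption-heavy illustration: the marker phrases are hypothetical stand-ins for the NLP-based classifier (which is not specified here), and `classify`, `refusal_rate`, and `capability_retention` are invented names, not part of the GPTfake protocol.

```python
from collections import Counter

# Hypothetical marker phrases; a production classifier would be a trained NLP model.
# Only straight apostrophes are matched in this toy version.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable to")
EVASION_MARKERS = ("it depends", "consult a professional", "i'd rather not say")


def classify(response: str) -> str:
    """Label one response as 'refusal', 'evasion', or 'complete'."""
    text = response.lower()
    if any(marker in text for marker in REFUSAL_MARKERS):
        return "refusal"
    if any(marker in text for marker in EVASION_MARKERS):
        return "evasion"
    return "complete"


def refusal_rate(responses) -> float:
    """Percent of responses labelled as refusals."""
    if not responses:
        raise ValueError("need at least one response")
    labels = Counter(classify(r) for r in responses)
    return 100.0 * labels["refusal"] / len(responses)


def capability_retention(build_score: float, stock_score: float) -> float:
    """Build's held-out reasoning score as a percent of its stock base model's."""
    if stock_score <= 0:
        raise ValueError("stock_score must be positive")
    return 100.0 * build_score / stock_score


# Made-up responses, purely to exercise the functions.
sample = [
    "I can't help with that.",
    "Sure, here is a summary.",
    "It depends on your jurisdiction.",
    "Here is the full derivation.",
]
print(refusal_rate(sample))              # 1 refusal out of 4 responses
print(capability_retention(44.5, 50.0))  # hypothetical scores on a reasoning set
```

Keeping evasion as its own label matters for the refusal figure: an evasive answer is not counted as a refusal in this sketch, so the two behaviors can be reported separately.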

Caveats & limitations

  • Illustrative figures. The numbers above are snapshots from our test set, not a live feed.
  • Low refusal ≠ safe. A near-zero refusal rate means the build complies with harmful requests too; abliteration removes safety behavior, not just over-refusal.
  • Build churn. Community builds are re-quantized and re-released constantly; a given checkpoint’s numbers drift fast.
  • Capability-retention is an estimate. It depends on the reasoning set used and the base-model comparison; treat it as directional.
  • We measure, we don’t distribute. GPTfake reports on these builds as a watchdog and does not host, mirror, or recommend them.
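To make "treat it as directional" concrete, the sampling uncertainty of a refusal rate at n = 500 can be estimated with a Wilson score interval. The sketch below treats each prompt as an independent trial, which is a simplifying assumption (the protocol runs multiple sessions), and the function name is hypothetical.

```python
import math


def wilson_interval(rate_pct: float, n: int, z: float = 1.96):
    """Wilson score interval, in percent, for a rate observed on n prompts.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    p = rate_pct / 100.0
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 100.0 * (centre - half_width), 100.0 * (centre + half_width)


# Illustrative refusal rates from the benchmark table, n = 500 each.
for name, rate in [("Llama-3-uncensored", 3.1), ("Dolphin 2.9", 4.4)]:
    low, high = wilson_interval(rate, 500)
    print(f"{name}: {rate}% (approx. 95% interval {low:.1f}% to {high:.1f}%)")
```

Under this assumption the intervals for the two lowest-refusing builds overlap, so small gaps between adjacent ranks should not be over-read.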