What is an abliterated model?

An abliterated model is an open-weight LLM whose tendency to refuse has been removed by identifying the internal “refusal direction” and editing it out of the weights, so it answers prompts a stock model would decline, without any retraining. The result is a near-uncensored build that trades safety guardrails for permissiveness, and it sits alongside other uncensored community releases (e.g. Llama-uncensored, Dolphin, Hermes). See how abliterated models score.

Definition

In the GPTfake sense, abliteration is a weight-editing technique that ablates the activation direction a model uses to refuse, collapsing its refusal behavior toward zero while leaving the rest of the network largely intact. Because it edits weights directly rather than retraining on new data, abliteration is cheap and reproducible, and it has been applied to thousands of community builds on Hugging Face.
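As a rough illustration of what "ablating a direction" means in linear-algebra terms, the sketch below projects one direction out of a toy weight matrix. It is not taken from any GPTfake tooling or from a real model: the 8-dimensional matrix `W`, the random vector `r` standing in for a refusal direction, and the helper name `ablate_direction` are all assumptions made for this example.

```python
import numpy as np

def ablate_direction(W, r):
    """Return a copy of W whose outputs have no component along r.

    W is a (d_out, d_in) matrix; r is a vector in the d_out space.
    Hypothetical helper for illustration only.
    """
    r_hat = r / np.linalg.norm(r)           # unit vector along the direction
    return W - np.outer(r_hat, r_hat @ W)   # subtract the projection onto r_hat

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # toy stand-in for one weight matrix (assumption)
r = rng.normal(size=8)        # toy stand-in for a "refusal direction" (assumption)
x = rng.normal(size=8)        # arbitrary input vector

W_edit = ablate_direction(W, r)
r_hat = r / np.linalg.norm(r)

before = float((W @ x) @ r_hat)      # output component along r before the edit
after = float((W_edit @ x) @ r_hat)  # output component along r after the edit

print(f"component along r before: {before:.4f}")
print(f"component along r after:  {after:.2e}")
```

The edit touches only the component of the output that lies along `r`; everything orthogonal to it passes through unchanged, which is the toy version of "leaving the rest of the network largely intact".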

“Abliterated” is a portmanteau of ablate + obliterate. It is distinct from a fine-tune: a fine-tune teaches new behavior from examples, while abliteration removes an existing behavior (refusal) by editing the model’s internal representation of it. For the broader concept, see what is AI censorship.

Abliterated vs uncensored vs fine-tuned

These terms are used loosely and often confused. They are not the same:

Term	What it means	How it’s produced
Abliterated	Refusal direction removed from the weights	Surgical weight editing (no retraining)
Uncensored	Umbrella term for any build that refuses little	Abliteration or fine-tuning or prompting
Fine-tuned (uncensored)	Retrained to comply on broad prompts (e.g. Dolphin)	Supervised fine-tuning on permissive data
Stock / aligned	The provider’s safety-trained release	RLHF / Constitutional AI by the lab

Abliteration is one method of producing an uncensored model; “uncensored” is the broader category. A community build may combine both — fine-tuned for capability, then abliterated for compliance.

How it’s measured

GPTfake runs abliterated and uncensored community builds through the same standardized prompt set used for mainstream models, scoring each response as answered, partial, redirected, or refused. We report two numbers side by side: a refusal rate (the share of prompts the build declines, where lower means more permissive) and a capability-retention estimate (whether ablation degraded reasoning or factual accuracy). The full protocol, including prompt categories, scoring, and reproducibility, is on the methodology page.
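A minimal sketch of how the two reported numbers could be computed from per-response labels follows. The four label names come from the text above; everything else is an assumption made for illustration, including the choice to count only "refused" toward the refusal rate by default, the definition of capability retention as a ratio of benchmark accuracies, and the function names themselves. The methodology page remains the authority on the actual protocol.

```python
from collections import Counter

LABELS = ("answered", "partial", "redirected", "refused")

def label_shares(labels):
    """Return the fraction of responses carrying each of the four labels."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not labels:
        raise ValueError("no responses to score")
    return {name: counts[name] / len(labels) for name in LABELS}

def refusal_rate(labels, count_as_refusal=("refused",)):
    """Share of responses counted as refusals.

    Assumption: only 'refused' counts by default; pass a wider tuple
    to treat 'redirected' or 'partial' as refusals too.
    """
    shares = label_shares(labels)
    return sum(shares[name] for name in count_as_refusal)

def capability_retention(edited_accuracy, stock_accuracy):
    """Assumed definition: edited build's accuracy relative to the stock model."""
    if stock_accuracy <= 0:
        raise ValueError("stock accuracy must be positive")
    return edited_accuracy / stock_accuracy

# Made-up labels for ten prompts, purely for illustration.
labels = ["answered"] * 7 + ["partial", "redirected", "refused"]
print(label_shares(labels))
print(refusal_rate(labels))
print(capability_retention(edited_accuracy=0.60, stock_accuracy=0.64))
```

Keeping `count_as_refusal` as a parameter makes the treatment of "partial" and "redirected" responses explicit, since that choice can move the headline refusal rate noticeably.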

Abliteration removes safety behavior, not just over-refusal. A near-zero refusal rate means the build will also comply with genuinely harmful requests. GPTfake measures these builds as an independent watchdog; we do not host, distribute, or recommend them.

See it in the data

Abliterated model benchmark

Refusal rates and capability-retention notes for Dolphin, Hermes, and Llama-uncensored builds.

Least censored AI models

Where uncensored local builds sit against mainstream models, ranked by data.

Open datasets

Download the refusal measurements behind these figures (CSV/JSON).

Related reading

What is AI censorship: the parent concept (refusal, deflection, filtering)
Least censored AI models: the ranked, evidence-based list
AI censorship leaderboard: refusal rate and bias by model
Glossary: refusal rate, refusal direction, and more
