Frequently asked questions

Is high perplexity always good?

No. Random gibberish scores near the maximum. Good human writing is mostly fluent, with genuine surprises where they earn their place. The goal is natural variation, not randomness.

Can I fake burstiness by randomly chopping sentences?

Crude chopping reads worse to humans and barely fools classifiers that also read word-level patterns. Genuine restructuring changes both metrics coherently.

Do modern detectors still rely on these two metrics?

They are now inputs and intuition rather than the whole engine; trained classifiers dominate. But text that is genuinely varied and surprising tends to pass those classifiers too, because they learned from the same underlying contrast between human and generated text.

Perplexity and burstiness, explained simply

Updated June 10, 2026

Every AI-detection conversation eventually hits these two words. They're not jargon for "magic" — they're two specific, simple measurements, and once you see them you'll see your own writing differently.

Perplexity: how surprising is your next word?

Language models predict the next word. Perplexity measures how surprised a model is by the word that actually comes next. "The cat sat on the ___" → "mat" barely surprises it; "ledger" does. Average that surprise (the negative log-probability of each word) over a passage, exponentiate the result, and you have the passage's perplexity.
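As a rough sketch, the calculation fits in a few lines of Python. The per-word probabilities below are invented for illustration (a real detector would read them from a language model), and `perplexity` is a hypothetical helper name. It follows the standard definition: the exponential of the average negative log-probability.

```python
import math

def perplexity(token_probs):
    """Exponential of the mean negative log-probability of each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Surprise for each token, measured in nats.
    surprisals = [-math.log(p) for p in token_probs]
    return math.exp(sum(surprisals) / len(surprisals))

# Made-up probabilities a model might assign to each word that actually appeared.
predictable = [0.9, 0.8, 0.9, 0.85]   # e.g. "... sat on the mat"
surprising = [0.9, 0.8, 0.9, 0.001]   # e.g. "... sat on the ledger"

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

A single unlikely word is enough to pull the second score well above the first, which is why one odd verb can matter more than several safe ones.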

Why it helps detect AI: generated text is built by choosing likely words, so a model re-reading it is rarely surprised. Low perplexity is the residue of generation. Human writing reaches for odd verbs and takes tangents, so its surprise score tends to run higher.

Burstiness: does your rhythm vary?

Burstiness measures variation — mostly in sentence length and structure — across a passage. Humans write in bursts: a long accumulating sentence, then a short one. For emphasis. Models converge on a steady medium: fifteen to twenty-five words, subject-verb-object, again and again.

Plot sentence lengths and human writing looks like a mountain range; model output looks like a picket fence. That flatness is measurable in one pass.
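One way to put a number on that flatness is sketched below. Burstiness has no single agreed formula, so this example assumes the coefficient of variation of sentence length (standard deviation divided by mean). The function names and the naive punctuation-based sentence splitter are illustrative choices, not any detector's actual method.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count of each sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (assumed definition)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    return statistics.pstdev(lengths) / statistics.mean(lengths)

varied = ("I waited for an hour by the station clock while the rain came "
          "sideways across the empty platform. Nothing. Then the train arrived.")
flat = ("The train arrived at the station. The rain fell on the platform. "
        "The clock showed the wrong time.")

print(sentence_lengths(varied), round(burstiness(varied), 2))
print(sentence_lengths(flat), round(burstiness(flat), 2))
```

The splitter will miscount abbreviations and decimals, so treat the score as a quick diagnostic rather than a precise measurement.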

How detectors use them — and what supplanted them

Early GPTZero was nearly this simple: low perplexity + low burstiness → likely AI. Modern detectors use trained classifiers that learn hundreds of subtler features, but these two remain the interpretable core. They are also the reason formal human prose gets falsely flagged: academic and business English is deliberately low-surprise and low-variation, and the metrics can't tell disciplined from generated.
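The two-threshold rule can be sketched as follows. The cutoff values, the sample scores, and the name `looks_generated` are all hypothetical, chosen only to show the shape of the logic and how a disciplined human memo can land on the wrong side of it.

```python
def looks_generated(perplexity, burstiness, max_perplexity=20.0, max_burstiness=0.3):
    """Flag text only when both scores fall below their (hypothetical) cutoffs."""
    return perplexity < max_perplexity and burstiness < max_burstiness

# Invented (perplexity, burstiness) scores, for illustration only.
samples = {
    "chatty blog post": (65.0, 0.80),
    "model output": (12.0, 0.15),
    "formal business memo": (15.0, 0.20),  # human-written, yet low on both
}

for name, (ppl, burst) in samples.items():
    print(name, looks_generated(ppl, burst))
```

Under these made-up numbers the memo is flagged alongside the model output, which is the false-positive pattern described above.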

Using the concepts on your own writing

These two metrics explain why synonym swaps fail (they barely move either one) and why structural rewriting works (it changes both). They also suggest a practical self-edit: scan your sentence lengths for picket-fence patterns, and ask where a reader could finish your sentence for you. A humanize pass automates this kind of variation, and the detector shows you the before and after instead of leaving it to faith.
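The sentence-length scan can be roughed out in code. This is a minimal sketch: `picket_fence_runs`, the `tolerance` of three words, and the minimum run of three sentences are assumptions picked for illustration, and the sentence splitter is deliberately naive.

```python
import re

def picket_fence_runs(text, tolerance=3, min_run=3):
    """Return (start, end) sentence indexes of runs with near-identical lengths."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    runs, start = [], 0
    for i in range(1, len(lengths) + 1):
        # Close the current run at the end of the text or when the length
        # jumps by more than `tolerance` words from the previous sentence.
        if i == len(lengths) or abs(lengths[i] - lengths[i - 1]) > tolerance:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs

draft = ("The tool checks every sentence. The model scores every word. "
         "The report lists every result. Done. Then a much longer sentence "
         "wanders off to make a point about rhythm and emphasis.")

print(picket_fence_runs(draft))
```

Each reported pair marks a stretch worth restructuring, for example by merging two of the sentences or cutting one down to a fragment. The check compares neighbouring sentences only, so a slow drift in length can still register as one run.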

