Paste the United States Declaration of Independence into a popular AI detector and watch it report back that the text is around 97 percent machine-generated. The document was signed in 1776. Thomas Jefferson did not have access to a large language model. Yet ZeroGPT and tools like it have repeatedly flagged the Declaration, the preamble, and the 1836 Texas Declaration of Independence as overwhelmingly AI-written.
This is not a funny edge case to laugh off. It is the clearest possible demonstration of what an AI detector actually does, and it is the reason no detector can prove who or what wrote a piece of text. If you run a marketing team, commission freelance work, or worry that a vendor or client might run your content through one of these tools, the gap between what detectors claim and what they can prove is now a business risk worth understanding precisely.
Where AI detectors came from
AI text detectors arrived almost the moment AI text generation became good. ChatGPT launched in November 2022. Within weeks, teachers, editors, and platform owners wanted a button that would tell them whether a block of text was human or machine. The market filled the demand fast. GPTZero was built by a Princeton student in early 2023. Turnitin, the plagiarism checker already installed across thousands of schools, shipped AI writing detection in April 2023. Originality.ai, Copyleaks, Winston AI, and a long tail of others followed.
The most telling entry, and the most telling exit, came from OpenAI itself. On 31 January 2023, OpenAI released its own AI Text Classifier, a tool meant to identify text written by the very models OpenAI built. If anyone could detect ChatGPT output, it should have been the company that made ChatGPT. The classifier was not good. By OpenAI's own published numbers, it correctly identified only 26 percent of AI-written text as likely AI-written, while wrongly labelling human writing as AI 9 percent of the time. On 20 July 2023, less than six months after launch, OpenAI quietly retired the tool and added a note to its announcement page citing a low rate of accuracy. The company that had the strongest possible incentive and the deepest possible knowledge could not make detection work, and said so in public.
That should have ended the conversation. It did not, because the demand never went away.
How a detector actually decides
To see why detection fails, you have to understand what a detector measures. It does not read for meaning, check facts, or recognise a writer's voice. It runs statistics on two properties of the text.
The first is perplexity. Perplexity is a measure of how predictable each word is, given the words that come before it. Think of the sentence "I sat down at the bar and ordered a glass of red." A predictable continuation is "wine." An unpredictable one is "jelly beans." Language models are trained to produce the predictable choice, so AI text tends to have low perplexity. Human writing tends to have higher perplexity, because people reach for the surprising word, the idiom, the odd turn of phrase.
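To make the idea concrete, here is a minimal sketch of the standard perplexity calculation: the exponential of the average negative log-probability per word. The probabilities are invented for illustration. A real detector would take them from a language model, and nothing here reflects any particular vendor's implementation.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities a model might assign to the last few words
# of the bar sentence. These numbers are made up for the example.
ends_with_wine = [0.9, 0.8, 0.85, 0.9]           # final word is expected
ends_with_jelly_beans = [0.9, 0.8, 0.85, 0.001]  # final word is a surprise

print("predictable ending:", round(perplexity(ends_with_wine), 2))
print("surprising ending: ", round(perplexity(ends_with_jelly_beans), 2))
```

One unlikely word is enough to pull the score up sharply, which is why plain, expected phrasing scores low.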
The second is burstiness. Burstiness measures how much sentence length and complexity vary across a passage. People write in bursts: a long, winding sentence, then a short one. Three words. Then another long one. AI output tends to be smoother, with sentences closer to a uniform length and rhythm.
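Burstiness has no single agreed formula. One simple stand-in, assumed here purely for illustration, is the coefficient of variation of sentence length: the standard deviation of words per sentence divided by the mean. The sentence splitter below is deliberately naive and would trip on abbreviations and decimals.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! and ? and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (an assumed proxy)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # one sentence cannot vary
    return statistics.pstdev(lengths) / statistics.mean(lengths)

varied = ("People write in bursts: a long, winding sentence, then a short one. "
          "Three words. Then another long one.")
uniform = ("The cat sat on the mat. The dog lay on the rug. "
           "The bird sat on the branch.")

print("varied: ", round(burstiness(varied), 2))
print("uniform:", round(burstiness(uniform), 2))
```

The uniform passage scores zero because every sentence has the same length. A careful human writer producing even, parallel sentences would score the same way.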
So a detector's logic is simple. Low perplexity plus low burstiness equals a high AI score. That is the entire engine behind most of these tools, and once you see it, the failure mode is obvious.
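That logic can be written down as a toy scoring function. This is a sketch of the idea only: the name `toy_ai_score`, the scaling constants, and the equal weighting are all assumptions made for the example, and commercial detectors do not publish their formulas.

```python
def toy_ai_score(perplexity, burstiness,
                 max_perplexity=100.0, max_burstiness=1.0):
    """Return a score in [0, 1]; higher means 'looks more machine-like'.

    Both inputs are clamped to [0, 1] after scaling, then averaged.
    The scaling constants are arbitrary illustrative values.
    """
    p = min(max(perplexity / max_perplexity, 0.0), 1.0)
    b = min(max(burstiness / max_burstiness, 0.0), 1.0)
    return 1.0 - (p + b) / 2.0

# Two texts with the same style statistics but different authors.
# The function has no input for authorship, so it cannot tell them apart.
formal_human_prose = toy_ai_score(perplexity=10.0, burstiness=0.1)
model_output = toy_ai_score(perplexity=10.0, burstiness=0.1)

print(formal_human_prose == model_output)
```

The function signature is the whole argument in miniature: the inputs describe how the text reads, and nothing in them records who wrote it.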
Plenty of humans write with low perplexity and low burstiness. In fact, they are most likely to do so in exactly the contexts where detectors get pointed at their work: formal academic prose, technical documentation, legal writing, careful corporate copy. A clear, plain, well-structured sentence is the goal of good non-fiction writing. It is also, statistically, indistinguishable from machine output. The Declaration of Independence is formal, measured, and built from balanced clauses. To a perplexity engine, that reads as a machine. The detector is not malfunctioning when it flags Jefferson. It is doing precisely what it was designed to do, on text that happens to have the wrong statistical shape.
Why false positives are not a bug
A false positive is when a detector flags genuinely human writing as AI. Because the detector measures style, not origin, false positives are not an occasional glitch to be patched. They are a permanent property of the method.
The numbers from real-world testing make this concrete. Independent analyses generally find false positive rates somewhere between 5 and 15 percent across mainstream detectors, depending on the tool and the text. That range alone should stop anyone from treating a score as proof. At a 10 percent false positive rate, one in ten innocent pieces of writing is branded as machine-made.
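A false positive rate also says less than it seems to about any individual flag. How many flagged pieces are really AI-written depends on how much AI text was in the pile to begin with. The sketch below applies Bayes' rule. The 10 percent false positive rate comes from the range above, while the 90 percent detection rate and the 10 percent share of AI-written submissions are assumptions chosen only to show the effect.

```python
def share_of_flags_that_are_ai(true_positive_rate, false_positive_rate, ai_share):
    """Bayes' rule: P(text is AI | detector flagged it)."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("detector never flags anything with these inputs")
    return flagged_ai / total_flagged

# Assumed inputs: detector catches 90% of AI text, wrongly flags 10% of
# human text, and 10% of all submissions are actually AI-written.
result = share_of_flags_that_are_ai(
    true_positive_rate=0.90,
    false_positive_rate=0.10,
    ai_share=0.10,
)
print(f"{result:.0%} of flagged texts are actually AI-written")
```

Under those assumptions, roughly half of all flags land on human writers, even though the detector sounds accurate on paper.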
Then comes the finding that should settle the argument. In 2023, a team of Stanford researchers led by James Zou tested seven widely used GPT detectors on essays written by non-native English speakers, and published the results in the journal Patterns. The detectors handled writing by US-born eighth-graders almost perfectly. On TOEFL essays written by non-native English speakers, they failed badly: the average false-positive rate was 61.3 percent, meaning the detectors wrongly classified most of those essays as AI-generated. Just under one in five of the non-native essays, 19.8 percent, was flagged as AI by every detector in the test at once.
The reason is the same perplexity engine. Non-native English writers, on average, use a smaller vocabulary, less idiomatic phrasing, and simpler sentence construction. Those traits push perplexity down. The detector reads lower perplexity as machine-made. So a tool sold as an objectivity check turns out to encode a bias against anyone who learned English as a second language. The Markup documented the same pattern from the other direction: a Johns Hopkins professor noticed Turnitin flagging international students far more often than native speakers. This is not a tuning problem that a software update fixes. It is baked into what the tool measures.
The same logic produces false positives for autistic writers, for people who write in a deliberately plain style, and for anyone whose natural prose happens to be even and clear. Decrypt and other outlets have catalogued detectors flagging the Bible, corporate boilerplate, and historical speeches. The pattern is consistent: formal, plain, structured human writing trips the wire.
The legal and reputational danger of acting on a score
If detectors only embarrassed themselves on old documents, this would be a curiosity. The danger is what happens when an institution treats a score as evidence and acts on it.
Universities saw it first, and the more careful ones backed away fast. Vanderbilt University disabled Turnitin's AI detector in August 2023 and explained the arithmetic plainly. Turnitin advertised a 1 percent false positive rate. Vanderbilt processes roughly 75,000 student submissions a year. One percent of 75,000 is 750. That is up to 750 pieces of student work a year wrongly flagged as AI-written, each one a potential cheating accusation, from a tool the vendor itself called 99 percent accurate. Vanderbilt also cited the lack of transparency in how the score is reached and the documented bias against non-native speakers.
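Vanderbilt's arithmetic is easy to reproduce and to extend. The sketch below treats every submission as human-written, which is why the result is an upper bound, and then reruns the same volume at the 5 to 15 percent false positive rates reported by independent testing. The function name is made up for the example.

```python
def expected_false_flags(submissions, false_positive_rate):
    """Upper bound on wrongly flagged work, assuming all of it is human-written."""
    return round(submissions * false_positive_rate)

SUBMISSIONS_PER_YEAR = 75_000  # Vanderbilt's figure

# Turnitin's advertised rate, then the independently reported range.
for rate in (0.01, 0.05, 0.10, 0.15):
    flags = expected_false_flags(SUBMISSIONS_PER_YEAR, rate)
    print(f"{rate:.0%} false positive rate -> up to {flags:,} wrongly flagged")
```

Even the vendor's best-case figure produces hundreds of wrongful flags a year at that volume.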
Turnitin's own product documentation now carries a warning that its AI writing detection may misidentify both human and AI text, and that the score should not be used as the sole basis for action against a student. When the vendor tells you in writing not to rely on the number, relying on the number is not a defensible position.
The accusations that follow a score are now producing lawsuits. A student in Yale's Executive MBA program sued the university after being accused of improper AI use on a final exam that had been flagged by GPTZero; the complaint argues, among other things, that the process discriminated against him as a non-native English speaker. At Adelphi University, a student named Orion Newby, who is on the autism spectrum, was accused of AI plagiarism, sued, and won. A New York court called the finding against him without valid basis and devoid of reason, and ordered the university to expunge his record. In Palo Alto, the family of a high school student filed a federal civil rights suit in 2026 after he was accused on the strength of a Turnitin run that, the complaint alleges, was conducted without the family's knowledge or consent.
Translate that into a commercial setting. Suppose you accuse a freelance writer of secretly using AI because a detector returned a high score, withhold payment, and terminate the contract. The writer then produces their drafts, their notes, and their version history, showing that the detector score was simply wrong, as detector scores routinely are. You now face a breach of contract claim, exposure to a defamation claim if you told anyone why you fired them, and a public dispute you cannot win, because your only evidence is a tool whose maker says it should not be the sole evidence. The same trap waits for any employer disciplining staff on a score, or any client refusing to pay an agency. A detector score is not proof. Acting as though it is converts a statistical guess into a legal liability that sits with you.
The defensible alternative: build an authenticity process
The honest conclusion is that you cannot detect your way to trust. There is no tool, current or coming, that reliably reverse-engineers authorship from a finished block of text, because the finished text simply does not carry that information in a recoverable form. So stop trying to detect. Build a process that makes authenticity visible from the start.
A workable content authenticity process rests on three things, none of which is a detector.
The first is drafts and version history. Genuine writing leaves a trail. A document written by a person over hours or days has a revision history: messy early drafts, restructured sections, comments, deletions, the visible mess of thinking. Tools like Google Docs and Microsoft Word keep this automatically. The absence of a trail is informative too: a clean piece of work that appeared in a single save, fully formed, tells you far more than any perplexity score. Ask for the working document, not just the final file. The trail is hard to fake convincingly and easy to produce when the work is real.
The second is disclosure, set in advance. Decide your policy on AI use and write it into the brief or the contract before work begins. Most teams now land somewhere reasonable: AI is fine for research, outlining, and editing, but the ideas, the reporting, the original examples, and the final judgement must be human, and any substantial AI involvement is disclosed. A clear, agreed standard turns AI from a thing people hide into a thing people declare. You cannot enforce a rule you never wrote down, and you cannot accuse someone of breaking a standard that did not exist when they did the work.
The third is provenance, where it can be captured. For images and increasingly for other media, the Coalition for Content Provenance and Authenticity, known as C2PA, attaches cryptographically signed Content Credentials that record how a file was made and edited. The specification is being adopted as an ISO standard and is built into tools from Adobe, camera makers, and major AI platforms. Provenance verifies a real chain of custody rather than guessing from output. It is not yet a solved problem for plain text, and it breaks at a screenshot, but it is the direction that actually works, because it records origin at the source instead of inferring it after the fact.
Underneath all three is the point that should anchor any policy: focus on the quality and integrity of the work, not the tool that touched it. A piece of content is good or bad, original or derivative, accurate or wrong, on its own merits. Whether a model helped draft a paragraph tells you almost nothing about whether the finished work is worth publishing. Editorial judgement, fact-checking, and a named person who stands behind the byline do the job a detector pretends to do, and unlike the detector, they actually work.
Where this is heading
Detection and generation are locked in a race that detection cannot win. Every improvement in language models makes their output more varied, more human, and harder to flag, while the techniques people use to disguise AI text, the so-called humanizers, are explicitly built to defeat perplexity scoring. A method that depends on machine writing being statistically distinct from human writing gets weaker every quarter, because the whole industry is working to erase that distinction.
Expect the detector vendors to keep selling scores, and expect more of those scores to come wrapped in stronger disclaimers and renamed as indicators or probabilities rather than verdicts. The disclaimers are not marketing caution. They are the accurate description of the product. A probability is not a proof, and a tool that misfires on the Declaration of Independence and on a majority of essays by non-native English speakers has told you, clearly, what it can and cannot do.
The teams that come out of this well will be the ones that stopped asking software an unanswerable question. Authorship is not a property you can read off a page after the fact. It is a chain of custody you either recorded or did not. Build the process that records it, agree the disclosure rules before the work starts, judge the content on its merits, and treat any detector score for what it is: a guess about writing style, dressed up as a fact about origin.
Council summary
This post argues that AI detectors measure writing style, not origin, so a high score is a guess and never proof of authorship. The council verified every figure against primary sources: OpenAI's retired classifier caught 26 percent of AI text and false-flagged human writing 9 percent of the time; the Stanford study in Patterns recorded a 61.3 percent average false-positive rate on non-native English essays, with 19.8 percent flagged by every detector at once; Vanderbilt's 75,000-submission arithmetic and Turnitin's own "not the sole basis" disclaimer both check out, as do the Yale, Adelphi, and Palo Alto cases. The reader takeaway is direct: do not act on a detector score. Build an authenticity process from drafts and version history, disclosure agreed in advance, and provenance where it can be captured, then judge the work on its merits.