The AI Writing Glossary
Writers shouldn't need a computer science degree to understand the tools being used to judge their work. Every term here is explained for writers, not engineers.
A
AI Detection
Software that attempts to determine whether a piece of text was written by a human or generated by an artificial intelligence. These tools use statistical analysis of language patterns to make their predictions. No current tool achieves 100% accuracy.
Why writers should care: If you write for a living, AI detection tools are now the gatekeepers deciding whether your work is "real."
Algorithmic Bias
Systematic errors in AI systems that produce unfair outcomes for certain groups. In AI detection, this shows up as higher false positive rates for non-native English speakers, writers from certain cultural backgrounds, and people whose natural style is plain, formal, or formulaic.
Why writers should care: Your writing style might make you statistically more likely to be falsely flagged, and the tool won't tell you that.
B
Burstiness
A measure of how much variation exists in the complexity and length of sentences within a piece of writing. Human writing tends to be "bursty" - mixing short, punchy sentences with longer, more complex ones. AI-generated text tends to be more uniform.
Why writers should care: Some detectors use burstiness as a signal. Writers with naturally consistent sentence structures may trigger false positives.
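To make the idea concrete, here is a minimal Python sketch that scores burstiness as the variation in sentence length: the standard deviation of sentence lengths divided by their average. This formula is an assumption for illustration. Commercial detectors don't publish their exact methods, and some reportedly measure variation in perplexity across sentences rather than length. The function names are hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split text into sentences and count the words in each."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (standard deviation / mean).

    0.0 means every sentence is the same length; higher values mean more
    mixing of short and long sentences. An illustrative proxy only.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The cat sat on the mat. The dog lay on the rug. The bird sang in the tree."
bursty = "Stop. The storm had been building over the harbor for most of the afternoon. We ran."

print(f"uniform: {burstiness(uniform):.2f}")
print(f"bursty:  {burstiness(bursty):.2f}")
```

The first sample, three six-word sentences, scores 0.0. The second jumps from one word to thirteen and back to two, and scores just over 1. A writer whose every sentence looks like the first sample would score low on this kind of measure, whoever actually wrote it.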
C
C2PA
The Coalition for Content Provenance and Authenticity - the industry group behind an open technical standard for certifying the origin and history of digital content. The C2PA standard attaches cryptographically signed metadata to files, creating a verifiable chain of custody from creation to publication.
Why writers should care: C2PA could eventually let you prove when, where, and how you wrote something - a digital fingerprint for your writing process.
Content Provenance
The documented history of a piece of content from its creation through every edit, save, and publication. In writing, provenance includes drafts, revision history, research notes, and metadata showing how the work evolved over time.
Why writers should care: Building a provenance trail is your best defense against a false AI detection accusation. Keep your drafts.
D
Deepfake Text
Text generated by AI that is designed to be indistinguishable from human writing. Unlike general AI text generation, deepfake text is specifically crafted to mimic a particular writer's style, voice, or persona.
Why writers should care: As AI gets better at mimicking individual writers, the concept of "voice" as proof of humanity becomes more complicated.
F
False Positive
When an AI detection tool incorrectly identifies human-written text as AI-generated. False positives are the central crisis in AI detection - they can result in academic penalties, lost jobs, damaged reputations, and legal consequences for innocent writers.
Why writers should care: This is the term that matters most. A false positive means you're being punished for something you didn't do.
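Even a small false positive rate adds up when a tool screens thousands of documents. This Python sketch does the arithmetic for a hypothetical detector. Every number in it (the essay count, the share written by AI, and both error rates) is an assumption chosen for illustration, not a measurement of any real tool, and the function name is made up.

```python
def flag_breakdown(num_docs: int, share_ai: float, true_positive_rate: float,
                   false_positive_rate: float) -> tuple[float, float, float]:
    """Return (AI docs flagged, human docs flagged, share of flags that are wrong)."""
    ai_docs = num_docs * share_ai
    human_docs = num_docs - ai_docs
    flagged_ai = ai_docs * true_positive_rate        # correct flags
    flagged_human = human_docs * false_positive_rate  # false positives
    total_flags = flagged_ai + flagged_human
    wrong_share = flagged_human / total_flags if total_flags else 0.0
    return flagged_ai, flagged_human, wrong_share


# Hypothetical scenario: 10,000 essays, 10% written by AI, a detector that
# catches 90% of AI text and wrongly flags 2% of human text.
ai, human, wrong = flag_breakdown(10_000, 0.10, 0.90, 0.02)
print(f"AI essays flagged:    {ai:.0f}")
print(f"Human essays flagged: {human:.0f}")
print(f"Share of flags that are false accusations: {wrong:.0%}")
```

With these made-up figures, 180 human writers are wrongly flagged, and about one flag in six is a false accusation. The rarer genuine AI use is in a group, the larger that share becomes.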
Fine-Tuning
The process of training an existing AI model on a specific dataset to specialize its outputs. A fine-tuned model might write in a particular style, about a specific topic, or for a defined audience. The model keeps most of its general capabilities while becoming sharper at the new purpose.
Why writers should care: Fine-tuned models can produce text that's harder to detect, especially when they've been trained on human writing from a specific domain and so sound more natural there.
H
Hallucination
When an AI model generates information that is factually incorrect, fabricated, or nonsensical while presenting it with the same confidence as accurate information. AI models do not "know" facts - they predict likely word sequences, which can produce convincing-sounding fiction.
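Why writers should care: Fabricated facts, quotes, or citations can be a telltale sign of AI involvement, though human writers make mistakes too. If the text you're accused of generating contains accurate, verifiable facts, that won't prove you wrote it, but it can support your case.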
L
Large Language Model (LLM)
A neural network trained on massive amounts of text data that can generate, analyze, and transform human language. LLMs like GPT-4, Claude, and Gemini learn statistical patterns in language and use those patterns to predict what words should come next in a sequence.
Why writers should care: Understanding what an LLM actually does - pattern matching, not thinking - helps explain why detection is so difficult: its output can closely resemble human prose even though the process that produced it is fundamentally different from how you write.
P
Perplexity
A measurement of how "surprised" a language model is by a piece of text. Low perplexity means the text is predictable - each word follows naturally from the last. High perplexity means the text contains unexpected word choices, unusual phrasing, or creative language.
Why writers should care: Many detectors flag low-perplexity text as AI-generated. But some human writers naturally write in clear, predictable prose - and get punished for it.
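Perplexity has a precise definition: take the probability the model assigned to each word (strictly, each token), average the negative logarithms of those probabilities, and exponentiate the result. The Python sketch below assumes you already have those probabilities. The numbers are made up for illustration rather than taken from any real model.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Exponential of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical probabilities a model might assign to each next word.
predictable = [0.9, 0.8, 0.95, 0.85]  # every word was expected
surprising = [0.2, 0.05, 0.1, 0.3]    # several unexpected word choices

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

A helpful way to read the result: if a model gives every word a 1-in-4 chance, perplexity is exactly 4, as if it were choosing among four equally likely words each time. The predictable sample scores close to 1, and the surprising one scores several times higher. A detector that flags everything below some cutoff will catch clear, conventional human prose along with machine text.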
Prompt Engineering
The practice of crafting specific instructions (prompts) to guide an AI model's output. Skilled prompt engineers can produce highly targeted, natural-sounding text by carefully structuring their requests, including examples, constraints, and style guidance.
Why writers should care: Good prompt engineering makes AI text harder to detect, which raises the stakes for everyone whose writing is scrutinized.
T
Token
The basic unit of text that AI models process. A token is typically a word or part of a word, depending on the model's tokenizer - "writing" is often a single token, while "unbelievable" might be split into "un," "believ," and "able." AI models read and generate text as tokens, and AI companies price their services by the token rather than by the word.
Why writers should care: When someone says a model "can handle 100,000 tokens," they're describing the amount of text it can process at once - roughly 75,000 words of English.
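Real tokenizers learn their vocabulary of word pieces from huge amounts of text (byte-pair encoding is one common method), so exact splits vary from model to model. As a simplified illustration only, this Python sketch uses a tiny made-up vocabulary and greedily takes the longest matching piece at each step. The vocabulary and the function name are hypothetical, and no real model tokenizes exactly this way.

```python
def split_into_tokens(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink until one matches.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


# Hypothetical toy vocabulary; real models use tens of thousands of pieces.
VOCAB = {"writing", "write", "un", "believ", "able", "be", "lie"}

print(split_into_tokens("writing", VOCAB))
print(split_into_tokens("unbelievable", VOCAB))
```

Common words like "writing" survive as one piece, while rarer words break into several. That's why token counts run higher than word counts, and why 100,000 tokens works out to fewer than 100,000 words.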
W
Watermarking
A technique where AI-generated text is subtly marked during generation by biasing the model's word choices toward specific patterns invisible to human readers but detectable by specialized tools. Watermarking embeds a statistical signature in the text itself.
Why writers should care: Watermarking could eventually solve the detection problem - but only if all AI providers agree to use it, and only if the marks survive editing.
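The sketch below is a simplified, hypothetical take on the "green list" watermark described in published research (Kirchenbauer et al., 2023), not any AI provider's actual system. Each word is pseudorandomly labeled green or red based on the word before it. A watermarking model quietly prefers green words, and a detector counts them. Unmarked text should land near 50% green, so a large z-score (how many standard deviations above chance the green count sits) signals a watermark. The 50% split, the vocabulary, and the toy random "model" are all assumptions, and this version uses an extreme form of the bias by allowing only green words.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed share of words labeled "green" at each step (chance level)


def is_green(prev_word: str, word: str) -> bool:
    """Pseudorandomly label `word` green or red, keyed on the word before it."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_z_score(words: list[str]) -> float:
    """Standard deviations by which the green-word count exceeds chance."""
    pairs = len(words) - 1
    if pairs < 1:
        raise ValueError("need at least two words")
    green = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    expected = GAMMA * pairs
    return (green - expected) / math.sqrt(pairs * GAMMA * (1 - GAMMA))


def toy_generate(vocab: list[str], length: int, watermark: bool, seed: int = 0) -> list[str]:
    """Stand-in for a language model: picks words at random, optionally only green ones."""
    rng = random.Random(seed)
    words = [rng.choice(vocab)]
    while len(words) < length:
        options = vocab
        if watermark:
            # Bias generation toward words that are green after the previous word.
            options = [w for w in vocab if is_green(words[-1], w)] or vocab
        words.append(rng.choice(options))
    return words


VOCAB = ["the", "a", "writer", "draft", "page", "story", "word", "edit",
         "reads", "keeps", "finds", "quiet", "long", "short", "and", "then"]

plain = toy_generate(VOCAB, 200, watermark=False)
marked = toy_generate(VOCAB, 200, watermark=True)
print(f"unwatermarked: z = {watermark_z_score(plain):.1f}")
print(f"watermarked:   z = {watermark_z_score(marked):.1f}")
```

This also shows why editing matters: every word a person changes resets the word pairs around it to roughly even odds of being green, so heavy rewriting pulls the z-score back toward zero.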
Z
Zero-Shot Detection
An AI detection approach that works without being trained on specific examples of AI-generated text. Instead of learning patterns from a labeled dataset, zero-shot detectors use the statistical properties of language models themselves to estimate the probability that text was AI-generated.
Why writers should care: Zero-shot techniques are common in detection tools, and they can be especially prone to error, because they're making educated guesses from statistical signals rather than matching known patterns.