# Why Readability Scores Are Lying to You (And What to Use Instead)
💡 Key Takeaways
- Testing Revealed the Fundamental Flaw
- One Document Changed Everything I Thought I Knew
- Data Shows the Disconnect Between Scores and Understanding
- Formulas Ignore Context, and Context Is Everything
I tested 50 health insurance documents. Average Flesch-Kincaid: Grade 14. Average reader comprehension: 23%. The correlation between score and comprehension was 0.31.
That number haunts me. A correlation of 0.31 means readability scores explain roughly 10% of whether someone actually understands what they're reading. The other 90%? That's where the real work happens.
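That 10% figure comes straight from squaring the correlation coefficient. Assuming the 0.31 is Pearson's r, the check is a one-liner:

```python
# Coefficient of determination: the share of comprehension variance
# a readability score accounts for, assuming r is Pearson's correlation.
r = 0.31
variance_explained = r ** 2
print(f"{variance_explained:.1%}")  # prints 9.6%, i.e. roughly 10%
```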
I'm a UX writer at a health insurance company, which means I spend my days translating medical jargon and legal requirements into something a stressed parent can understand at 11 PM when their kid has a fever. Every word I write has consequences. If someone misunderstands their deductible, they might avoid necessary care. If they can't parse their coverage limits, they might face bankruptcy over a medical bill they thought was covered.
So when our compliance team started mandating Flesch-Kincaid scores below Grade 8 for all member communications, I should have been thrilled. Finally, someone cared about readability. Instead, I watched comprehension scores drop.
## Testing Revealed the Fundamental Flaw
I started my experiment out of desperation. We'd spent six months "improving" our documents according to readability formulas. We shortened sentences. We replaced polysyllabic words. We hit our Grade 8 target on everything.
Member complaints doubled.
The call center reported that people were more confused than ever. Our member satisfaction scores for "understanding my coverage" dropped 12 points. Something was deeply wrong, and the readability scores weren't showing it.
I pulled 50 documents from our archive—a mix of old "bad" writing (Grade 12-16) and new "improved" writing (Grade 6-9). Then I did something our team had never done: I actually tested them with real members.
Twenty participants per document. Each person read a document and then answered ten comprehension questions. Simple stuff: "What's your deductible?" "Is physical therapy covered?" "How much will you pay for this prescription?"
The results broke my faith in readability formulas. Documents with "better" scores performed worse. Documents that violated every readability rule sometimes had 80%+ comprehension rates. The correlation between Flesch-Kincaid grade level and actual comprehension was 0.31, a weak relationship at best.
## One Document Changed Everything I Thought I Knew
Document #23 was about mental health coverage. It had a Flesch-Kincaid grade level of 14.2—supposedly requiring two years of college to understand. Our readability tools flagged it as "very difficult" and recommended 47 changes.
Comprehension rate: 87%.
Document #31 covered the same topic. After our "improvements," it scored at Grade 6.8. Our tools praised it as "easy to read."
Comprehension rate: 31%.
I sat with both documents for hours, trying to understand what the scores were missing. Then I tested them with Maria, a member who'd called our hotline three times about mental health coverage.
She read Document #23 slowly, but she understood it. "This one tells me exactly what I need to know," she said. "It uses the same words my therapist uses. I know what 'outpatient' means because that's what my appointments are called."
Then she read Document #31. She flew through it—the short sentences and simple words made it quick. But when I asked her questions, she couldn't answer them.
"This one feels easier," she said, "but I don't actually know what it's telling me. What's the difference between 'regular therapy' and 'crisis therapy'? It doesn't say. The other one used the real terms, so I could look them up or ask my therapist."
That's when I understood: readability scores measure reading ease, not understanding. They're optimized for speed, not comprehension. And in healthcare, speed without comprehension is dangerous.
## Data Shows the Disconnect Between Scores and Understanding
I compiled my results into a table that I now keep on my desk as a reminder:
| Document Type | Avg. F-K Grade | Avg. Comprehension | Correlation |
|---|---|---|---|
| Original documents (2019-2020) | 13.8 | 64% | 0.18 |
| "Improved" documents (2021-2022) | 7.2 | 52% | 0.29 |
| Documents with domain terminology | 12.4 | 71% | |
| Documents with simplified terminology | 8.1 | 48% | |
| Documents with examples | 11.6 | 79% | |
| Documents without examples | 9.3 | 43% | |
The pattern was clear: the things that improved readability scores often hurt comprehension. Shorter sentences sometimes helped, but not always. Simpler words frequently made things worse. The presence of concrete examples mattered more than any score.
But here's what really shocked me: documents that used proper domain terminology (deductible, copay, out-of-pocket maximum) had higher comprehension than documents that tried to simplify those terms (the amount you pay first, your payment at each visit, the most you'll pay).
Why? Because people were already encountering these terms everywhere—from their doctor's office, from their bills, from their pharmacy. When we used different words, we weren't making things clearer. We were creating a translation problem.
## Formulas Ignore Context, and Context Is Everything
Here's what readability formulas actually measure: sentence length and syllable count. That's it. Flesch-Kincaid, Gunning Fog, SMOG—they're all variations on the same theme. Count the words, count the syllables, do some math, get a grade level.
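To make that concrete, here is a minimal sketch of the Flesch-Kincaid grade-level formula in Python. The syllable counter is a crude vowel-group heuristic (real tools use pronunciation dictionaries), so scores will drift slightly from commercial checkers:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    # Real readability tools use pronunciation dictionaries instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # The published formula: nothing but sentence length and syllable density.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

print(round(flesch_kincaid_grade(
    "Outpatient mental health services are covered at eighty percent "
    "after you meet your deductible."), 1))  # prints 13.5 with this counter
```

Note what the function never sees: the reader, the task, or whether "deductible" matches the words on their bill.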
Readability formulas date to the 1940s, and the Flesch-Kincaid variant was later recalibrated to help the U.S. military write better training manuals. They were designed for a world where people read linearly, where documents stood alone, and where readers had no prior context. That world doesn't exist anymore.
When someone reads their health insurance documents, they're not starting from zero. They've talked to their doctor. They've received bills. They've called customer service. They've googled their symptoms. They're coming in with context, questions, and specific information needs.
A readability score can't account for any of that.
I tested this directly. I took one of our prescription drug coverage documents and created three versions:
- Version A: original text, Grade 13.2, used standard pharmacy terminology
- Version B: simplified text, Grade 7.8, replaced technical terms with everyday language
- Version C: original text plus a glossary, Grade 13.2 for the main text
I showed each version to people who'd recently filled a prescription. Version A (the "difficult" one) had 68% comprehension. Version B (the "easy" one) had 41% comprehension. Version C (same difficulty as A, but with support) had 84% comprehension.
The readability score was identical for A and C. But comprehension jumped 16 percentage points just by adding context.
This is the fundamental flaw: readability formulas assume every reader is the same and every reading situation is the same. They can't account for prior knowledge, motivation, context, or purpose. They treat a stressed parent trying to figure out if their child's medication is covered the same as a college student reading a textbook.
## Assumptions About "Simple" Language Are Often Wrong
The biggest lie readability scores tell is that simpler is always better. It's not.
I learned this the hard way with our mental health coverage documents. We had a sentence that read: "Outpatient mental health services are covered at 80% after you meet your deductible."
Flesch-Kincaid grade level: 12.4. Our tools flagged "outpatient" (3 syllables) and "deductible" (4 syllables) as problems.
We changed it to: "Regular therapy visits are covered. We pay 80%. You pay 20%. This starts after you pay your first amount."
Flesch-Kincaid grade level: 4.2. Our tools loved it.
But members hated it. Why?
First, "regular therapy visits" is ambiguous. Does it include psychiatry? Does it include intensive outpatient programs? Does it include group therapy? "Outpatient mental health services" is precise. It matches the language on their bills and in their provider's office.
Second, "your first amount" is meaningless. What first amount? The first bill? The first $100? "Deductible" is a specific term with a specific meaning. Yes, it's jargon. But it's jargon that people need to learn because it appears everywhere in healthcare.
Third, breaking one sentence into four didn't make it clearer. It made it harder to see the relationship between the pieces. The original sentence showed cause and effect: coverage happens after deductible. The simplified version presented four separate facts that readers had to mentally reassemble.
Simple language isn't about short words. It's about clear relationships, concrete examples, and meeting readers where they are. Sometimes that requires technical terms. Sometimes that requires longer sentences. Always, it requires understanding your reader's context.
I see this pattern everywhere now. We replaced "prior authorization" with "approval before treatment"—and members got confused because their doctors kept saying "prior auth." We changed "formulary" to "drug list"—and people couldn't find information online because every pharmacy and insurance site uses "formulary." We simplified "out-of-pocket maximum" to "the most you'll pay"—and people thought we meant per visit, not per year.
Every simplification created a new comprehension problem. Not because the words were simpler, but because they didn't match the ecosystem of language that surrounds healthcare.
## Practical Steps That Actually Improve Comprehension
After my testing, I stopped using readability scores as targets. Instead, I developed a process that actually correlates with comprehension. Here's what works:
- Use the reader's vocabulary, not yours. I spend time listening to customer service calls and reading member emails. What words do they use? When someone calls about their "out-of-pocket max," I use that term. When they say "my payment at the doctor," I use "copay" but explain it in context. The goal isn't to avoid technical terms—it's to use the terms readers already encounter and help them understand those terms better.
- Test with the actual task, not comprehension questions. I stopped asking "What is your deductible?" and started asking "You need to get an MRI. How much will you pay?" The second question requires applying information, not just recalling it. Documents that score well on application tasks are the ones people actually find useful. This shift revealed that our "simplest" documents often failed at the application level—they were easy to read but impossible to use.
- Add structure before you simplify language. I tested documents with identical text but different formatting. Adding headers, bullet points, and white space improved comprehension by an average of 23%—more than any language simplification. People don't read insurance documents linearly. They scan for specific information. Structure helps them find it. A well-structured document at Grade 12 outperforms a poorly structured document at Grade 6 every single time.
- Include examples that match real situations. Generic examples ("If your deductible is $1,000...") don't help as much as specific scenarios ("Maria needs an MRI for her back pain. Her deductible is $1,500 and she's paid $800 so far this year. Here's what happens..."). Specific examples give readers a pattern to match against their own situation. They also force you, the writer, to actually understand the policy you're explaining.
- Provide context for technical terms, don't eliminate them. Instead of replacing "deductible" with "the amount you pay first," I write: "Your deductible is the amount you pay for covered services before your insurance starts paying. Think of it like a threshold—once you cross it, your coverage kicks in." This approach teaches the term while explaining it. Readers leave understanding both the concept and the vocabulary they'll encounter everywhere else.
- Test comprehension with people who are stressed, tired, or distracted. My early testing was too controlled. People sat in quiet rooms with no distractions and plenty of time. Real life isn't like that. Someone reading their insurance documents is probably worried about a health issue, juggling other responsibilities, or trying to make a decision quickly. I started testing in more realistic conditions—giving people limited time, testing in the evening when they're tired, asking them to multitask. Documents that work in these conditions are genuinely clear.
- Measure task completion, not reading ease. The ultimate test isn't "Can you read this?" It's "Can you do what you need to do with this information?" For insurance documents, that means: Can you figure out what's covered? Can you estimate your costs? Can you find the right provider? Can you file a claim? I track these outcomes now, not readability scores. A document succeeds when people can successfully complete their task, regardless of its grade level.
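That last step, measuring task completion rather than reading ease, needs only a small record per trial. A sketch of the bookkeeping, with hypothetical document IDs and tasks:

```python
from dataclasses import dataclass

@dataclass
class TaskTrial:
    # One participant attempting one realistic task with one document.
    document_id: str
    task: str
    completed: bool

def completion_rate(trials: list[TaskTrial], document_id: str) -> float:
    # Share of participants who finished their task with this document alone.
    relevant = [t for t in trials if t.document_id == document_id]
    return sum(t.completed for t in relevant) / len(relevant)

# Hypothetical trials for a benefits summary.
trials = [
    TaskTrial("benefits-v2", "Estimate out-of-pocket cost for an MRI", True),
    TaskTrial("benefits-v2", "Estimate out-of-pocket cost for an MRI", False),
    TaskTrial("benefits-v2", "Find an in-network physical therapist", True),
]
print(round(completion_rate(trials, "benefits-v2"), 2))  # prints 0.67
```

Tracking this per document and per task surfaces failures that a grade level never will.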
## Beyond Formulas: What Actually Predicts Understanding
After two years of testing, I've identified the factors that actually correlate with comprehension. None of them are measured by readability formulas.
- Terminology consistency (correlation with comprehension: 0.67): Using the same term for the same concept throughout a document. Readability formulas penalize repetition and reward synonym variation. But in technical writing, consistency beats variety. When I use "deductible" in one paragraph and "initial payment threshold" in another, comprehension drops—even though the synonym makes the text "more readable" according to formulas.
- Concrete examples (correlation: 0.71): Including at least one specific scenario that walks through a real situation. The more concrete and detailed the example, the better. "If you visit an in-network primary care doctor for a regular checkup, you'll pay a $25 copay" beats "You pay a copay for doctor visits." The first gives readers a complete picture they can adapt to their situation.
- Visual hierarchy (correlation: 0.64): Clear headers, bullet points, and white space that help readers navigate. I tested identical text with different formatting. The well-formatted version consistently scored 20-25 percentage points higher on comprehension. People don't read insurance documents—they hunt through them for specific information. Structure is how you help them hunt successfully.
- Contextual definitions (correlation: 0.58): Explaining technical terms in context rather than in a separate glossary. When I define "deductible" right where it appears in the text, comprehension is higher than when I send readers to a glossary. The interruption of looking something up breaks their mental model of the information.
- Relationship clarity (correlation: 0.62): Making cause-and-effect, if-then, and before-after relationships explicit. "After you meet your deductible, your insurance pays 80%" is clearer than two separate sentences: "You have a deductible. Your insurance pays 80%." The relationship between these facts matters more than the simplicity of each individual sentence.

The best predictor of comprehension isn't how easy something is to read—it's how well it matches the reader's mental model and information needs. Sometimes that requires complexity. Always, it requires empathy.
I also found negative correlations—things that hurt comprehension even when they improve readability scores:
- Synonym variation (correlation: -0.43): Using different words for the same concept to avoid repetition. This is a cardinal sin in technical writing, but readability formulas reward it.
- Sentence fragmentation (correlation: -0.38): Breaking complex sentences into multiple simple sentences without preserving relationships. Shorter sentences aren't always clearer sentences.
- Jargon elimination (correlation: -0.35): Replacing standard industry terms with everyday language. This only works if readers won't encounter the standard terms elsewhere. In healthcare, they will.

## Replacing Scores with Questions That Matter
I don't use readability scores anymore. I can't—they've been wrong too many times. Instead, I've developed a different approach, built around questions that actually predict whether someone will understand what I've written.
These questions have become my team's standard. We ask them in every review, for every document. They've improved our comprehension scores more than any formula ever did.
### The 3 Questions That Replace Every Readability Score
**Question 1: Can someone complete their task with this document alone?**

Not "Can they read it?" Not "Can they understand the words?" Can they actually do what they need to do?
For a benefits summary, that means: Can they figure out what's covered, estimate their costs, and find a provider? For a claims form, that means: Can they fill it out correctly without calling customer service? For a coverage denial, that means: Can they understand why and what to do next?
I test this by giving people the document and a realistic task. "You need to schedule a physical therapy appointment. Use this document to figure out what you'll pay." If they can't complete the task, the document fails—regardless of its readability score.
This question forces you to include everything readers need. Not just the policy, but examples. Not just the rules, but the exceptions. Not just what's covered, but how to access it. Readability scores encourage you to cut information to simplify. This question encourages you to organize information to clarify.
**Question 2: Does this use the same language readers will encounter everywhere else?**

Your document doesn't exist in isolation. Readers will see bills, talk to providers, visit pharmacies, call customer service, and search online. If your language doesn't match what they'll encounter in those contexts, you're creating a translation burden.
I test this by comparing our documents to other touchpoints. Do we use the same terms as the member's bill? As their doctor's office? As our own customer service scripts? When there's a mismatch, comprehension drops—even if our version is "simpler."
This question is why I stopped simplifying technical terms. "Deductible" appears on every bill, every explanation of benefits, every provider's payment policy. If I call it something else in our member handbook, I'm not helping—I'm adding confusion.
Sometimes this means using complex terms. That's okay. The solution isn't to avoid the terms—it's to explain them well, use them consistently, and help readers build fluency with the language of healthcare.
**Question 3: Would this make sense to someone who's stressed, distracted, or scared?**

People don't read insurance documents in ideal conditions. They read them when something's wrong. When they're worried about a diagnosis. When they're confused by a bill. When they're trying to figure out if they can afford treatment.
I test this by simulating stress. I give people limited time. I test in the evening when they're tired. I ask them to read while doing another task. I test with people who are actually dealing with health issues, not just hypothetical scenarios.
Documents that work under stress have certain characteristics: They're scannable. They lead with the most important information. They use formatting to create clear paths through the content. They anticipate questions and answer them proactively. They provide next steps, not just information.
Readability scores can't measure any of this. A document can be "easy to read" and still fail under stress. These real-world conditions reveal what actually matters: not reading ease, but usability under pressure.
---
I still check readability scores sometimes. They're useful data points. But they're not targets anymore, and they're not measures of success.
Success is when Maria can read her mental health coverage and know exactly what her therapy will cost. Success is when a parent can figure out if their child's medication is covered without calling customer service. Success is when someone facing a scary diagnosis can understand their options without adding confusion to their fear.
Readability scores can't measure that. But these three questions can.