# Gabriel Kanev

> Source: https://gkanev.com/posts/the-ai-didnt-read-your-document-it-pretended-to/
> Machine-readable version - 2026-04-14

---

When you ask an AI to analyze a document, you probably assume it reads it. It doesn’t - not the way you do.

A researcher recently tested this by feeding LLMs the complete Harry Potter books with two entirely fabricated spells embedded in the text: “Fumbus” and “Driplo.” The instruction was simple: find any spells that don’t exist in the real books. None of the models found them. They were too busy recalling what they already knew about Harry Potter from training data to actually process what was in front of them.

This isn’t a bug. It’s a structural feature of how these models work.

## The Memorization Problem

A January 2026 Stanford study found that Claude reproduced 95.8% of Harry Potter and the Sorcerer’s Stone verbatim. Gemini produced 9,070 consecutive verbatim words. These models have seen so much training data that they can reconstruct large chunks of popular texts from memory - which means that when you give them a document they’ve encountered before, they may be answering from that memory rather than reading what you’ve provided.

## Lost in the Middle

Even with documents the model hasn’t memorized, there’s a structural problem known as “lost in the middle.” Transformer models pay strong attention to a document’s beginning and end while systematically neglecting its middle sections. This isn’t fixable with prompting - it’s architectural. Studies show that information in the middle of long documents is processed significantly less reliably than information at the edges.
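The fabricated-spell experiment described above can be approximated as a small test harness for any document pipeline: plant made-up marker words at the start, middle, and end of a text, then check which ones the model’s reply actually mentions. A minimal sketch in Python - the function names are my own, and `score_recall` takes a reply string rather than calling a real model, which you would substitute in practice:

```python
def plant_canaries(paragraphs, canaries):
    """Plant fabricated marker words at the end, middle, and start of a
    document (in that order, so earlier insert indices stay valid)."""
    docs = list(paragraphs)
    slots = {"end": len(docs), "middle": len(docs) // 2, "start": 0}
    placed = {}
    for (name, idx), word in zip(slots.items(), canaries):
        docs.insert(idx, f"A fabricated marker appears here: {word}.")
        placed[name] = word
    return "\n\n".join(docs), placed

def score_recall(model_reply, placed):
    """Check which planted markers the model's reply actually mentions,
    keyed by where in the document they were planted."""
    return {pos: word in model_reply for pos, word in placed.items()}
```

Run across several models, the per-position scores make lost-in-the-middle visible directly: markers planted in the middle slot tend to go unreported far more often than those at the edges.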
## What This Means in Practice

- Legal review: A model asked to flag problematic clauses may miss ones buried in the middle of a long contract
- Risk analysis: Key risks in the body of a document can be overlooked
- Code audits: Vulnerabilities in the middle of a large codebase may be glossed over
- Research analysis: Models may blend your document’s content with memorized knowledge on the same topic

## What You Can Actually Do

**Use specific queries.** Instead of “summarize this document,” ask “What does section 4.2 say about termination rights?” Specific anchors force the model to locate and retrieve particular content.

**Place critical content at edges.** If you’re building a system that uses AI to process documents, put the most important information at the beginning or end.

**Treat outputs as first passes.** AI document analysis is a starting point, not a final answer. Build in human review for anything consequential.

**Understand RAG’s limits.** Retrieval-Augmented Generation helps but doesn’t eliminate these problems - it just means the model is working with retrieved chunks rather than the full document, which introduces its own distortions.

The uncomfortable truth is that AI “reading” is a metaphor that misleads. These systems are incredibly powerful pattern-matchers, but pattern-matching and close reading are different activities. When accuracy matters, design your processes accordingly.
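The first two tactics - specific anchors and edge placement - amount to plain string and list manipulation done before the model call. A sketch of both, with illustrative names and prompt wording of my own (not any particular library’s API):

```python
def anchored_query(document, section, question):
    """Build a prompt that targets a named section instead of asking
    for a generic summary (hypothetical prompt wording)."""
    return (f"Locate section {section} of the document below and answer "
            f"only from its text: {question}\n\n---\n\n{document}")

def edge_order(chunks, important):
    """Reorder chunks so the ones whose indices are flagged as important
    sit at the beginning and end of the context, away from the
    weakly-attended middle."""
    front, back, middle = [], [], []
    for i, chunk in enumerate(chunks):
        if i in important:
            # Alternate important chunks between the two edges.
            (front if len(front) <= len(back) else back).append(chunk)
        else:
            middle.append(chunk)
    return front + middle + list(reversed(back))
```

For example, `edge_order(["a", "b", "c", "d", "e"], important={1, 3})` pushes chunks `b` and `d` to the two edges while the unflagged chunks fill the middle. Which chunks count as “important” is your call - the reordering only helps if you can identify them upstream.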