CastReader decodes Kindle Cloud Reader's text by intercepting Amazon's font subset data and mapping scrambled glyph codes back to real characters using local OCR calibration. It's the only Chrome extension that can do this — every other TTS tool reads gibberish because Amazon's custom fonts make the DOM text unreadable. Here's exactly how it works.
The Problem Every TTS Extension Hits
Open any book on read.amazon.com. Right-click. "Inspect Element." Look at the text in the DOM.
You won't find any.
Kindle Cloud Reader doesn't render text the way a normal website does. There are no <p> tags with readable sentences. No <span> elements with words in them. Instead, Amazon's renderer delivers the entire page as a pre-rendered blob image — a single <img> tag pointing to a blob: URL. The page you're reading is, from the browser's perspective, a picture.
But it gets worse. Amazon also sends structured data alongside that image: glyph positions, font metrics, paragraph boundaries. This data uses custom font subsets where Unicode codepoints are remapped to arbitrary glyph IDs. The character "T" might be glyph 847. The letter "h" might be glyph 203. The letter "e" might be glyph 1044. These numbers change per book. They can even change per batch of pages within the same book.
When Read Aloud, NaturalReader, or Speechify try to extract text from this page, they find either nothing (it's an image) or scrambled glyph codes that produce nonsense when fed to a TTS engine. This isn't a bug in those extensions. They're architecturally incapable of solving this problem.
How Amazon's Font Scrambling Actually Works
Amazon's Kindle renderer operates through a /renderer/render API that returns a TAR archive for each batch of pages. Inside that archive:
- tokens_X_Y.json — paragraph boundaries and word bounding boxes, each identified by a positionId
- page_data_X_Y.json — the actual glyph sequences, font references, and 2D transforms for positioning each character
- glyphs.json — SVG path definitions for every glyph in the font subset (~93KB of vector data)
The key structure is the "run" — a sequence of glyph IDs that represents a chunk of text. Each run looks something like this:
```json
{
  "glyphs": [847, 203, 1044, 92, 847, 203, 1044],
  "fontFamily": "amzn-mobi-KindleBookerly",
  "elementId": "934",
  "xPosition": [59.6, 67.2, 73.8, 80.1, 86.4, 93.0, 99.6]
}
```

Those glyph IDs — 847, 203, 1044 — are not Unicode. They're indices into the custom font subset delivered in glyphs.json. The font file knows how to draw glyph 847 as the letter "T," but that mapping exists only inside the font. There's a strict 1:1 relationship: one positionId corresponds to exactly one glyph.
Amazon refreshes these font subsets across render cycles. Navigate forward 18 pages and a new batch arrives — potentially with a completely different glyph mapping. Glyph 847 might now be "S" instead of "T." This means any decoder that caches mappings from the first batch will produce wrong text on later pages.
The scheme is elegant DRM. The browser renders the page correctly because it has the font file. But anything trying to read the underlying data programmatically gets meaningless numbers.
CastReader's Four-Step Decode Pipeline
CastReader solves this with a pipeline that runs entirely in your browser. No cloud processing. No API costs for decoding. Four coordinated components work across Chrome's execution contexts.
Step 1: Intercept the Render Data
A main-world content script (kindle-intercept.content.ts) runs at document_start — before Amazon's own code loads. It intercepts responses from the /renderer/render API, parses the TAR archive, and extracts the token data, page data, and glyph definitions.
This happens transparently. The interceptor doesn't block or modify Amazon's rendering. It just copies the data as it flows through, accumulating pages across all three batches that Amazon sends per render cycle (current pages, backward prefetch, forward prefetch — roughly 18 pages total).
The extracted data gets passed to an isolated-world content script via DOM attributes. Two Chrome execution contexts, cooperating through the only bridge available to them.
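Conceptually, the interception follows the standard fetch-wrapping pattern. Here's a minimal sketch; the function name, listener shape, and URL check are illustrative, not CastReader's actual code:

```typescript
// Illustrative main-world fetch interceptor (hypothetical names).
// It wraps globalThis.fetch, clones responses from the render API,
// and hands the raw bytes to a listener, while the original response
// flows back to Amazon's renderer untouched.
type RenderListener = (body: ArrayBuffer) => void;

function installRenderInterceptor(listener: RenderListener): void {
  const originalFetch = globalThis.fetch;
  globalThis.fetch = async (input, init) => {
    const response = await originalFetch(input, init);
    const url = input instanceof Request ? input.url : String(input);
    if (url.includes("/renderer/render")) {
      // clone() gives an independent copy of the body stream, so
      // reading it here never disturbs the page's own consumer
      response.clone().arrayBuffer().then(listener).catch(() => {});
    }
    return response;
  };
}
```

In a real extension, a wrapper like this has to be installed at document_start so it is in place before Amazon's first render request fires.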
Step 2: Build the Glyph-to-Visual Mapping
The glyph mapper (kindle-glyph-mapper.ts) takes the raw glyph SVG paths and renders each one onto a small canvas. This produces a visual representation of every glyph in the current font subset — what each glyph ID actually looks like when drawn.
But a picture of a letter isn't the same as knowing which letter it is. Glyph 847 renders as something that looks like "T" — but the mapper needs to confirm that programmatically. That's where OCR comes in.
Step 3: OCR Calibration (Not OCR Reading)
This distinction is critical: CastReader does not use OCR to read the book. OCR is used only for calibration — to build a mapping table between glyph IDs and real characters.
Here's how it works. CastReader captures the blob image of the current page and sends it to Tesseract.js, which runs locally in a Chrome offscreen document. Tesseract reads the image and produces recognized text. CastReader then aligns the OCR output against the known glyph sequences from the token data.
The alignment uses position matching. Each glyph has precise x/y coordinates from the page data. Each OCR character has a bounding box. By matching positions, CastReader builds a confidence-scored mapping: glyph 847 at position (59.6, 142.3) corresponds to OCR character "T" at roughly the same coordinates. Do this across hundreds of glyphs on a page and you get a complete decode table.
The space character gets special treatment. Spaces are encoded as glyphs within runs (not as gaps between runs), and the space glyph is identified as the most frequently occurring glyph on the page — a statistical shortcut that's reliable across every book tested.
Why not just use OCR for everything? Three reasons:
- Accuracy. OCR makes mistakes, especially with unusual fonts or small text. Glyph decoding, once calibrated, is exact.
- Word-level highlighting. CastReader highlights individual words as they're spoken. This requires precise character-level text that matches the token positions. OCR text doesn't align cleanly enough.
- Speed. OCR is slow. Glyph decoding after calibration is instant — a simple table lookup per character.
The calibration runs once per render cycle. When the user turns enough pages to trigger a new batch with a different font subset, CastReader automatically re-calibrates. Every time you click "Read Page," fresh OCR runs to ensure the decode table matches the current font.
Step 4: Decode and Extract Paragraphs
With the mapping table built, decoding is straightforward. Walk through each token block (which maps 1:1 to a semantic paragraph), look up each glyph ID in the decode table, concatenate the characters, and output clean text.
The token data also provides exact bounding boxes for every word and paragraph. CastReader uses these to create a DOM overlay with positioned <div> elements for each paragraph — enabling the same click-to-jump and paragraph highlighting that works on regular websites.
For dual-column layouts (common in Kindle), the system detects column structure from the token positions and orders paragraphs correctly: left column top-to-bottom, then right column top-to-bottom. The layout detection is purely data-driven — derived from the x-coordinates of token blocks, not from heuristics about page width.
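A data-driven version of that ordering can be sketched as follows. The gap threshold and type names are illustrative assumptions; the point is that the column boundary comes from the token x-coordinates themselves:

```typescript
// Illustrative column ordering: find the widest gap between distinct
// block x-coordinates, treat it as the column boundary if it is wide
// enough, then emit left column top-to-bottom, then right column.
interface Block { x: number; y: number; text: string; }

const MIN_COLUMN_GAP = 100; // hypothetical threshold, not a tuned value

function orderColumns(blocks: Block[]): Block[] {
  const xs = Array.from(new Set(blocks.map((b) => b.x))).sort((a, b) => a - b);
  let boundary = Infinity;
  let widest = 0;
  for (let i = 1; i < xs.length; i++) {
    const gap = xs[i] - xs[i - 1];
    if (gap > widest) { widest = gap; boundary = xs[i]; }
  }
  const byY = (a: Block, b: Block) => a.y - b.y;
  if (widest < MIN_COLUMN_GAP) return [...blocks].sort(byY); // single column
  const left = blocks.filter((b) => b.x < boundary).sort(byY);
  const right = blocks.filter((b) => b.x >= boundary).sort(byY);
  return [...left, ...right];
}
```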
Why Nobody Else Has Built This
The engineering complexity is substantial. You need:
- A main-world content script that intercepts fetch responses without breaking Amazon's rendering
- TAR archive parsing in the browser
- Cross-context communication between main world and isolated world scripts
- An offscreen document running Tesseract.js for OCR
- Position-based alignment between OCR output and glyph sequences
- Adaptive re-calibration when font subsets change across batches
- Dual-column detection and correct reading order
- Word-level bounding boxes for highlight synchronization
Each of these is a non-trivial problem. Together, they form a system that took months of reverse engineering Amazon's render pipeline to get right. The per-batch font rotation alone — where glyph mappings change every 18 pages — eliminates any approach based on a static lookup table.
And all of this runs locally. No book data leaves your browser for decoding. The only network call is sending the final clean text to the TTS voice API.
How to Use It
Three steps. Seriously.
1. Install CastReader from the Chrome Web Store. Also available on Edge Add-ons. No account required.
2. Open a book at read.amazon.com.
3. Click the CastReader icon. The extension extracts text from the current pages, starts reading with a natural AI voice, and highlights each paragraph as it goes. Click any paragraph to jump to it. Use the floating player to pause, adjust speed, or skip ahead.
The first page takes a few seconds longer than usual — that's the OCR calibration running. Subsequent pages decode instantly from the cached mapping.
For a quick-start guide, see our Listen to Kindle page. And if you're interested in how CastReader handles regular websites (no font tricks needed), check out our overview of free text-to-speech tools.
The Bottom Line
Amazon built font subset scrambling to protect book content from copying. It's effective DRM — it stops every generic text extraction tool cold. But it also blocks accessibility tools, screen readers, and TTS extensions that people rely on to consume content.
CastReader bridges that gap with a local decode pipeline that respects the DRM boundary (it reads what's visually on screen, nothing more) while making the text accessible to speech synthesis. Zero cloud cost for decoding. Zero data exfiltration. Just glyph math and a bit of OCR, running in your browser.
Try CastReader free — it works on Kindle Cloud Reader and 99% of other websites too.