Read Aloud: How to Listen to Any Webpage in 2026
I did the math last Tuesday. Sitting at my desk at 11pm, eyes burning, halfway through a 6,000-word investigation into lithium mining published by Reuters. I'd already read two research papers that morning, a product spec over lunch, three Substack newsletters during a break that wasn't really a break, and a long Wikipedia rabbit hole about the Bretton Woods system that started because someone mentioned it in a Slack thread. Conservatively twelve thousand words before dinner. Probably closer to fifteen. All of it staring at a screen.
And I thought — why am I doing this with my eyes?
That was maybe a year ago. I think it was... early 2025? Somewhere around there. Since then I've tried every method I can find for reading web pages out loud. Browser features, Chrome extensions, phone accessibility tricks, paid apps, free apps, open-source projects that require you to bring your own API key. Some of them are brilliant. Some are barely functional. One of them costs more per year than my Spotify and Netflix combined.
Here's everything I learned.
The first thing most people don't realize is that your browser might already do this. Edge has it built in. Right-click any page in Microsoft Edge, hit "Read aloud," and it starts talking. No extension, no download, no account. The voices are Microsoft's Azure neural voices, which means they sound surprisingly human — not perfect, there's still that slight synthetic smoothness that tells your brain something is off, but close enough that you stop noticing after five minutes. Speed control goes up to 2x, and you can pick from maybe two dozen voices across different languages. I used Edge's read aloud exclusively for about three weeks and it handled most articles without complaint. Long-form journalism, documentation pages, blog posts. Fine.
But it has a problem.
Edge reads the page. The whole page. Navigation menus, cookie banners, footer links, sidebar widgets, "Trending Now" sections, author bios, related article carousels. I was listening to a Washington Post piece about semiconductor policy and suddenly the voice said "Most Read. Opinion. Democracy Dies in Darkness. Sign in." Right in the middle of a paragraph about TSMC's Arizona fab. The immersion — gone. Like watching a movie and having someone read the DVD menu options aloud during the climax. And Chrome doesn't even have this feature natively. Firefox doesn't either, though Firefox has a Reader View that strips away page chrome, and if you activate Reader View first and then use a TTS extension, the extraction problem mostly goes away. Mostly.
So extensions. This is where it gets interesting and also where the choices multiply to a slightly overwhelming degree. (I wrote a full comparison of the best TTS Chrome extensions if you want the deep dive.)
The Read Aloud extension has been around for years. Open source, over a million users on the Chrome Web Store, completely free. Its creator built it as a passion project and it shows — in both the good ways and the frustrating ways. The good: it connects to basically every cloud voice engine that exists. Google WaveNet, Amazon Polly, Microsoft Azure, OpenAI's TTS. You paste in your API key and suddenly you're getting premium AI voices at raw API cost, which works out to something like two cents per long article. The frustrating: without an API key, you get whatever your operating system's built-in speech synthesis offers via the Web Speech API. On my MacBook Pro running Sequoia, the Siri voices are decent. A solid seven out of ten. On my work laptop running Windows 11, the default voices sound like they were synthesized in 2009. Because they were. My colleague James installed it on his ThinkPad, listened for about four seconds, and said "this sounds like a GPS from a rental car" and uninstalled it. And the settings panel has something like twenty configuration options. Pitch, rate, volume, voice, backup voice, highlight color, text detection method. I love that level of control. Most people run screaming from it.
Read Aloud also doesn't do paragraph-level highlighting in a way that feels polished. There's word-level highlighting, but it jitters: it lands on the wrong word, corrects itself a beat later, then jitters again. Still, for developers with cloud API accounts who just want clean audio, it's genuinely the best free option available.
Now here's where I get biased and I want to be upfront about it. CastReader is our product. I work on it. Take the next few paragraphs with appropriate skepticism.
The thing that drove us to build CastReader was the extraction problem. That Edge experience I described — hearing navigation elements and cookie banners and trending article lists read aloud — happens with almost every read aloud tool. The reason is that most tools just grab all the text on the page and start reading. They don't distinguish between an article paragraph and a "Subscribe to our newsletter" popup. CastReader's content script reads the rendered DOM, scores text blocks by density and position and semantic signals, and strips away everything that isn't the actual content. On that Washington Post page? It found the article body cleanly. Menu bar gone. Sidebar gone. "Democracy Dies in Darkness" gone. Just the paragraphs about semiconductors.
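The general idea is the same one behind "reader mode" algorithms: prose blocks tend to be long and link-sparse, while navigation and promo blocks are short and link-dense. Here's a minimal Python sketch of that heuristic — an illustration of the approach, not CastReader's actual code, and the thresholds are arbitrary:

```python
import re

def link_density(block_html):
    """Fraction of a block's visible text that sits inside <a> tags."""
    links = re.findall(r"<a\b[^>]*>(.*?)</a>", block_html, re.S | re.I)
    link_chars = sum(len(re.sub(r"<[^>]+>", "", t)) for t in links)
    text = re.sub(r"<[^>]+>", "", block_html)
    return link_chars / max(len(text), 1)

def extract_article(blocks, min_len=80, max_link_density=0.3):
    """Keep blocks that look like prose: long enough, with few links."""
    kept = []
    for block in blocks:
        text = re.sub(r"<[^>]+>", "", block).strip()
        if len(text) >= min_len and link_density(block) <= max_link_density:
            kept.append(text)
    return kept

blocks = [
    '<nav><a href="/">Home</a> <a href="/opinion">Opinion</a></nav>',
    "<p>TSMC's Arizona fab has been delayed repeatedly, and the policy "
    "debate around subsidies has only grown louder as costs climb.</p>",
    '<div class="promo"><a href="/subscribe">Subscribe to our newsletter</a></div>',
]
print(extract_article(blocks))  # only the paragraph about the fab survives
```

Real extractors add more signals (DOM position, tag semantics, text density per node), but even these two features are enough to drop the nav bar and the subscribe popup in this toy example.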
The paragraph highlighting is the other piece I'd defend to anyone. The current paragraph lights up on the page and auto-scrolls to keep it centered. I started using it while cooking — glance up from the cutting board, see exactly where the article is, look back down at the onions. It sounds trivial until you try it and then you can't go back to audio-only where you lose your place the second you zone out for three seconds. And you will zone out. Everyone does. The human attention span when listening is brutal and having that visual anchor makes the difference between absorbing a piece and having words wash over you like background noise.
What CastReader doesn't do well: the voice selection is modest compared to the big players. No iOS app, no Android app. It's a browser extension and that's it for now. But the free tier is a real free tier, not a seven-day trial that locks you out on Tuesday. You can try it here. We also have dedicated guides for specific platforms — Substack, Medium, and Notion — if those are where you spend most of your reading time.
Speechify. I need to talk about Speechify because if you've searched for "read aloud" online you've seen their ads everywhere. And the voices are gorgeous. I'm not going to pretend otherwise. Their premium neural voices are the best-sounding TTS I've tested. Natural cadence, good handling of punctuation pauses, they even manage em-dashes without that awkward robotic hesitation most engines produce. The voice named "Snoop Dogg" is a gimmick but some of their standard voices — the ones that just sound like a calm, articulate person reading to you — are remarkable.
That costs $139 a year.
The free version gives you a handful of voices that sound noticeably worse, and a usage cap I burned through in a single session reading a long Paul Graham essay. And there's something else that bugs me. Go to Speechify's blog right now. You'll find articles titled things like "Garage Sale Mysteries in Order" and "Wolf Meme" and "Anne Rice Books in Order." Hundreds of them. SEO content farm stuff that has zero connection to text-to-speech. It tells me where the company's attention might not be going. The Chrome extension feels like it hasn't changed meaningfully in a year. The desktop app uses 800MB of RAM sitting idle. If voice quality is everything and $139/year is nothing to you, Speechify wins on audio fidelity. For everyone else the value proposition is hard to justify.
NaturalReader occupies this middle ground that I find weirdly comforting. Not the best voices, not the worst. Not free, not expensive. Been around forever. Their AI voices in the free tier — I used one called Aria for about a week — sit at a comfortable "good enough" that you stop thinking about voice quality and just absorb the content. Which might actually be the point. There's a dyslexic font toggle that switches to OpenDyslexic, and I sent it to a friend with dyslexia who texted back "where has this been my whole life." Not a flashy feature. A deeply thoughtful one. Pricing is a one-time $99 purchase for the premium tier, which I respect in a world where everything wants a monthly subscription.
Something none of these tools handle well: tables. Code blocks. Mathematical notation. I tested all of them on a Wikipedia article about the Ottoman Empire that has these dense population tables, and every single extension either skipped the tables entirely or read them as a stream of disconnected numbers. "1453 2.5 million 1520 4 million 1600." Meaningless without the column headers. Code blocks are even worse: try listening to a Python tutorial where the extension reads "import numpy as np" and then "def calculate underscore gradient open parenthesis x comma y close parenthesis colon." Three seconds of that and you want to throw your laptop. The honest answer is that TTS technology in 2026 still can't handle structured non-prose content in a useful way. If what you're reading is mostly paragraphs of natural language text, read aloud works beautifully. If it's data-heavy or code-heavy, you're better off reading with your eyes.
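The table problem is at least partly solvable in principle: if a tool kept the header row, it could pair each cell with its column name before speaking it. A toy Python sketch of that idea — to be clear, not something any of the extensions above actually does:

```python
def speak_table(headers, rows):
    """Turn table rows into sentences by pairing each cell with its header."""
    sentences = []
    for row in rows:
        parts = [f"{header}: {cell}" for header, cell in zip(headers, row)]
        sentences.append(", ".join(parts) + ".")
    return sentences

headers = ["Year", "Population"]
rows = [["1453", "2.5 million"], ["1520", "4 million"]]
for line in speak_table(headers, rows):
    print(line)
# Year: 1453, Population: 2.5 million.
# Year: 1520, Population: 4 million.
```

"Year: 1453, Population: 2.5 million" is something a listener can actually follow; "1453 2.5 million" is not. The fact that no mainstream tool does even this simple transformation tells you how little attention structured content gets.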
Paywalled sites are another thing nobody talks about. Here's the trick: because browser extensions run inside your browser tab, they can read whatever you can see. If you're logged into the New York Times or the Wall Street Journal or the Financial Times, the extension sees the full article in the DOM because your browser already loaded it for you. This is fundamentally different from tools that fetch pages externally — those hit the paywall and get nothing. Every extension I mentioned works behind paywalls as long as you're authenticated. Edge's built-in read aloud works too. This is actually one of the strongest arguments for browser-based read aloud over standalone apps.
Speed control matters more than you think. I started at 1x and thought it was fine. Then I bumped to 1.2x. Then 1.3x. I now listen to most articles at 1.5x and long familiar-topic pieces at 1.8x. Your brain adapts faster than you expect. But here's the nuance — the quality of speed adjustment varies wildly between tools. Web Speech API voices (the free browser-built-in ones) sound terrible sped up because the browser literally shortens the audio waveform, creating this chipmunk compression artifact. Cloud AI voices handle speed much better because the speech is generated at the target rate, so a 1.5x voice was synthesized to speak at that pace, not pitch-shifted after the fact. This is one of those invisible differences between free and paid that doesn't show up on feature comparison charts but completely changes the listening experience.
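The difference is easy to see with numbers. Naively playing samples back faster scales every frequency by the speed factor, which is exactly the chipmunk effect; synthesizing at the target rate speeds up delivery while leaving pitch alone. A back-of-the-envelope sketch, using an illustrative speaking pitch of 120 Hz (modern players can also time-stretch to partly preserve pitch, so this models only the naive case):

```python
def naive_speedup_pitch(f0_hz, speed):
    """Naive resampling: every frequency gets multiplied by the speed factor."""
    return f0_hz * speed

def synthesized_pitch(f0_hz, speed):
    """Rate-aware synthesis: faster delivery, original pitch unchanged."""
    return f0_hz

voice_f0 = 120.0  # illustrative fundamental frequency of a speaking voice, in Hz
print(naive_speedup_pitch(voice_f0, 1.5))  # 180.0, noticeably higher and squeakier
print(synthesized_pitch(voice_f0, 1.5))    # 120.0, same pitch at 1.5x pace
```

At 1.5x, a 50% upward pitch shift is far outside normal speech variation, which is why the naive version becomes fatiguing within minutes while a rate-aware voice stays listenable for hours.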
And then there's your phone, which you might have forgotten can do all of this natively.
On iPhone, go to Settings, Accessibility, Spoken Content, and turn on Speak Screen. (If you've triggered this by accident and just want the voice to stop, I wrote a separate guide on how to turn off text-to-speech on every device.) Then on any webpage in Safari, swipe down from the top of the screen with two fingers. The entire page gets read aloud using Siri's voice. The control widget lets you adjust speed and skip forward. It's been there since iOS 8. I asked four iPhone-owning friends if they knew about this feature. Zero of them did. Zero. It doesn't do paragraph highlighting and the extraction is hit-or-miss — same problem as Edge where it reads UI elements — but for casual listening while commuting it works and it's free and it's already on your phone.
Android has Select to Speak in accessibility settings. Turn it on, tap the accessibility button (or use the accessibility gesture, depending on how your phone is set up), then tap any text on screen and it reads the selection. More manual than iOS but more precise because you choose exactly what gets read. Google's TTS voices have gotten dramatically better in the last two years, especially on Pixel phones where the neural voice runs locally without internet. Not as convenient as a Chrome extension but surprisingly usable during a commute.
So which method should you actually use? I've settled into a pattern that might work for you. For long articles and research papers on my laptop, CastReader — the paragraph highlighting keeps me anchored and the extraction means I don't hear navigation junk. For quick articles on my phone during my commute, iOS Speak Screen with two-finger swipe. For technical documentation where I need to read code examples with my eyes but want the surrounding prose narrated, I don't use read aloud at all. Some content is still better consumed visually and that's fine.
The real revelation isn't any specific tool. It's that reading and listening aren't competitors — they're different modes for different contexts. I still read with my eyes when I need to study something carefully, annotate it, flip back and forth between sections. But the twelve thousand words of daily screen reading I was doing at 11pm with burning eyes? Half of that happens in my ears now, while I'm cooking or walking or cleaning. Same information. Less strain. More hours that feel like living instead of staring.
Try any of these methods for one week. Pick the free option that matches your setup — Edge read aloud if you use Edge, Read Aloud extension if you use Chrome and don't mind some configuration, CastReader if extraction quality and highlighting matter to you, iOS Speak Screen if you mostly read on your phone. Give it seven days. You'll either abandon it completely because listening isn't your thing, or you'll wonder how you ever consumed the internet exclusively through your eyeballs. There's no middle ground. I haven't met anyone who lands in between.