Forty-seven tabs. That was my browser last Tuesday. All arXiv papers. I had skimmed exactly three of them.
This is the rhythm of modern research: your Semantic Scholar alerts fire, your lab Slack drops six links before lunch, someone on Twitter quotes a result you need to verify, and by Thursday your reading list has become a geological formation — layers of unread PDFs compressed under the weight of good intentions. You tell yourself you'll read them on the weekend. You won't.
I started listening to them instead. Not as audiobooks. Not through some elaborate pipeline involving OCR and cloud APIs and $20/month subscriptions. Just a Chrome extension, one click, and the paper starts playing through my headphones while I walk to the coffee shop.
arXiv Has HTML Now (This Changes Everything)
Here's the thing most researchers haven't noticed yet. arXiv started offering HTML versions of papers in late 2023, generated with the LaTeXML converter and building on the experimental ar5iv project. By 2024, the majority of new submissions have an HTML option. That little "HTML" link next to the PDF download? It opens the same paper as a real web page: headings, paragraphs, figures, proper text in the DOM.
This matters enormously for text-to-speech. A PDF is a layout format. It knows where ink goes on a page. It does not know what a paragraph is. Two-column layouts, headers that float between sections, footnotes wedged into margins — PDF readers have to guess where sentences begin and end. They guess wrong constantly.
HTML is structured text. Paragraphs are paragraphs. Headings are headings. A TTS tool that reads web pages can read an arXiv HTML paper the same way it reads a blog post or a Wikipedia article. No conversion. No upload. No guessing.
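To make the "paragraphs are paragraphs" point concrete, here is a minimal Python sketch of how little work a reader tool has to do once the text is in HTML. This is not CastReader's code; the extension's internals aren't something I've seen. The names BlockExtractor and extract_blocks and the sample markup are made up for illustration, and real arXiv HTML carries more wrappers and classes than this.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Collect (tag, text) pairs for headings and paragraphs, in page order."""

    BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (tag, text) pairs
        self._tag = None   # block tag currently open, if any
        self._parts = []   # text fragments seen inside the open block

    def handle_starttag(self, tag, attrs):
        # Start collecting only when we are not already inside a block.
        if tag in self.BLOCK_TAGS and self._tag is None:
            self._tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Join fragments, then collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            if text:
                self.blocks.append((tag, text))
            self._tag = None


def extract_blocks(html):
    """Return the headings and paragraphs found in an HTML string."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical sample markup, not copied from a real paper.
sample = """
<article>
  <h2>1 Introduction</h2>
  <p>Attention is <em>all</em> you need.</p>
  <p>We propose a simpler architecture.</p>
</article>
"""

for tag, text in extract_blocks(sample):
    print(tag, "|", text)
```

Each block comes out whole and in order, ready to hand to a speech engine one paragraph at a time. Doing the same from a PDF means reconstructing those boundaries from glyph positions first.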
One Click to Listen
CastReader is a free Chrome extension that reads any web page aloud. Open an arXiv HTML paper. Click the icon. Done.
The paper starts playing with a natural AI voice. Each paragraph highlights as it's read. A floating player bar sits at the bottom with pause, resume, and speed controls, and you can click any paragraph to jump to it. The page scrolls to follow along so you don't lose your place.
No account. No signup. No credits. No usage limits. Free.
Here's the actual workflow I use every morning:
- Open my arXiv feed or a paper link someone shared
- Click the "HTML" link to open the HTML version
- Click the CastReader icon
- Put on headphones, start making breakfast
I get through two or three papers before I've finished my eggs. Not deep reading — I'll come back to the important ones and read them properly. But I now have a mental map of what each paper argues, which ones are relevant to my work, and which ones I can safely archive. That triage used to take me an entire afternoon of skimming.
What About the Math?
Let's be honest about this. CastReader doesn't read mathematical notation aloud. Inline equations, display equations, theorem environments rendered as images or MathML: the TTS engine doesn't attempt to vocalize any of them. Where an equation sits, you'll hear the word "equation" or a brief pause, then the text continues.
Is this a dealbreaker? Depends on the paper. For a machine learning paper where the prose explains the architecture and the equations formalize it, you get 80% of the understanding from the text alone. "We apply a softmax over the attention scores" tells you what's happening even without seeing the formula. For a pure mathematics paper where every other sentence references Theorem 3.2 and the proof is the entire point — yeah, audio isn't going to work. Read that one with your eyes.
But most papers I encounter in ML, NLP, systems, and HCI are prose-heavy. The abstract, introduction, related work, methodology description, results discussion, and conclusion are all natural language. That's six out of eight typical sections. The math lives in a couple of dense sections that you'd want to study visually anyway.
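If you're curious what "skipping the math" might look like mechanically, here is a rough Python sketch under one assumption: that equations arrive as MathML inside math elements. It swaps each equation for a spoken placeholder and keeps the surrounding prose. The names MathSkippingText and speakable_text are hypothetical, and I'm not claiming this is how CastReader does it.

```python
from html.parser import HTMLParser


class MathSkippingText(HTMLParser):
    """Gather page text, replacing each <math> element with a placeholder."""

    def __init__(self, placeholder="equation"):
        super().__init__()
        self.placeholder = placeholder
        self._depth = 0    # > 0 while inside a <math> element
        self._parts = []   # text fragments to be spoken

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            if self._depth == 0:
                # Emit the placeholder once per outermost equation.
                self._parts.append(f" {self.placeholder} ")
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "math" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text inside an equation is dropped; everything else is kept.
        if self._depth == 0:
            self._parts.append(data)

    def text(self):
        return " ".join("".join(self._parts).split())


def speakable_text(html, placeholder="equation"):
    """Return the prose of an HTML fragment with equations replaced."""
    parser = MathSkippingText(placeholder)
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical fragment with one inline MathML equation.
fragment = (
    "<p>We minimize <math><mi>L</mi><mo>(</mo><mi>w</mi><mo>)</mo></math> "
    "with SGD.</p>"
)
print(speakable_text(fragment))
print(speakable_text(fragment, placeholder=""))
```

Passing an empty placeholder drops the equation silently, which is roughly the "brief pause" case. Equations rendered as images would need a different check, for example on the img element's alt text.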
Why Not Just Use a Screen Reader?
Your operating system has one built in. VoiceOver on Mac, Narrator on Windows, Orca on Linux. They'll read arXiv pages.
They'll also announce every navigation element, every toolbar button, every link in the sidebar, the "Download PDF" button text, the author affiliation superscripts, and the BibTeX citation metadata. A screen reader reads everything because it's designed for people who can't see the screen and need to know what's there. That's an important and different use case.
The voice itself is another problem. System TTS voices sound like they're reading a terms-of-service agreement at a deposition. Flat cadence, mechanical rhythm, zero variation. Listening to a 6,000-word research paper in that voice is an endurance sport. After fifteen minutes your brain stops processing content and starts fantasizing about silence.
CastReader uses neural AI voices trained on natural speech patterns. The difference is the gap between a MIDI rendition of Clair de Lune and someone actually playing it. Both contain the same notes. One of them you can listen to for an hour.
And CastReader extracts only the article body. Not the sidebar. Not the header. Not the "Subjects: cs.CL" metadata line. Just the paper's actual text, paragraph by paragraph.
For PDF-Only Papers
Some older papers or certain formats don't have an HTML version on arXiv. You have options.
First, check if ar5iv has converted it. Go to ar5iv.labs.arxiv.org and look up the paper ID, or change the "x" in the paper's arxiv.org URL to a "5" so it points at ar5iv.org. Many papers that don't show the HTML link on the main arXiv page have been converted by ar5iv anyway.
Second, CastReader works on PDFs opened in your browser. Chrome, Edge, and Firefox all have built-in PDF viewers that render the document as a web page. Open the PDF link directly in your browser, click CastReader, and it reads whatever text the browser has extracted. This works well for single-column papers. Two-column layouts are shakier — the browser's PDF renderer sometimes interleaves columns, producing garbled paragraph order. Single-column preprints, dissertations, and technical reports come through clean.
Third, some researchers convert papers to Markdown using tools like Nougat or Marker, then read them in a browser. Overkill for casual listening, but if you're already in that workflow, CastReader picks up the rendered Markdown perfectly.
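For the first option above, the HTML addresses can be built from a paper ID instead of clicked through. Here is a small Python sketch that does this. The function name html_candidates is made up, and the two URL patterns (arxiv.org/html/ID and ar5iv.labs.arxiv.org/html/ID) are what I've seen in my own browser rather than a documented contract, so treat them as assumptions. It only handles new-style IDs such as 1706.03762, not older ones like hep-th/9901001.

```python
import re

# New-style arXiv IDs: four digits, a dot, four or five digits, optional version.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def html_candidates(url_or_id):
    """Return HTML URLs worth trying for an arXiv link or bare paper ID.

    The URL patterns are assumptions, not a documented API.
    """
    match = _ARXIV_ID.search(url_or_id)
    if match is None:
        raise ValueError(f"no new-style arXiv ID found in {url_or_id!r}")
    base = match.group(1)
    version = match.group(2) or ""
    return [
        # arXiv's own HTML rendering, keeping the version if one was given.
        f"https://arxiv.org/html/{base}{version}",
        # ar5iv conversion, addressed here by the unversioned ID.
        f"https://ar5iv.labs.arxiv.org/html/{base}",
    ]


for candidate in html_candidates("https://arxiv.org/abs/1706.03762"):
    print(candidate)
```

Neither address is guaranteed to exist for a given paper, so a bookmarklet or script built on this would still need to check which one actually loads.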
When This Actually Matters
Paper triage. You have twenty new papers in your feed. Reading all twenty abstracts and introductions takes an hour of focused screen time. Listening to them during your commute takes the same hour but frees your eyes and your hands. By the time you sit down at your desk, you know which three papers deserve deep reading.
Literature review prep. You're writing a related work section and need to re-familiarize yourself with fifteen papers you read six months ago. Skimming them again feels like drudgery. Listening to the introductions and conclusions while doing laundry turns a dreaded task into background processing.
Eye fatigue. Twelve hours into a paper deadline, your vision is blurring and you still need to check three references. Your eyes are done. Your ears aren't.
Accessibility. Researchers with dyslexia, visual impairments, or chronic eye conditions shouldn't have to fight their own bodies to do their jobs. Audio is not a workaround — for many people it's the primary way complex text gets absorbed.
Setup (Under a Minute)
- Install CastReader from the Chrome Web Store. Also works on Edge.
- Open any arXiv paper and click the HTML link to get the web version.
- Click the CastReader icon. Audio starts in a few seconds. Paragraph highlighting follows along.
- Adjust speed with the floating player. I run mine at 1.3x for familiar topics, 1.0x for dense material in subfields I don't know well.
That's the whole setup. No API keys. No configuration file. No Python script.
The Honest Tradeoff
Listening to a paper is not the same as reading it. You won't catch a subtle flaw in the experimental design at 1.5x speed while chopping onions. You won't internalize a novel proof technique from audio alone. Math gets skipped. Tables and figures are invisible to TTS.
But "listen or don't engage at all" is the real choice for most papers in most researchers' queues. Forty-seven tabs gathering dust versus forty-seven papers heard during walks, commutes, and gym sessions. Imperfect comprehension beats zero comprehension every single time.
I still read papers the old way. The important ones. The three or four per week that directly affect my work. But the other forty-three? I listen. And I know what's in them now, which is more than I could say when they were just tabs. If you want to see how CastReader handles other formats, we tested it against every major TTS Chrome extension, including on Kindle, WeRead, and other platforms that break generic tools.