Fifteen tabs. That's how many articles I had saved last Tuesday morning before my commute. A Paul Graham essay. Two arXiv papers on diffusion models. A 4,000-word Substack post about chip manufacturing. Some Hacker News thread that looked interesting at midnight.
I read zero of them.
Sound familiar? You bookmark things. You email yourself links. You add them to Pocket, to Instapaper, to a Notion database with a "To Read" column that grows like mold. The articles pile up. You never go back.
Here's what actually worked: I told my Telegram bot to read them to me.
The Gap Nobody Talks About
AI agents have gotten absurdly capable. My Telegram bot can search the web, summarize PDFs, generate images, write code, translate between languages. It does all of this without me leaving the chat window. Impressive stuff.
But try asking it to read a web article out loud.
"Hey, read this to me: https://paulgraham.com/persistence.html"
What happens? One of three things. The bot tries to pronounce the URL character by character. Or it fetches the page and dumps raw HTML at you. Or — most likely — it just summarizes the article, which is not what you asked for. You wanted to hear it. The full thing. While walking to the train.
The problem is structural. Most TTS skills accept plain text as input. A string goes in, an audio file comes out. They have no idea what a URL is. No ability to fetch a web page. No logic to separate an article from the navigation bar, the cookie banner, the "Subscribe to my newsletter" popup, the sidebar full of trending posts, the comments section.
Even if you copy-paste the article text manually — and who's going to do that on a phone? — you lose paragraph structure. Headings disappear. The TTS engine gets a wall of text and plows through it like a steamroller.
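To make the gap concrete, here is a minimal sketch of the routing step a plain-text TTS skill is missing: deciding whether a message is text to speak or a link to fetch first. The function name `classify_request` and the regex are my own illustration, not part of OpenClaw or any skill on ClawHub.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper for illustration; not an OpenClaw or CastReader API.
URL_PATTERN = re.compile(r"https?://\S+")


def classify_request(message: str) -> tuple[str, str]:
    """Return ("url", link) if the message contains a web link, else ("text", message)."""
    match = URL_PATTERN.search(message)
    if match:
        # Drop punctuation that tends to trail a link at the end of a sentence.
        candidate = match.group(0).rstrip(".,;:!?)\"'")
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return ("url", candidate)
    return ("text", message)


if __name__ == "__main__":
    print(classify_request("Hey, read this to me: https://paulgraham.com/persistence.html"))
    print(classify_request("Read this sentence aloud."))
```

A text-only skill effectively always takes the second branch, which is why the URL itself ends up being read aloud or ignored.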
Skills That Actually Understand URLs
What you need is a skill that does two things: extract and speak. Fetch the page, figure out which parts are the article, throw away the junk, then convert the clean text to audio. One step.
This is where OpenClaw comes in. OpenClaw is an open protocol that lets AI agents use "skills" — modular tools published to ClawHub. Think of it like npm for agent capabilities. Your Telegram bot installs a skill, and suddenly it can do something new.
The skill you want is called CastReader. It's the only one on ClawHub right now that handles the full URL-to-audio pipeline.
Setting It Up (Three Steps, No Code)
1. Get an OpenClaw-compatible agent.
Telegram is the easiest starting point. The official OpenClaw bot runs on Telegram — search for it, start a chat. Discord works too. If you're running your own agent with the OpenClaw SDK, same deal.
2. Install the CastReader skill.
Send this to your agent:
clawhub install castreader
Done. No API keys. No configuration. No billing setup.
3. Send a URL.
read https://paulgraham.com/persistence.html
That's it. Audio comes back.
What Happens Under the Hood
No magic. No handwaving. Here's the actual sequence:
Your agent receives the URL. It calls the CastReader skill. The skill launches a headless browser — a real Chromium instance, not a simple HTTP fetch — and loads the page. Why a real browser? Because half the modern web doesn't work without JavaScript. SPAs, dynamic loading, client-side rendering. A plain fetch() gets you an empty <div id="root"></div> and nothing else.
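The empty-shell problem is easy to see for yourself. The sketch below counts visible text in raw HTML using only Python's standard library; a client-rendered page fetched without JavaScript scores close to zero. The class name, function name, and the 200-character threshold are assumptions I picked for the example, and this is not how CastReader decides anything.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts text characters outside <script>, <style>, <noscript>, and <title>."""

    SKIP = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chars += len(data.strip())


def looks_like_empty_shell(html: str, min_chars: int = 200) -> bool:
    """True when the static HTML carries less visible text than min_chars."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars < min_chars


if __name__ == "__main__":
    shell = (
        '<html><head><title>App</title></head><body>'
        '<div id="root"></div><script src="/bundle.js"></script>'
        '</body></html>'
    )
    print(looks_like_empty_shell(shell))  # an SPA shell with no rendered content
```

A check like this only tells you the fetch came back hollow. Getting the real content still takes a browser that runs the page's JavaScript.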
Once the page renders, CastReader's extraction engine kicks in. It walks the DOM tree and scores every container by text density, link ratio, and semantic signals. Navigation menus have high link density: thrown out. Footers with copyright notices: gone. Cookie consent overlays: stripped. What's left is the article.
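I don't have CastReader's source, so treat the following as a toy version of the general idea rather than its actual algorithm. It scores each container by how much text it holds, discounted by the share of that text sitting inside links. The container list, the scoring formula, and all names are assumptions, and it ignores semantic signals and nested-container credit entirely.

```python
from html.parser import HTMLParser

# Elements treated as candidate containers in this sketch (an assumption).
CONTAINERS = {"article", "main", "section", "div", "nav", "header", "footer", "aside"}


class BlockScorer(HTMLParser):
    """Collects text length and link-text length for each container element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open containers, innermost last
        self.blocks = []     # containers that have been closed
        self.link_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.stack.append({"tag": tag, "text": 0, "link": 0})
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in CONTAINERS and self.stack and self.stack[-1]["tag"] == tag:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        n = len(data.strip())
        if n and self.stack:
            block = self.stack[-1]  # credit only the innermost open container
            block["text"] += n
            if self.link_depth:
                block["link"] += n


def score(block: dict) -> float:
    """Text length scaled down by link density; an all-link block scores 0."""
    if block["text"] == 0:
        return 0.0
    link_density = block["link"] / block["text"]
    return block["text"] * (1.0 - link_density)


def pick_main_block(html: str):
    """Return the highest-scoring container, or None if there are none."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    return max(scorer.blocks, key=score, default=None)
```

Feed it a page with a nav bar, an article, and a footer, and the nav bar scores zero because every character is link text, while the article wins on sheer volume of unlinked prose.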
The clean text gets split into paragraphs and fed to Kokoro TTS. Each paragraph becomes an MP3 segment. The skill sends these back through your agent one by one: a text message showing the paragraph, followed by the audio. Paragraph by paragraph. You can pause, skip ahead, replay.
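The paragraph-by-paragraph delivery can be sketched in a few lines. This assumes the extracted article arrives as plain text with blank lines between paragraphs, and it takes the speech engine as a `synthesize` callback, since I'm not going to guess at the real Kokoro or OpenClaw interfaces. `Segment`, `split_into_segments`, and `deliver` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Segment:
    index: int  # 1-based position in the article
    text: str   # paragraph text with whitespace collapsed


def split_into_segments(article_text: str) -> list[Segment]:
    """Split on blank lines, collapse internal whitespace, drop empty paragraphs."""
    paragraphs = re.split(r"\n\s*\n", article_text)
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return [Segment(i, p) for i, p in enumerate((c for c in cleaned if c), start=1)]


def deliver(
    segments: list[Segment],
    synthesize: Callable[[str], bytes],  # stand-in for a real TTS call
) -> Iterator[tuple[str, bytes]]:
    """Yield (paragraph text, audio bytes) pairs one at a time, in order."""
    for segment in segments:
        yield segment.text, synthesize(segment.text)
```

Because `deliver` is a generator, the first paragraph's audio can go out before the rest has been synthesized, which fits the one-by-one delivery described above.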
Total time for a 2,000-word article? Under a minute.
Real Usage, Not Marketing
I've been using this daily for three weeks. Some actual results:
Sent my agent a Paul Graham essay (https://paulgraham.com/greatwork.html). Got 12 audio segments back in 47 seconds. Clean. No "Home | Essays | Bio" preamble. Just the essay.
Tried a WeChat Read novel chapter — this is Tencent's reading platform with 300 million users. Most extraction tools can't touch it because the text is rendered on canvas. The DOM is empty. CastReader has a dedicated extractor that intercepts the chapter data before it hits the canvas renderer. It worked. I was honestly surprised.
An arXiv paper (https://arxiv.org/abs/2401.06209). It stripped the LaTeX boilerplate, the author affiliations block, the references section. Read the abstract, introduction, and main content sections. Not perfect on math notation — that's inherently hard for TTS — but the prose sections were clean.
A 6,000-word Substack post with embedded tweets and pull quotes. Extracted the article body, skipped the tweet embeds (they're interactive iframes, not text), read everything else in order.
What About Other TTS Skills?
ClawHub has several TTS options. kokoro-tts gives you the same Kokoro voice model — excellent quality, 40+ languages, completely free. openai-tts uses OpenAI's voices if you prefer those. mac-tts uses Apple's built-in speech synthesis.
All three are solid. For plain text. You send a string, you get audio back.
None of them can take a URL. None of them extract content from web pages. That's not a knock on them — it's just a different tool for a different job. If your workflow involves reading articles from URLs, CastReader is currently the only skill that closes the loop.
Try It
Head to /openclaw for setup details and the full command reference. The skill is free. No API key, no usage limits, no trial period that expires.
Fifteen tabs of articles, read on your commute. That's the pitch. Nothing more complicated than that.