Five minutes. That's all this takes. You'll go from zero to having a Telegram bot that turns any URL into an MP3 — extracting the actual article text, not the cookie banners and sidebar garbage.
Here's the full walkthrough for installing and using the CastReader skill on OpenClaw.
What You'll Need
Three things:
- A ClawHub account — free, sign up at clawhub.ai
- An OpenClaw-compatible agent connected to Telegram, Discord, or Slack (the OpenClaw docs cover agent setup if you haven't done this yet)
- Optionally: the CastReader Chrome extension — only needed for browser read-aloud mode with paragraph highlighting. Grab it from the Chrome Web Store.
Those are the prerequisites. No API keys. No YAML files. No environment variables unless you want to customize the voice or speed.
Step 1: Install the Skill
Two ways to do this.
Option A — visit clawhub.ai/vinxu/castreader and click the Install button. Done.
Option B — terminal:
clawhub install castreader
Verify it's there:
clawhub list
You should see castreader in the output alongside whatever other skills you've installed. If it doesn't appear, run the install command again; occasionally ClawHub's registry takes a few seconds to propagate.
Step 2: Your First Extract
Send your OpenClaw agent this message on Telegram (or Discord, or Slack):
Extract the text from https://paulgraham.com/greatwork.html
The agent invokes the CastReader extract command. It fetches the page, runs the content through a scoring algorithm that separates article text from navigation, footers, and ads, then returns structured JSON.
The output looks like this:
{
"title": "How to Do Great Work",
"language": "en",
"paragraphs": [
"If you collected lists of techniques for doing great work...",
"The first step is to decide what to work on...",
"..."
],
"paragraphCount": 87,
"wordCount": 11842
}
Every paragraph is a clean text block. No HTML tags. No inline styles. No "Share on Twitter" buttons mixed into the content. The extraction pipeline handles the filtering: 15+ dedicated extractors for specific platforms, plus a general-purpose visible-text-block algorithm for everything else.
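If you want to consume that JSON in your own script, a minimal Python sketch might look like the following. It assumes only the five fields shown in the sample output; the shortened sample text and the whitespace-based word count are illustrative assumptions, and the skill's own counter may work differently.

```python
import json

# Hypothetical sample that mirrors the fields shown above (title, language,
# paragraphs, paragraphCount, wordCount); the text is shortened for illustration.
SAMPLE = """
{
  "title": "How to Do Great Work",
  "language": "en",
  "paragraphs": ["First paragraph of text.", "Second one here."],
  "paragraphCount": 2,
  "wordCount": 7
}
"""

def summarize_extract(raw: str) -> dict:
    """Parse an extract result and cross-check its counts against the paragraphs."""
    data = json.loads(raw)
    paragraphs = data["paragraphs"]
    # Splitting on whitespace is an assumption; it undercounts languages
    # that do not put spaces between words.
    counted_words = sum(len(p.split()) for p in paragraphs)
    return {
        "title": data["title"],
        "language": data["language"],
        "paragraphs_match": len(paragraphs) == data["paragraphCount"],
        "counted_words": counted_words,
        "reported_words": data["wordCount"],
    }

summary = summarize_extract(SAMPLE)
print(summary)
```

A mismatch between the counted and reported values is not necessarily an error; it may just mean the two counters tokenize differently.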
Try it on different sites. A Wikipedia article (https://en.wikipedia.org/wiki/Turing_machine). An arXiv abstract page. A Substack newsletter. The extractor adapts to each site's DOM structure.
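To give a feel for what "scoring" text blocks can mean, here is a toy Python heuristic that rewards long runs of text and penalizes blocks made mostly of links. This is not CastReader's actual algorithm, which the skill does not document here; the Block type, the link_chars field, and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str        # visible text of one block on the page
    link_chars: int  # how many of those characters sit inside links (hypothetical field)

def score(block: Block) -> float:
    """Reward long text, penalize blocks that are mostly link text."""
    length = len(block.text)
    if length == 0:
        return 0.0
    link_density = min(block.link_chars / length, 1.0)
    word_count = len(block.text.split())
    return word_count * (1.0 - link_density)

def keep_article_blocks(blocks: list[Block], threshold: float = 5.0) -> list[str]:
    """Return the text of every block whose score reaches the threshold."""
    return [b.text for b in blocks if score(b) >= threshold]

blocks = [
    Block("Home About Contact Subscribe", link_chars=28),  # navigation bar
    Block("This paragraph stands in for article body text: it is long, "
          "and none of it is link text.", link_chars=0),
    Block("Share on Twitter", link_chars=16),               # share widget
]
kept = keep_article_blocks(blocks)
print(kept)
```

Navigation bars and share widgets score zero here because every character is link text, while the body paragraph keeps its full word count as its score.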
Step 3: Generate Audio
Now the useful part. Send your agent:
Generate audio from https://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange
CastReader extracts the article, splits it into paragraphs, and feeds each one through the Kokoro TTS model. You get back:
- Per-paragraph MP3s: 001.mp3, 002.mp3, 003.mp3... (handy if you want to skip around)
- A combined full.mp3: the entire article as one continuous file
- A JSON manifest: maps each MP3 to its source paragraph text and timestamps
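The manifest's exact schema isn't shown in this walkthrough, so the sketch below only illustrates the idea: pair each paragraph with its numbered MP3 and with its start and end offsets inside the combined file. The field names (file, text, start, end) and the sample durations are assumptions, not the skill's documented format.

```python
import json

def build_manifest(paragraphs: list[str], durations: list[float]) -> list[dict]:
    """Pair each paragraph with its MP3 name and its offsets in the combined file."""
    if len(paragraphs) != len(durations):
        raise ValueError("need exactly one duration per paragraph")
    manifest = []
    start = 0.0
    for index, (text, seconds) in enumerate(zip(paragraphs, durations), start=1):
        end = start + seconds
        manifest.append({
            "file": f"{index:03d}.mp3",  # follows the 001.mp3, 002.mp3 naming
            "text": text,
            "start": round(start, 3),    # seconds from the start of full.mp3
            "end": round(end, 3),
        })
        start = end
    return manifest

manifest = build_manifest(["First paragraph.", "Second paragraph."], [4.5, 6.25])
print(json.dumps(manifest, indent=2))
```

With cumulative offsets like these, a player can jump to any paragraph in the combined file without opening the per-paragraph MP3s.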
A typical 2,000-word article takes about 15 seconds to process. Longer pieces — say a 12,000-word Paul Graham essay — might take a minute. The audio streams back as soon as the first chunks are ready, so you can start listening before the full file finishes generating.
Step 4: Read Aloud in Browser
This one's different. Instead of getting an MP3 file, the read-aloud command opens the page in a real browser and reads it with live paragraph highlighting.
Read aloud https://arxiv.org/abs/2301.00234
What happens: CastReader launches a browser session with the Chrome extension loaded, navigates to the URL, triggers extraction, and starts TTS playback. Each paragraph gets highlighted as the voice reaches it. The page auto-scrolls to keep the current paragraph visible. Click any paragraph to jump there.
This mode requires the Chrome extension installed locally. The audio playback and DOM highlighting both run inside the extension's content script. Because they share one browser context, what you hear and what you see highlighted stay in sync.
Particularly useful for Kindle Cloud Reader and WeRead, where server-side extraction can't work at all (font encryption and canvas rendering, respectively).
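The highlight-follows-voice behavior described in this section boils down to a lookup: given the current playback position, which paragraph is being spoken? The extension's real code is not shown here and would run as JavaScript in the browser; the Python sketch below only illustrates the lookup, assuming you have a sorted list of paragraph start times.

```python
from bisect import bisect_right

def current_paragraph(start_times: list[float], position: float) -> int:
    """Return the index of the paragraph being spoken at `position` seconds."""
    # bisect_right counts how many paragraphs have started at or before `position`;
    # subtracting one turns that count into the index of the latest one.
    index = bisect_right(start_times, position) - 1
    return max(index, 0)  # positions before the first start map to paragraph 0

starts = [0.0, 4.5, 10.75]  # hypothetical start times in seconds
for position in (0.0, 4.4, 4.5, 12.0):
    print(position, "->", current_paragraph(starts, position))
```

Clicking a paragraph to jump there is the reverse operation: seek playback to that paragraph's start time.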
Customizing Voice and Speed
Two environment variables control the output:
export CASTREADER_VOICE=af_heart # default voice
export CASTREADER_SPEED=1.5 # default speed multiplier
The Kokoro model behind CastReader supports 40+ languages: English, Chinese, Japanese, Korean, French, German, Spanish, and dozens more. Language detection is automatic. Send it a Chinese article, get Chinese audio. No configuration needed.
Speed goes from 0.5 (half speed, good for language learning) up to 2.0 (for skimming familiar material). The 1.5 default hits a comfortable pace for most English content.
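If you wrap CastReader in your own tooling, it can help to validate these two variables before use. The Python sketch below reads them with the defaults named above and keeps the speed inside the 0.5 to 2.0 range. Clamping and falling back to the default on bad input are choices made for this sketch; how the skill itself treats out-of-range or malformed values is not stated here.

```python
import os

DEFAULT_VOICE = "af_heart"
DEFAULT_SPEED = 1.5
MIN_SPEED, MAX_SPEED = 0.5, 2.0

def read_settings(env=None) -> tuple[str, float]:
    """Read voice and speed from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    voice = env.get("CASTREADER_VOICE", DEFAULT_VOICE)
    raw_speed = env.get("CASTREADER_SPEED", str(DEFAULT_SPEED))
    try:
        speed = float(raw_speed)
    except ValueError:
        speed = DEFAULT_SPEED  # non-numeric input: fall back (sketch's choice)
    # Clamp into the supported range (sketch's choice, not documented behavior).
    speed = min(max(speed, MIN_SPEED), MAX_SPEED)
    return voice, speed

print(read_settings({}))
print(read_settings({"CASTREADER_SPEED": "3"}))
```

Passing a plain dict instead of the real environment makes the function easy to try without exporting anything in your shell.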
What Makes This Different
Other TTS skills on ClawHub handle plain text. CastReader handles URLs. That's the core difference, and it's a bigger gap than it sounds.
| Feature | CastReader | kokoro-tts | openai-tts | mac-tts |
|---|---|---|---|---|
| Accepts URLs | Yes | No | No | No |
| Web page extraction | 15+ platform extractors | None | None | None |
| Kindle font decryption | Yes | N/A | N/A | N/A |
| WeRead canvas extraction | Yes | N/A | N/A | N/A |
| Browser highlighting | Yes | No | No | No |
| Requires API key | No | No | Yes (OpenAI) | No |
If someone sends you a text snippet and you want audio, any TTS skill works fine. If someone sends you a link — that's where CastReader earns its keep.
Troubleshooting
"No content extracted": The page might rely heavily on client-side JavaScript rendering. CastReader renders pages with headless Puppeteer, but sites with aggressive anti-bot measures can block headless browsers. Try the read-aloud command instead, which uses a real browser session.
"Command not found: castreader": The skill isn't installed. Run:
clawhub install castreader
clawhub list
Chrome extension not responding: Make sure it's enabled in chrome://extensions/. Refresh the target page. If you just installed it, you may need to restart Chrome.
Audio sounds robotic or cuts off: Check your speed setting. Values above 2.0 can produce artifacts. Reset with export CASTREADER_SPEED=1.5.
What's Next
You've got the basics. Some things worth trying:
- Kindle Cloud Reader (https://read.amazon.com): CastReader decodes the encrypted font subsets. No other tool does this.
- WeRead (https://weread.qq.com): Canvas-rendered text, extracted via fetch interception.
- arXiv papers: Send a paper URL, get an audio summary of the abstract and full text.
- Pipeline composition: Chain skills together. Use web-search to find articles, pipe the URLs into generate-audio, build a podcast feed from your research queue.
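For the podcast-feed idea in the last item, the final step is ordinary RSS. Here is a rough Python sketch using only the standard library, assuming you have already hosted the generated MP3s somewhere. The episode fields (title, source_url, audio_url, bytes), the example.com addresses, and the byte count are hypothetical placeholders; nothing here is produced by CastReader itself.

```python
import xml.etree.ElementTree as ET

def build_feed(title: str, link: str, episodes: list[dict]) -> str:
    """Build a minimal RSS 2.0 feed with one <item> per generated audio file."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Articles converted to audio"
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        ET.SubElement(item, "link").text = episode["source_url"]
        # The enclosure element is what podcast clients download.
        ET.SubElement(
            item,
            "enclosure",
            url=episode["audio_url"],
            length=str(episode["bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")

feed = build_feed(
    "Research queue",
    "https://example.com/feed",  # hypothetical feed address
    [{
        "title": "How to Do Great Work",
        "source_url": "https://paulgraham.com/greatwork.html",
        "audio_url": "https://example.com/audio/greatwork/full.mp3",  # hypothetical
        "bytes": 1234567,  # placeholder file size
    }],
)
print(feed)
```

Most podcast apps expect more metadata than this (publication dates, artwork, unique IDs), so treat it as a starting point rather than a complete feed.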
clawhub install castreader
One command to install. One message to your agent to use it. Send a URL, get audio back.