Voicebox
🚀 Description
jamiepine/voicebox is a local-first AI voice studio: a free and open-source alternative to ElevenLabs (TTS) and WisprFlow (STT) in one app. Clone any voice from a few seconds of audio, generate speech in 23 languages across 7 TTS engines, dictate into any text field with a global hotkey, and give any MCP-aware agent a voice you've cloned.
Built with Tauri (Rust), not Electron. Privacy by default — models, voice data, and captures never leave the machine.
🧩 Features
- Complete privacy — runs entirely locally
- 7 TTS engines — Qwen3-TTS, Qwen CustomVoice, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, HumeAI TADA, Kokoro
- Voice cloning — zero-shot from a reference sample, or 50+ preset voices
- 23 languages — incl. Polish, Arabic, Japanese, Hindi, Swahili
- Post-processing — pitch shift, reverb, delay, chorus, compressor, filters (Spotify pedalboard)
- Expressive speech — paralinguistic tags [laugh], [sigh], [gasp] via Chatterbox Turbo
- Unlimited length — auto-chunking + crossfade up to 50,000 chars
- Voice input — global dictation hotkey, push-to-talk, auto-paste, in-app mic, Whisper STT
- Agent voice output — voicebox.speak MCP tool lets Claude Code, Cursor, and Cline speak in cloned voices
- Voice personalities — free-form personas + bundled local LLM for Compose/Rewrite/Respond (also MCP-exposed)
- API-first — REST API + built-in MCP server
- Runs everywhere — macOS MLX/Metal, Windows CUDA, Linux, AMD ROCm, Intel Arc, Docker
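The "Unlimited length" item above mentions auto-chunking plus crossfade. The sketch below illustrates that general technique only; it is not Voicebox's implementation. The function names, the sentence-boundary splitting rule, and the linear fade curve are all assumptions made for illustration.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample buffers, fading a out and b in over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # linear fade-in weight for b
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:-overlap] + mixed + b[overlap:]
```

In a real pipeline each chunk would be synthesized separately and the resulting audio buffers joined pairwise with `crossfade`, so the seams between chunks are less audible. The joined buffer is shorter than the two inputs combined by `overlap` samples.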
🎨 Why this matters for an agent workflow
Voicebox bridges the input and output halves of voice I/O. The cloud incumbents each cover one side (ElevenLabs for TTS, WisprFlow for STT); Voicebox does both, glued together by a local LLM. With the MCP integration, a Claude Code session can talk back to you in a voice you own, which is useful for Pulse-style background notifications or Personal AI Infrastructure DA personas.
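To make the MCP path concrete, here is a minimal sketch of the JSON-RPC message an MCP host sends when it invokes a tool. The `tools/call` method and the `name`/`arguments` shape come from the MCP specification, and the tool name `voicebox.speak` comes from the feature list. The argument names `text` and `voice` are hypothetical; the real input schema should be read from the server's tool listing or the Voicebox docs.

```python
import json


def build_speak_request(text: str, voice: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "voicebox.speak",  # tool name from the feature list
            # Hypothetical argument names; check the tool's input schema.
            "arguments": {"text": text, "voice": voice},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_speak_request("Build finished.", "my-cloned-voice"))
```

In practice the host (Claude Code, Cursor, Cline) builds and sends this message itself once the server is registered, so the sketch is mainly useful for understanding or debugging what goes over the wire.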
Reasoning for
Use cases: local voice cloning for personal projects, dictation across all apps, and giving any MCP-aware agent a configurable voice. Privacy is the killer feature: ElevenLabs processes your voice samples and text in its cloud, while Voicebox keeps them on the machine.
Alternatives considered
- ElevenLabs — best quality, cloud-only, paid
- WisprFlow — STT-only, cloud
- Whisper.cpp + Piper — DIY local stack, no GUI, no MCP
🔗 Links
- Repo: https://github.com/jamiepine/voicebox
- Site: https://voicebox.sh/
- Docs: https://docs.voicebox.sh/
🔗 Related notes
- Claude Code — primary MCP host
- Personal AI Infrastructure — DA voice personas
- Hermes Agent — agent that benefits from spoken output
- Paperclip — agent orchestration
Template: tool