VoiceCraft

Convert e-books (FB2, EPUB, TXT) to MP3 audiobooks using various Text-to-Speech technologies.

Features

Five TTS providers - RHVoice, Piper, Silero, Coqui XTTS-v2 and ElevenLabs
60+ voices - Multiple voices in Russian and English
Format support - FB2, EPUB, TXT
Speed control - from 0.5x to 2.0x
Multiple themes - Light/dark/system theme support
Multi-language UI - English and Russian interface
Auto-splitting - large books are split into parts
GPU acceleration - CUDA and Intel XPU support for faster generation (Silero and Coqui)

TTS Providers

1. RHVoice

Speed: Fast | Quality: Good | Offline

Lightweight offline engine that VoiceCraft uses through Windows SAPI, with a small installation footprint (~15 MB per voice). Generates speech quickly with very low CPU usage, which makes it well suited to converting large books.

Russian voices:

Aleksandr (Male)
Irina, Anna, Elena (Female)

English voices:

Bdl, Alan (Male)
Slt, Clb (Female)

Download: RHVoice releases

2. Piper (ONNX models)

Speed: Fast | Quality: Good | Offline

Neural TTS engine powered by ONNX Runtime. Offers good voice quality with fast generation: it processes text 10-50x faster than real time on most CPUs.

Russian voices (4):

Denis, Dmitri, Ruslan (Male)
Irina (Female)

English voices (29):

US voices: Amy, Kathleen, Kristin, HFC Female, LJSpeech (Female) • Arctic, Bryce, Danny, HFC Male, Joe, John, Kusal, L2Arctic, Lessac, LibriTTS, Norman, Reza Ibrahim, Ryan, Sam (Male)

GB voices: Alba, Cori, Jenny Dioco, Southern English Female (Female) • Alan, Aru, Northern English Male, Semaine, VCTK (Male)

Download models: Piper Voices

3. Silero (PyTorch)

Speed: Medium | Quality: Excellent | Offline

Advanced neural TTS engine built on PyTorch. Delivers natural, expressive speech with excellent prosody.

Russian voices (5):

Aidar, Eugene (Male)
Baya, Kseniya, Xenia (Female)

English voices (4):

Male 1, Male 2 (Male)
Female 1, Female 2 (Female)

Models download automatically on first use (~100-200 MB).

4. Coqui XTTS-v2

Speed: Slow | Quality: Premium | Offline

State-of-the-art multilingual model with 14 built-in speaker voices. Produces the most natural-sounding speech among local engines with exceptional emotional range and prosody.

Voices (14, same for all languages):

Female: Claribel Dervla, Daisy Studious, Gracie Wise, Tammie Ema, Alison Dietlinde, Ana Florence, Annmarie Nele, Asya Anara

Male: Andrew Chipper, Badr Odhiambo, Dionisio Schuyler, Royston Min, Viktor Eka, Abrahan Mack

Supports 17 languages including Russian, English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Dutch, Czech, Arabic, Chinese, Japanese, Hungarian, Korean, and Hindi.

5. ElevenLabs (Cloud API)

Speed: Fast | Quality: Premium | Online

Premium cloud-based TTS with cutting-edge AI voice synthesis. Offers studio-quality output with remarkable naturalness.

Russian voices:

Adam (Male)
Rachel (Female)

English voices:

Adam, Josh, Sam (Male)
Rachel, Domi, Bella (Female)

Setup:

Create an account at ElevenLabs
Get an API key from your profile
Add it to the .env file:

ELEVENLABS_API_KEY=your_api_key_here

Installation

Prerequisites

Node.js 18+
Windows 10/11
Python 3.9+ (for Silero and Coqui)

Quick Start

Clone repository

git clone <repo-url>
cd voicecraft

Install dependencies

npm install

Setup TTS components

Run the universal setup script:

# Via npm (recommended)
npm run setup

# Or directly via PowerShell
powershell .\scripts\setup-all.ps1

This will install:

Piper TTS
FFmpeg
Silero TTS
Coqui XTTS-v2

Download voice models

For RHVoice:

Download and install from RHVoice releases. Voices will be automatically detected after installation.

For Piper:

Download voices from Piper releases and extract to:

tts_resources/piper/voices/

Structure:

tts_resources/
  piper/
    voices/
      ru_RU/
        denis/
          medium/
            ru_RU-denis-medium.onnx
            ru_RU-denis-medium.onnx.json
      en_US/
        lessac/
          medium/
            en_US-lessac-medium.onnx
            en_US-lessac-medium.onnx.json
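
The layout above follows a regular pattern: locale, voice name, and quality each get a folder, and the file name joins all three. As a rough sketch (the helper name piperVoiceFiles and the assumption that every voice ID has exactly three dash-separated parts are illustrative, not taken from the VoiceCraft source), the expected paths could be derived like this:

```typescript
// Hypothetical helper: maps a Piper voice ID to the paths shown in the tree above.
interface PiperVoiceFiles {
  model: string;  // the .onnx model
  config: string; // the .onnx.json config that sits next to it
}

function piperVoiceFiles(
  voiceId: string,
  root = "tts_resources/piper/voices",
): PiperVoiceFiles {
  // Assumed ID shape: "<locale>-<name>-<quality>", e.g. "ru_RU-denis-medium".
  const parts = voiceId.split("-");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`Unexpected Piper voice ID: ${voiceId}`);
  }
  const [locale, name, quality] = parts;
  const dir = `${root}/${locale}/${name}/${quality}`;
  return {
    model: `${dir}/${voiceId}.onnx`,
    config: `${dir}/${voiceId}.onnx.json`,
  };
}
```

For "en_US-lessac-medium" this yields the two en_US paths listed in the structure above.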

For Silero:

Models download automatically on first use (~100-200 MB).

# Install only Silero
npm run setup:silero

For Coqui XTTS-v2:

The model downloads automatically on first use (~2 GB). Requires Python 3.9+; a GPU is recommended for faster generation.

For ElevenLabs:

Add your API key to the .env file:

ELEVENLABS_API_KEY=your_api_key_here
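
How VoiceCraft actually loads this file is not described here. The following is only a minimal sketch of reading KEY=value lines and checking that the key has been filled in; parseEnv and hasElevenLabsKey are hypothetical names, and the parser handles only simple lines, comments, and optional quotes.

```typescript
// Minimal .env parser sketch: KEY=value lines, "#" comments, optional quotes.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const first = value[0];
    if (
      value.length >= 2 &&
      (first === '"' || first === "'") &&
      value[value.length - 1] === first
    ) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Hypothetical check: a missing key, an empty key, or the README placeholder
// all count as "not configured".
function hasElevenLabsKey(env: Record<string, string>): boolean {
  const key = env["ELEVENLABS_API_KEY"];
  return key !== undefined && key !== "" && key !== "your_api_key_here";
}
```

A check like this lets the app report a missing key before any request is sent, which covers the first item under "ElevenLabs not working".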

Usage

Development mode

npm run dev

Build application

npm run build
npm run package

Project Structure

voicecraft/
├── electron/               # Electron main process
│   ├── main.ts            # Main process entry point
│   ├── preload.ts         # IPC bridge (preload script)
│   ├── main/
│   │   ├── window.ts      # Window management
│   │   └── handlers/      # IPC handlers
│   └── services/
│       ├── parser.ts      # Book parsing
│       ├── setup/         # Dependency installation
│       └── tts/           # TTS services
├── src/                   # React frontend
│   ├── App.tsx           # Main component
│   ├── i18n/             # Internationalization (EN/RU)
│   ├── components/       # UI components
│   ├── hooks/            # React hooks
│   ├── fsm/              # State machine
│   └── utils/            # Utility functions
├── tts_resources/        # TTS resources
│   ├── piper/           # Piper TTS
│   ├── silero/          # Silero TTS
│   ├── coqui/           # Coqui XTTS-v2
│   ├── ffmpeg/          # FFmpeg for conversion
│   └── tts_server.py    # Python TTS server
├── scripts/
│   ├── setup-all.ps1    # Universal setup
│   ├── setup-silero.ps1 # Setup only Silero
│   └── release.cjs      # Release automation
└── .env                 # Environment variables (API keys)

Performance

Provider	Speed	Quality	Model Size	Type	Recommendation
RHVoice	Fast	Good	~15 MB	CPU	Quick processing
Piper	Fast	Good	~50 MB	CPU	Balanced option
Silero	Medium	Excellent	~100-200 MB	CPU/GPU	Natural Russian voices
Coqui	Slow	Premium	~2 GB	CPU/GPU	Best offline quality
ElevenLabs	Fast	Premium	Cloud	API	Best overall quality

GPU Acceleration

Silero and Coqui support hardware acceleration for faster speech generation:

Accelerator	Supported GPUs	PyTorch Size	Speed Boost
CUDA	NVIDIA (GTX 10xx+, RTX series)	~2.3 GB	3-10x
Intel XPU	Intel Arc, Iris Xe, UHD 7xx+	~500 MB	2-5x
CPU	Any	~200 MB	Baseline

Enabling GPU Acceleration

During initial setup: When installing Silero or Coqui, select your preferred accelerator (CUDA, Intel XPU, or CPU)
Change accelerator later: Go to Settings → TTS Setup and click the "Reinstall" button next to Silero or Coqui

Requirements

For NVIDIA CUDA:

NVIDIA GPU with CUDA support (GTX 10xx or newer recommended)
Latest NVIDIA drivers installed
Automatically detected via nvidia-smi

For Intel XPU:

Intel Arc, Iris Xe, or UHD Graphics 7xx+
Latest Intel GPU drivers
Intel Extension for PyTorch (installed automatically)
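
The NVIDIA requirements mention detection via nvidia-smi. The exact command VoiceCraft runs is not documented here. As an assumption, the sketch below parses the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`, which prints one line per GPU with the name and the total memory in MiB. GpuInfo and parseNvidiaSmi are hypothetical names.

```typescript
interface GpuInfo {
  name: string;
  vramMiB: number;
}

// Parses lines such as "NVIDIA GeForce RTX 3060, 12288" (one GPU per line).
function parseNvidiaSmi(output: string): GpuInfo[] {
  const gpus: GpuInfo[] = [];
  for (const line of output.split(/\r?\n/)) {
    // Split on the last comma so a comma inside the name would not break parsing.
    const idx = line.lastIndexOf(",");
    if (idx < 0) continue;
    const name = line.slice(0, idx).trim();
    const rawVram = line.slice(idx + 1).trim();
    if (name === "" || rawVram === "") continue;
    const vramMiB = Number(rawVram);
    if (Number.isFinite(vramMiB)) {
      gpus.push({ name, vramMiB });
    }
  }
  return gpus;
}
```

An empty result (for example, when nvidia-smi is missing and produces no output) can be treated as "no CUDA GPU found".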

Auto-Detection

The application automatically detects available GPUs:

Priority order: CUDA → Intel XPU → CPU
GPU name and VRAM are displayed in the setup dialog
If no compatible GPU is found, CPU mode is used
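
The priority order above amounts to a simple fallback chain. A minimal sketch, with hypothetical type and function names and with detection results passed in rather than probed:

```typescript
type Accelerator = "cuda" | "xpu" | "cpu";

interface DetectedHardware {
  hasCuda: boolean;     // a compatible NVIDIA GPU was found
  hasIntelXpu: boolean; // a compatible Intel GPU was found
}

// Applies the documented priority: CUDA, then Intel XPU, then CPU.
function pickAccelerator(hw: DetectedHardware): Accelerator {
  if (hw.hasCuda) return "cuda";
  if (hw.hasIntelXpu) return "xpu";
  return "cpu";
}
```

On a machine with both an NVIDIA and an Intel GPU, this selects CUDA, matching the order stated above.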

Parallelization

RHVoice: up to 30 parallel threads
Piper: up to 10 parallel threads
Silero: up to 5 parallel threads
Coqui: 1 thread (sequential processing)
ElevenLabs: up to 3 parallel requests
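
One way to picture these limits is as a per-provider cap on how many text chunks are processed at once. The sketch below is not the app's scheduler; it only groups work items into batches no larger than the documented limit. The Provider type, MAX_PARALLEL, and toBatches are hypothetical names.

```typescript
type Provider = "rhvoice" | "piper" | "silero" | "coqui" | "elevenlabs";

// Limits copied from the list above.
const MAX_PARALLEL: Record<Provider, number> = {
  rhvoice: 30,
  piper: 10,
  silero: 5,
  coqui: 1,
  elevenlabs: 3,
};

// Splits work items into batches no larger than the provider's limit.
function toBatches<T>(items: readonly T[], provider: Provider): T[][] {
  const size = MAX_PARALLEL[provider];
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With 12 chunks, Silero would get batches of 5, 5, and 2, while Coqui would get 12 batches of one. Each batch could then be run with Promise.all; a worker pool that refills free slots as tasks finish would keep throughput higher than fixed batches.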

Troubleshooting

RHVoice not working

Install RHVoice from the official releases
Restart the application after installation
Voices are detected automatically via Windows SAPI

Piper not working

Make sure the voice models are downloaded
Check that the directory structure matches the layout under tts_resources/piper/voices/
The .onnx and .onnx.json files for a voice must be in the same folder
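
The pairing rule can be checked mechanically. As a rough sketch (modelsMissingConfig is a hypothetical helper, and the file listing is assumed to be collected elsewhere, for example by walking the voices folder), this reports every model that has no config beside it:

```typescript
// Returns the .onnx models in a file listing that have no matching
// .onnx.json file in the same folder.
function modelsMissingConfig(paths: readonly string[]): string[] {
  // Normalize Windows separators so both path styles compare equal.
  const normalized = paths.map((p) => p.replace(/\\/g, "/"));
  const all = new Set(normalized);
  // Full paths are compared, so a config in a different folder does not count.
  return normalized.filter((p) => p.endsWith(".onnx") && !all.has(`${p}.json`));
}
```

An empty result means every model in the listing has its config next to it.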

Silero slow generation

This is expected: Silero runs PyTorch models, which generate more slowly than Piper or RHVoice
The first run also downloads the models (~100-200 MB)
For large books, consider Piper or RHVoice

Coqui XTTS-v2 issues

First run downloads ~2 GB model
GPU recommended for faster generation
Requires Python 3.9+
Check that tts_resources/coqui/venv exists

ElevenLabs not working

Check that the API key is set in the .env file
Verify that the API key is valid at elevenlabs.io
Check your internet connection

FFmpeg errors

Make sure FFmpeg is installed: npm run setup
Check that tts_resources/ffmpeg/ffmpeg.exe exists

License

MIT

Acknowledgements

RHVoice - Lightweight SAPI voices
Piper - Fast ONNX TTS models
Silero - Natural PyTorch voices
Coqui TTS - State-of-the-art XTTS-v2 model
ElevenLabs - Premium cloud TTS
FFmpeg - Audio conversion
