KittenTTS Android — IELTS Speaking & Reading Trainer

An on-device English learning app for IELTS speaking and reading practice, built on KittenML neural TTS and an on-device Whisper speech recognizer. Read a passage aloud, and the app transcribes your voice, scores your reading accuracy (Word Error Rate), and estimates an IELTS band across the four official criteria. Everything runs offline; no internet connection is required after install.

It also keeps the original Voice Studio (text-to-speech) so you can hear a model read any passage before you try it yourself.

How it works

The app has three tabs: Reading, Speaking, and Voice Studio.

Reading (read-aloud)

Pick from 200 bundled passages, graded across IELTS bands and topics.
Listen: a neural voice reads the passage aloud (KittenTTS).
Record & read aloud: your microphone audio is captured at 16 kHz.
Score: Whisper-tiny.en (ONNX, on-device) transcribes your speech; the transcript is aligned word by word against the passage to compute Word Error Rate, reading accuracy, and an IELTS-style band breakdown.
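
The scoring step can be sketched as a word-level Levenshtein alignment. This is a minimal illustration, not the code in WerScorer.kt: the names `normalizeWords`, `scoreReading`, `WordStatus`, and `WerResult` are hypothetical, and the normalization rule (lowercase, strip punctuation) is an assumption.

```kotlin
enum class WordStatus { CORRECT, MISREAD, SKIPPED }

data class WerResult(
    val wer: Double,                               // edits / reference word count
    val words: List<Pair<String, WordStatus>>,     // one label per passage word
    val insertions: Int                            // extra words the reader added
)

// Assumed normalization: lowercase and drop punctuation so "Fox," matches "fox".
fun normalizeWords(text: String): List<String> =
    text.lowercase()
        .replace(Regex("[^a-z0-9' ]"), " ")
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }

fun scoreReading(passage: String, transcript: String): WerResult {
    val ref = normalizeWords(passage)
    val hyp = normalizeWords(transcript)

    // dp[i][j] = minimum edits turning the first i passage words
    // into the first j transcript words.
    val dp = Array(ref.size + 1) { IntArray(hyp.size + 1) }
    for (i in 0..ref.size) dp[i][0] = i
    for (j in 0..hyp.size) dp[0][j] = j
    for (i in 1..ref.size) for (j in 1..hyp.size) {
        val cost = if (ref[i - 1] == hyp[j - 1]) 0 else 1
        dp[i][j] = minOf(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    }

    // Walk back through the table to label each passage word.
    val labels = ArrayList<Pair<String, WordStatus>>()
    var insertions = 0
    var i = ref.size
    var j = hyp.size
    while (i > 0 || j > 0) {
        val cost = if (i > 0 && j > 0 && ref[i - 1] == hyp[j - 1]) 0 else 1
        if (i > 0 && j > 0 && dp[i][j] == dp[i - 1][j - 1] + cost) {
            labels.add(ref[i - 1] to if (cost == 0) WordStatus.CORRECT else WordStatus.MISREAD)
            i--
            j--
        } else if (i > 0 && dp[i][j] == dp[i - 1][j] + 1) {
            labels.add(ref[i - 1] to WordStatus.SKIPPED)   // passage word never spoken
            i--
        } else {
            insertions++                                   // spoken word not in passage
            j--
        }
    }
    labels.reverse()

    val wer = if (ref.isEmpty()) 0.0 else dp[ref.size][hyp.size].toDouble() / ref.size
    return WerResult(wer, labels, insertions)
}
```

The per-word labels are what a highlighting view would consume; the WER itself is the total edit count divided by the number of passage words.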

Speaking (free response)

Pick an IELTS-style prompt (Part 1 questions, Part 2 cue cards, Part 3 discussion).
Record your answer in your own words.
Score: Whisper transcribes your answer, then a fluency analyzer (pace, pauses, fillers) and a free-speech scorer estimate all four official criteria from your own vocabulary and grammar: Task Response · Fluency & Coherence · Lexical Resource · Grammatical Range & Accuracy, plus an overall band. The app also shows a transcript and which prompt points you covered.
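
To make the fluency measurements concrete, here is a rough sketch of how pace, filler words, and long pauses could be derived from a transcript plus the recorded samples. It is not the logic in FluencyAnalyzer.kt: `analyzeFluency`, the filler list, the RMS silence threshold, and the one-second pause length are all assumptions chosen for illustration.

```kotlin
import kotlin.math.sqrt

// Hypothetical filler list; the app's real list is not given in this README.
val FILLERS = setOf("um", "uh", "er", "erm", "hmm")

data class Fluency(val wordsPerMinute: Double, val fillerCount: Int, val longPauses: Int)

fun analyzeFluency(
    transcript: String,
    samples: FloatArray,             // mono PCM scaled to [-1, 1]
    sampleRate: Int = 16_000,
    silenceRms: Float = 0.01f,       // assumed silence threshold
    longPauseSeconds: Double = 1.0   // assumed minimum length of a "long" pause
): Fluency {
    val words = transcript.lowercase().split(Regex("[^a-z']+")).filter { it.isNotEmpty() }
    val minutes = samples.size.toDouble() / sampleRate / 60.0
    val wpm = if (minutes > 0) words.size / minutes else 0.0
    val fillers = words.count { it in FILLERS }

    // Scan 20 ms frames and count runs of quiet frames that last long enough.
    val frame = sampleRate / 50
    val framesForLongPause = (longPauseSeconds * 50).toInt()
    var silentRun = 0
    var pauses = 0
    var start = 0
    while (start + frame <= samples.size) {
        var sum = 0.0
        for (k in start until start + frame) sum += samples[k] * samples[k]
        val rms = sqrt(sum / frame)
        if (rms < silenceRms) {
            silentRun++
            if (silentRun == framesForLongPause) pauses++   // count each pause once
        } else {
            silentRun = 0
        }
        start += frame
    }
    return Fluency(wpm, fillers, pauses)
}
```

An energy threshold is the simplest possible pause detector; a production analyzer would likely need noise-adaptive thresholds.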

Screenshots

Features

IELTS reading practice

200 bundled read-aloud passages graded by IELTS band (4.5–9) and topic
On-device speech recognition (Whisper-tiny.en, ONNX) — fully offline
Word Error Rate scoring with word-level highlighting (correct / misread / skipped)
IELTS band estimate across the four official criteria
"Listen" button — hear a neural voice read the passage first

IELTS speaking practice (free response)

IELTS-style prompts across Part 1, Part 2 (cue cards), and Part 3
Answer in your own words; Whisper transcribes on-device
Fluency analysis from the audio: words-per-minute, long pauses, filler words
Full four-criteria band estimate from your own vocabulary and grammar
Prompt-point coverage feedback + transcript

Voice Studio (text-to-speech)

3 model sizes: Nano (15M), Micro (40M), Mini (80M)
8 voices: Rosie, Bella, Jasper, Luna, Bruno, Hugo, Kiki, Leo
Adjustable speed (0.5x – 2.0x)
Download generated audio as WAV to device
Long text / large context support — paste entire articles, stories, or paragraphs. Text is automatically split into chunks at sentence boundaries, each chunk is synthesized independently, and the audio is seamlessly concatenated into a single output.
100% on-device inference via ONNX Runtime
Dark theme UI matching the iOS version
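
For the WAV download feature, the sketch below shows one way to wrap float samples in a standard 44-byte RIFF/WAVE header. It assumes 16-bit mono PCM output, which this README does not confirm, and `floatPcmToWav` is a hypothetical name rather than a function from the app.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Converts float samples in [-1, 1] to a 16-bit mono PCM WAV byte array.
// The 16-bit mono layout is an assumption; the app's actual format may differ.
fun floatPcmToWav(samples: FloatArray, sampleRate: Int = 24_000): ByteArray {
    val dataSize = samples.size * 2                      // 2 bytes per sample
    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)

    buf.put("RIFF".toByteArray(Charsets.US_ASCII))
    buf.putInt(36 + dataSize)                            // remaining file size
    buf.put("WAVE".toByteArray(Charsets.US_ASCII))

    buf.put("fmt ".toByteArray(Charsets.US_ASCII))
    buf.putInt(16)                                       // fmt chunk length
    buf.putShort(1.toShort())                            // audio format: PCM
    buf.putShort(1.toShort())                            // channels: mono
    buf.putInt(sampleRate)
    buf.putInt(sampleRate * 2)                           // byte rate
    buf.putShort(2.toShort())                            // block align
    buf.putShort(16.toShort())                           // bits per sample

    buf.put("data".toByteArray(Charsets.US_ASCII))
    buf.putInt(dataSize)
    for (s in samples) {
        val clamped = s.coerceIn(-1f, 1f)                // avoid integer overflow
        buf.putShort((clamped * 32767f).toInt().toShort())
    }
    return buf.array()
}
```

The resulting bytes can be written to any output stream, for example one obtained from Android's storage APIs.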

Architecture

Speech recognition + scoring (IELTS practice)

Microphone (16 kHz mono PCM)
  → Log-mel spectrogram (80 bins, Whisper spec)
  → Whisper encoder ONNX  → hidden states
  → Whisper decoder ONNX  → greedy token decode (30 s windows)
  → Byte-level BPE detokenize → transcript
  → WER alignment vs. passage → accuracy + highlighting
  → IELTS four-criteria band estimate

The Whisper ONNX models (quantized whisper-tiny.en, ~41 MB) are bundled in the APK (assets/asr/) and run on the same ONNX Runtime used for TTS. Fetch them with tools/download_whisper_onnx.py (no PyTorch needed) — see app/src/main/assets/asr/README.md. The pipeline (mel spectrogram + greedy decode + byte-level tokenizer) is validated to match the reference HuggingFace implementation exactly.
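
The greedy decode step in the pipeline above can be illustrated without ONNX Runtime by treating the decoder as a function from the tokens so far to next-token logits. This is only a sketch of the technique: `greedyDecode` and `decoderStep` are hypothetical names, and the real WhisperAsrEngine.kt presumably also handles Whisper's special tokens, which are omitted here.

```kotlin
// Greedy decoding: at each step pick the highest-scoring token and stop at end-of-text.
// decoderStep stands in for one decoder model call; it is an assumption, not the app's API.
fun greedyDecode(
    startTokens: List<Int>,
    endOfText: Int,
    maxTokens: Int,
    decoderStep: (List<Int>) -> FloatArray
): List<Int> {
    val tokens = startTokens.toMutableList()
    while (tokens.size - startTokens.size < maxTokens) {
        val logits = decoderStep(tokens)
        var best = 0
        for (k in 1 until logits.size) {
            if (logits[k] > logits[best]) best = k       // argmax over the vocabulary
        }
        if (best == endOfText) break
        tokens.add(best)
    }
    return tokens.drop(startTokens.size)                 // only the generated tokens
}
```

Greedy decoding keeps a single hypothesis, so it is cheaper than beam search, which fits a small on-device model.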

The IELTS band estimate is an honest approximation derived from reading accuracy, coverage, and pace. A read-aloud task cannot fully measure free-speech lexical/grammatical range; those proxies are documented in IeltsScorer.kt.
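
As an illustration of how per-criterion estimates might be combined, the sketch below averages the criterion bands and rounds to the nearest half band. Both the rounding rule and the helper names (`toHalfBand`, `overallBand`) are assumptions; the proxies the app actually uses are the ones documented in IeltsScorer.kt.

```kotlin
import kotlin.math.floor

// Round to the nearest half band and clamp to the 0 to 9 IELTS scale.
// Rounding exact quarter values upward is an assumption of this sketch.
fun toHalfBand(value: Double): Double =
    (floor(value * 2.0 + 0.5) / 2.0).coerceIn(0.0, 9.0)

// Overall band as the equally weighted mean of the criterion bands.
fun overallBand(criteria: List<Double>): Double {
    require(criteria.isNotEmpty()) { "at least one criterion band is required" }
    return toHalfBand(criteria.average())
}
```

For example, criterion bands of 6.0, 6.5, 7.0, and 6.0 average to 6.375, which this rule reports as 6.5.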

Text-to-speech (Voice Studio)

Text Input (any length)
  → Auto-chunking (max 400 chars at sentence boundaries)
  → Per-chunk: Punctuation normalization
  → Per-chunk: espeak-ng phonemization (JNI/NDK)
  → Per-chunk: IPA tokenization (178-token vocabulary)
  → Per-chunk: ONNX Runtime inference (24kHz Float32 PCM)
  → Concatenate all chunk audio
  → AudioTrack playback / WAV download
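
The auto-chunking stage can be sketched as follows: split at sentence-ending punctuation, then pack sentences greedily into chunks of at most 400 characters. This is an assumed approach rather than the code in KittenTTSEngine.kt; in particular, the hard split of a single over-long sentence and the name `chunkText` are hypothetical.

```kotlin
// Splits text into chunks of at most maxChars, breaking at sentence boundaries where possible.
fun chunkText(text: String, maxChars: Int = 400): List<String> {
    // Split after '.', '!' or '?' when followed by whitespace.
    val sentences = text.trim().split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    val chunks = mutableListOf<String>()
    val current = StringBuilder()

    fun flush() {
        if (current.isNotEmpty()) {
            chunks.add(current.toString())
            current.setLength(0)
        }
    }

    for (sentence in sentences) {
        // Assumption: a sentence longer than maxChars is cut into fixed-size pieces.
        val pieces = if (sentence.length > maxChars) sentence.chunked(maxChars) else listOf(sentence)
        for (piece in pieces) {
            // Start a new chunk if adding this piece (plus a space) would overflow.
            if (current.isNotEmpty() && current.length + 1 + piece.length > maxChars) flush()
            if (current.isNotEmpty()) current.append(' ')
            current.append(piece)
        }
    }
    flush()
    return chunks
}
```

Each returned chunk would then pass through normalization, phonemization, tokenization, and inference independently before the audio is concatenated.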

Tech Stack

| Component | Technology |
| --- | --- |
| UI | Kotlin + Jetpack Compose + Navigation Compose |
| ML Inference | ONNX Runtime Android (TTS + Whisper ASR) |
| Speech recognition | Whisper-tiny.en (ONNX, on-device) |
| Phonemization | espeak-ng (C via JNI/NDK) |
| Audio | AudioTrack playback (24 kHz) · AudioRecord capture (16 kHz) |
| Scoring | Word Error Rate (Levenshtein) + IELTS band heuristics |
| Build | Gradle KTS, Android NDK, CMake |

Download

Get the latest APK from Releases.

Building from Source

Prerequisites

Android Studio (latest)
Android SDK 34
Android NDK 27+
JDK 17

Steps

Clone the repo:

```
git clone https://github.com/rockerritesh/kitten-tts-android.git
cd kitten-tts-android
git lfs pull
```

Build espeak-ng native library:
```
./build-espeak-ng.sh
```
Fetch the Whisper ASR model into assets/asr/ (needed for speaking/reading scoring):
```
python3 tools/download_whisper_onnx.py
```
Open in Android Studio and build, or:
```
./gradlew assembleDebug
```

Project Structure

app/src/main/
├── java/com/kittenml/tts/
│   ├── MainActivity.kt           # Entry point + bottom-nav (Practice / Voice Studio)
│   ├── engine/                   # TTS
│   │   ├── KittenTTSEngine.kt    # Core TTS pipeline
│   │   ├── EspeakBridge.kt       # JNI wrapper
│   │   └── AudioPlayer.kt        # AudioTrack playback
│   ├── asr/                      # Speech recognition
│   │   ├── AudioRecorder.kt      # 16 kHz mic capture
│   │   ├── MelSpectrogram.kt     # 80-bin log-mel features
│   │   ├── WhisperTokenizer.kt   # byte-level BPE decode
│   │   ├── WhisperAsrEngine.kt   # encoder/decoder ONNX inference
│   │   └── AsrState.kt
│   ├── scoring/
│   │   ├── WerScorer.kt          # Word Error Rate + alignment
│   │   ├── ScoreResult.kt
│   │   ├── IeltsScorer.kt        # read-aloud four-criteria estimate
│   │   ├── IeltsAssessment.kt
│   │   ├── BandUtil.kt           # quality → IELTS band helpers
│   │   ├── FluencyAnalyzer.kt    # pace / pauses / fillers from audio
│   │   └── FreeSpeechScorer.kt   # free-response four-criteria estimate
│   ├── data/
│   │   ├── Paragraph.kt / ParagraphRepository.kt
│   │   └── SpeakingPrompt.kt / SpeakingPromptRepository.kt
│   ├── ui/
│   │   ├── theme/                 # Dark theme (Color, Theme, Type)
│   │   └── screen/
│   │       ├── TTSScreen.kt      # Voice Studio UI
│   │       ├── TTSViewModel.kt
│   │       ├── practice/         # reading: list + record/score + ViewModel
│   │       └── speaking/         # free speaking: list + record/score + ViewModel
│   └── model/
│       ├── TTSModel.kt           # Model enum
│       └── EngineState.kt        # Engine state
├── cpp/
│   ├── espeak-bridge.c           # C phonemization bridge
│   ├── espeak-jni.c              # JNI glue layer
│   └── CMakeLists.txt            # NDK build config
└── assets/
    ├── models/                    # TTS ONNX model files (~168 MB)
    ├── voices/                    # Voice embedding JSONs (~51 MB)
    ├── espeak-ng-data/            # Phoneme data files (~1 MB)
    ├── asr/                       # Whisper ONNX + vocab (see asr/README.md)
    └── ielts/
        ├── paragraphs.json        # 200 read-aloud passages
        └── speaking_prompts.json  # free-speaking prompts (Parts 1–3)

IELTS passages

The bundled passages live in app/src/main/assets/ielts/paragraphs.json, each with id, title, topic, band, and text. The app ships with 200 graded passages across 15 topics and IELTS bands 4.5–9.0. Add or edit passages by changing this file — no code change is needed, as the list is read at runtime. Free-speaking prompts live alongside it in speaking_prompts.json.
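
A passage entry with the fields listed above could be modeled and filtered roughly like this. The field types are assumptions (for instance, `id` may be numeric in the real file), and `Passage` and `passagesFor` are hypothetical names, not the classes in Paragraph.kt or ParagraphRepository.kt.

```kotlin
// One entry of paragraphs.json; field names follow the README, types are assumed.
data class Passage(
    val id: String,
    val title: String,
    val topic: String,
    val band: Double,
    val text: String
)

// Returns passages matching an optional topic and a band range, easiest first.
fun passagesFor(
    all: List<Passage>,
    topic: String? = null,
    minBand: Double = 4.5,
    maxBand: Double = 9.0
): List<Passage> =
    all.filter { (topic == null || it.topic.equals(topic, ignoreCase = true)) && it.band in minBand..maxBand }
        .sortedBy { it.band }
```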

License

Apache 2.0
