A fine-tuned Whisper model for transcribing Hakha Chin (CNH) speech to text and translating to English. Built to help bridge language barriers in Hakha Chin-speaking communities.
This project fine-tunes OpenAI's Whisper model on Hakha Chin Bible audio to create a speech-to-text system for this low-resource language. The model transcribes Hakha Chin audio and provides automatic translation to English.
Current Status: ✅ V4 Model - Production Ready (with limitations)
- Speech-to-Text: Transcribe Hakha Chin audio to text
- Translation: Automatic translation to English
- Web Interface: Easy-to-use Gradio interface
- Audio Processing: Handles uploaded files and microphone input
- Sliding Window: Processes long audio in manageable chunks
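Outside the Gradio app, the same transcription step can be reproduced in a few lines. This is a minimal sketch, assuming the fine-tuned checkpoint lives in `./whisper-hakha-chin/` and the clip is short (under 30 seconds); `example.mp3` is a placeholder filename, not a file in the repository.

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned checkpoint (saved under ./whisper-hakha-chin/ by the training script)
processor = WhisperProcessor.from_pretrained("./whisper-hakha-chin")
model = WhisperForConditionalGeneration.from_pretrained("./whisper-hakha-chin")
model.eval()

# Whisper expects 16 kHz mono audio; example.mp3 is a placeholder clip
audio, _ = librosa.load("example.mp3", sr=16000, mono=True)

# Convert the waveform to log-mel features and decode a transcription
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

Longer recordings go through the sliding-window processing described under Audio Processing below.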
- Python 3.8+
- GPU recommended (CUDA support for faster processing)
- ~2GB disk space for model files
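A quick way to check whether the CUDA path will be used (torch is already a required dependency):

```python
import torch

# True means the interface and training scripts can run on the GPU with FP16
print("CUDA available:", torch.cuda.is_available())
```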
```bash
# Clone the repository
git clone https://github.com/trinitron88/ChinTranslator2.git
cd ChinTranslator2

# Install dependencies
pip install torch transformers gradio librosa deep-translator soundfile numpy

# Launch the web interface
python gradio_interface.py
```

The interface will launch in your browser with a shareable public link.
- Training Data: 1,375 segments from 44 Bible chapters (Mark & Matthew)
- Validation Data: 344 segments
- Training Loss: 6.47 → 2.0 (smooth descent)
- Estimated Accuracy: 60-70% on biblical text
Training Data Constraints:
- All male narrators (Bible speakers only)
- Biblical/formal vocabulary domain
- Read speech, not conversational
- Single audio source
Performance:
- Processing speed: ~3-4x real-time on GPU
- Lower accuracy on non-biblical conversational speech
- Reduced accuracy on female voices
Domain:
- Best for biblical or formal Hakha Chin
- Limited modern/conversational vocabulary
```
.
├── README.md
├── gradio_interface.py       # Main web interface (optimized)
├── fine-tuning-aligned.py    # Training script
├── whisper_alignment_2.py    # Audio-text alignment
├── process-matthew.py        # Data preprocessing
├── continue_training.py      # Continue training existing model
├── aligned_train_data.json   # Training segments (1,375)
├── aligned_val_data.json     # Validation segments (344)
├── Audio/                    # Audio files (mark_*.mp3, matt_*.mp3)
├── Text/                     # Text transcripts (*.txt)
└── whisper-hakha-chin/       # Fine-tuned model (V4)
```
1. Launch the Gradio interface: `python gradio_interface.py`
2. Choose input method:
   - Upload Audio: Upload an audio file (MP3, WAV, etc.)
   - Record Audio: Use your microphone to record
3. Click "Transcribe & Translate"
4. View results:
   - Hakha Chin transcription
   - English translation
Prepare your data:
- Audio files in
Audio/directory - Corresponding text files in
Text/directory - Use naming convention:
book_chapter.mp3andbook_chapter.txt
- Audio files in
-
Align audio and text:
python whisper_alignment_2.py
-
Train the model:
python fine-tuning-aligned.py
-
Model will be saved to
./whisper-hakha-chin/
- Base Model: OpenAI Whisper Small (244M parameters)
- Task: Transcription (not translation)
- Language: Hakha Chin (forced, no language token)
- Approach: Fine-tuning with frozen encoder, trainable decoder
- Epochs: 5
- Batch Size: 4 (effective 16 with gradient accumulation)
- Learning Rate: 1e-5
- Optimizer: AdamW
- Mixed Precision: FP16 (on GPU)
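As a rough illustration, the settings above map onto the Hugging Face training API roughly as follows. This is a sketch, not a copy of `fine-tuning-aligned.py`; dataset loading, the data collator, and the trainer construction are omitted.

```python
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration

# Start from Whisper Small and freeze the encoder so only the decoder is updated
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
for param in model.model.encoder.parameters():
    param.requires_grad = False

# Hyperparameters taken from the list above; AdamW is the Trainer's default optimizer
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-hakha-chin",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size of 16
    learning_rate=1e-5,
    fp16=True,                      # mixed precision when a GPU is available
)
```

A `Seq2SeqTrainer` built from these arguments and the aligned train/validation segments then runs the five epochs.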
- Sample Rate: 16kHz (mono)
- Segmentation: Non-silence detection
- Window: 30-second sliding windows with overlap
- Normalization: Automatic volume adjustment
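A sketch of that preprocessing with librosa is shown below. The 5-second overlap and the commented `top_db` threshold are illustrative assumptions, since the README only states that the 30-second windows overlap.

```python
import librosa
import numpy as np

def sliding_windows(path, window_s=30.0, overlap_s=5.0, sr=16000):
    """Load audio as 16 kHz mono, peak-normalize it, and yield overlapping windows."""
    audio, _ = librosa.load(path, sr=sr, mono=True)

    # Simple peak normalization (stand-in for the automatic volume adjustment)
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak

    # For silence-based segmentation, librosa.effects.split returns non-silent intervals:
    # non_silent = librosa.effects.split(audio, top_db=30)

    # Slide a 30-second window across the clip, stepping by window minus overlap
    size = int(window_s * sr)
    step = int((window_s - overlap_s) * sr)
    for start in range(0, max(len(audio) - size, 0) + 1, step):
        yield audio[start:start + size]
```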
- Method: Google Translate API (via deep-translator)
- Source: Hakha Chin (CNH) or auto-detect
- Target: English
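A minimal example of that call (the `source="auto"` fallback is taken from the auto-detect option above; the interface may pass an explicit Hakha Chin code instead):

```python
from deep_translator import GoogleTranslator

# `transcription` stands for the Hakha Chin text produced by the Whisper model
transcription = "..."

# Auto-detect the source language and translate to English
english = GoogleTranslator(source="auto", target="en").translate(transcription)
print(english)
```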
| Version | Chapters | Segments | Status | Notes |
|---|---|---|---|---|
| V1 | Mark | - | ❌ Abandoned | Severe repetition, undertrained |
| V2 | Mark (16) | 540 | ✅ Working | Good baseline, limited vocabulary |
| V3 | Mark + Matthew | 1,517 | ❌ Failed | Bad alignment, multilingual gibberish |
| V4 | Mark + Matthew (44) | 1,375 | ✅ Current | Proper alignment, production ready |
- Field test with native speakers
- Collect accuracy metrics on real conversations
- Optimize processing speed
- Expand to all 260+ available Bible chapters
- Add data augmentation (speed, pitch, noise)
- Test on diverse audio conditions
- Collect conversational Hakha Chin data
- Add female and diverse speakers
- Implement speaker diarization
- Train dedicated Hakha Chin → English translation model
- Create community crowdsourcing platform
- Audio: Faith Comes By Hearing (Hakha Chin Bible)
- Text: YouVersion Bible (Hakha Chin)
- Books: Gospel of Mark (16 chapters), Gospel of Matthew (28 chapters)
Contributions welcome! Areas of interest:
- Additional training data (conversational Hakha Chin)
- Performance optimizations
- Accuracy improvements
- UI/UX enhancements
- Documentation
This project is for educational and language preservation purposes. Please respect the licenses of:
- OpenAI Whisper (Apache 2.0)
- Bible audio and text sources
- Transformers library (Apache 2.0)
- OpenAI for the Whisper model
- Faith Comes By Hearing for Hakha Chin Bible audio
- YouVersion for Hakha Chin Bible text
- The Hakha Chin community
For questions or collaboration: GitHub Issues
Last Updated: November 5, 2025
Model Version: V4
Status: Production Ready (with known limitations)