Vayu (وایو)

Python 3.10+ · MIT License · macOS (Apple Silicon) · MLX

The fastest Whisper implementation on Apple Silicon.

Vayu (وایو) is the ancient Persian god of wind — the swiftest force in nature. In Zoroastrian mythology, Vayu represents the divine wind that moves faster than any earthly creature. We chose the name because this implementation outperforms even "lightning-fast" alternatives, making it a fitting title for the fastest Whisper on Apple Silicon.

Acknowledgments

This project builds upon the excellent work of others. We are grateful to:

  • Apple MLX Team - For the MLX framework and the original Whisper MLX implementation with CLI support, output writers, and numerical stability improvements
  • Mustafa Aljadery - For the lightning-fast batched decoding implementation that significantly improves throughput
  • Siddharth Sharma - Co-author of lightning-whisper-mlx
  • OpenAI - For creating the original Whisper model and making it open source

This unified implementation combines the best of both worlds:

  • ml-explore/mlx-examples/whisper - Newer APIs, CLI support, output writers, numerical stability
  • lightning-whisper-mlx - Batched decoding for higher throughput

Features

  • Batched decoding - Process multiple audio segments in parallel for 3-5x faster transcription
  • Multiple output formats - txt, vtt, srt, tsv, json
  • Word-level timestamps - Extract precise word timings
  • Multiple model support - tiny, base, small, medium, large-v3, turbo, distil variants
  • Quantization - 4-bit and 8-bit quantized models for reduced memory usage
  • Simple API - Easy-to-use LightningWhisperMLX wrapper class

Installation

# Clone the repository
git clone <repo-url>
cd vayu

# Install the package
pip install -e .

# Download required assets (mel filters and tokenizer vocabularies)
python -m whisper_mlx.assets.download_assets

Requirements

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.10+
  • MLX 0.11+

Quick Start

Simple API

from whisper_mlx import LightningWhisperMLX

# Initialize with batched decoding
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

# Transcribe
result = whisper.transcribe("audio.mp3")
print(result["text"])

# With options
result = whisper.transcribe(
    "audio.mp3",
    language="en",
    word_timestamps=True,
)
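
When word_timestamps=True, each entry in result["segments"] also carries per-word timing. A minimal sketch of reading those timings, assuming the word entries use the "word"/"start"/"end" keys of the upstream Whisper word-timestamp format:

from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)
result = whisper.transcribe("audio.mp3", word_timestamps=True)

# Print each word with its start/end time (keys assumed to follow the
# upstream Whisper word-timestamp format).
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f"{word['start']:6.2f} -> {word['end']:6.2f}  {word['word']}")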

Full API

from whisper_mlx import transcribe

result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    batch_size=6,
    language="en",
    word_timestamps=True,
)

print(result["text"])
for segment in result["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")

CLI

# Basic transcription
vayu audio.mp3

# With batched decoding (faster)
vayu audio.mp3 --batch-size 12

# Specify model and output format
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3 --output-format srt

# Multiple files
vayu audio1.mp3 audio2.mp3 --output-dir ./transcripts

# With word timestamps
vayu audio.mp3 --word-timestamps True

# Translate to English
vayu audio.mp3 --task translate
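
The CLI writes one transcript file per input (see --output-format and --output-dir above). A small sketch of post-processing a JSON transcript in Python, assuming the writer names the file after the input audio and stores the same text/segments structure returned by the Python API:

import json

# Hypothetical path, assuming output from:
#   vayu audio.mp3 --output-format json --output-dir ./transcripts
with open("transcripts/audio.json") as f:
    transcript = json.load(f)

print(transcript["text"])
for segment in transcript["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")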

Available Models

| Model           | HuggingFace Repo                       | Size | Speed   |
|-----------------|----------------------------------------|------|---------|
| tiny            | mlx-community/whisper-tiny-mlx         | 39M  | Fastest |
| base            | mlx-community/whisper-base-mlx         | 74M  | Fast    |
| small           | mlx-community/whisper-small-mlx        | 244M | Medium  |
| medium          | mlx-community/whisper-medium-mlx       | 769M | Slow    |
| large-v3        | mlx-community/whisper-large-v3-mlx     | 1.5B | Slowest |
| turbo           | mlx-community/whisper-turbo            | 809M | Fast    |
| distil-large-v3 | mlx-community/distil-whisper-large-v3  | 756M | Fast    |
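
The short names in the table work with the LightningWhisperMLX wrapper, while transcribe() takes the full HuggingFace repo via path_or_hf_repo. A sketch of both styles for the same model, assuming the wrapper maps the short names above to their mlx-community repos:

from whisper_mlx import LightningWhisperMLX, transcribe

# Wrapper: short model name from the table above.
whisper = LightningWhisperMLX(model="large-v3", batch_size=4)
result = whisper.transcribe("audio.mp3")

# Full API: explicit HuggingFace repo.
result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
    batch_size=4,
)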

Quantized Models

For reduced memory usage, use quantized models:

whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit")

Batch Size Recommendations

| Model           | Recommended batch_size | Memory Usage |
|-----------------|------------------------|--------------|
| tiny/base       | 24-32                  | Low          |
| small           | 16-24                  | Medium       |
| medium          | 8-12                   | High         |
| large/turbo     | 4-8                    | High         |
| distil-large-v3 | 12-16                  | Medium       |

Higher batch sizes improve throughput but require more memory. Start with the recommended values and adjust based on your hardware.
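
If you are unsure where to land, a quick empirical sweep is usually enough. A minimal timing sketch (not part of the library) that transcribes the same file at several batch sizes and reports the elapsed time:

import time

from whisper_mlx import LightningWhisperMLX

for batch_size in (8, 12, 16, 24):
    whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=batch_size)
    start = time.perf_counter()
    whisper.transcribe("audio.mp3")
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.1f}s")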

API Reference

transcribe()

def transcribe(
    audio: Union[str, np.ndarray, mx.array],
    *,
    path_or_hf_repo: str = "mlx-community/whisper-turbo",
    batch_size: int = 1,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    **decode_options,
) -> dict
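
The temperature tuple is a fallback schedule: decoding is retried at the next value when the compression_ratio_threshold or logprob_threshold checks fail. A sketch that pins greedy decoding and biases the output with an initial prompt, using only parameters from the signature above:

from whisper_mlx import transcribe

result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    batch_size=6,
    temperature=0.0,                      # single value: no fallback sweep
    initial_prompt="Vayu, Whisper, MLX",  # bias decoding toward domain terms
    condition_on_previous_text=False,     # avoid carrying repetition between windows
)
print(result["text"])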

LightningWhisperMLX

class LightningWhisperMLX:
    def __init__(
        self,
        model: str = "distil-large-v3",
        batch_size: int = 12,
        quant: Optional[str] = None,
    )

    def transcribe(
        self,
        audio_path: str,
        language: Optional[str] = None,
        task: str = "transcribe",
        verbose: bool = False,
        word_timestamps: bool = False,
        **kwargs,
    ) -> dict
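
Putting the two signatures together, a short sketch that translates non-English audio to English with a 4-bit quantized model (assuming a 4-bit build of the chosen model is available):

from whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="large-v3", batch_size=4, quant="4bit")
result = whisper.transcribe(
    "persian_audio.mp3",   # hypothetical input file
    language="fa",         # source-language hint; omit to auto-detect
    task="translate",      # produce English output instead of a transcript
    verbose=True,
)
print(result["text"])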

License

MIT License - see LICENSE file for details.

Author

Behnam Ebrahimi - Unified implementation, security improvements, and maintenance

Credits

This project would not be possible without:

| Project               | Author(s)                           | Contribution                                      |
|-----------------------|-------------------------------------|---------------------------------------------------|
| mlx-examples/whisper  | Apple Inc.                          | MLX framework, Whisper port, CLI, output writers  |
| lightning-whisper-mlx | Mustafa Aljadery, Siddharth Sharma  | Batched decoding for 3-5x speedup                 |
| Whisper               | OpenAI                              | Original model architecture and weights           |

Thank you to all contributors who make open source AI accessible and fast on Apple Silicon.
