Ultra-lightweight, model-agnostic quantization for PyTorch
Mono Quant is a simple, reliable model quantization package for PyTorch with minimal dependencies: just torch and numpy, no bloat.

## Features
- **Model-Agnostic** - Works with any PyTorch model: HuggingFace, local, or custom
- **Multiple Modes** - INT8, INT4, and FP16 quantization
- **Flexible Calibration** - Dynamic (no data) or static (with calibration data)
- **Robust Validation** - SQNR metrics, size comparison, and accuracy warnings
- **Dual Interface** - Python API for automation, CLI for CI/CD
- **Build-Phase Only** - Quantize during build, deploy lightweight models
## Installation

```bash
pip install mono-quant
```

### Requirements

- Python 3.11 or higher
- PyTorch 2.0 or higher
- NumPy 1.24 or higher
## Quick Start

```python
from mono_quant import quantize
# Dynamic INT8 quantization (no calibration data needed)
result = quantize(model, bits=8, dynamic=True)
# Save the quantized model
result.save("model_quantized.pt")
# Check metrics
print(f"Compression: {result.info.compression_ratio:.2f}x")
print(f"SQNR: {result.info.sqnr_db:.2f} dB")
```
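For context, SQNR (signal-to-quantization-noise ratio) measures how much of the original signal survives quantization; higher is better. A minimal sketch of how such a metric is typically computed (illustrative only, not necessarily Mono Quant's exact formula):

```python
import torch

def sqnr_db(original: torch.Tensor, dequantized: torch.Tensor) -> float:
    # Ratio of signal power to quantization-noise power, in decibels.
    signal_power = original.pow(2).mean()
    noise_power = (original - dequantized).pow(2).mean()
    return float(10 * torch.log10(signal_power / noise_power))
```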
## CLI Usage

```bash
# Dynamic quantization
monoquant quantize --model model.pt --bits 8 --dynamic
# With custom output path
monoquant quantize --model model.pt --bits 8 --output model_quantized.pt
```

## Quantization Modes

### Dynamic Quantization

```python
result = quantize(model, bits=8, dynamic=True)
```

### Static Quantization

```python
import torch

calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(150)]
result = quantize(
    model,
    bits=8,
    dynamic=False,
    calibration_data=calibration_data
)
```
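In practice, calibration data should come from your real input pipeline rather than random tensors, so the observed value ranges match production inputs. A hedged sketch, assuming a standard DataLoader whose dataset yields (input, label) pairs; `my_dataset` is a placeholder for your own dataset:

```python
import itertools
from torch.utils.data import DataLoader

loader = DataLoader(my_dataset, batch_size=1, shuffle=True)  # my_dataset: placeholder
# Take the first 150 input batches; labels are discarded.
calibration_data = [batch[0] for batch in itertools.islice(loader, 150)]
```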
### INT4 Quantization

```python
result = quantize(
    model,
    bits=4,
    dynamic=False,
    calibration_data=calibration_data,
    group_size=128  # Default
)
```
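For intuition, `group_size` controls how many consecutive weights share one quantization scale: smaller groups track local weight ranges more closely at the cost of storing more scales. A minimal illustration of group-wise symmetric INT4 quantization (illustrative only, not Mono Quant's internals):

```python
import torch

def quantize_int4_grouped(weight: torch.Tensor, group_size: int = 128):
    # Assumes weight.numel() is divisible by group_size.
    groups = weight.reshape(-1, group_size)
    # One scale per group; symmetric INT4 covers [-8, 7].
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale  # dequantize with (q * scale).reshape(weight.shape)
```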
## Documentation

Full documentation is available at https://thataverageguy.github.io/mono-quant.

## Why Mono Quant?

Most quantization tools are tied to specific frameworks (HuggingFace, TFLite) or require heavy dependencies. Mono Quant fills the niche of "just quantize the weights, nothing else."
| Aspect | Approach |
|---|---|
| Model Loading | You load the model, we quantize it |
| Dependencies | Only torch and numpy required |
| Use Case | Build-phase (CI/CD, local development) |
| Scope | Quantization only, no runtime or serving |
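As a concrete example of the "you load the model, we quantize it" split, this is what the workflow might look like with a torchvision model (torchvision is used only for illustration and is not a mono-quant dependency; any PyTorch `nn.Module` should work):

```python
from torchvision.models import resnet18  # illustration only
from mono_quant import quantize

model = resnet18(weights="DEFAULT")             # you load the model...
result = quantize(model, bits=8, dynamic=True)  # ...Mono Quant quantizes it
result.save("resnet18_int8.pt")
```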
## License

MIT License - see LICENSE for details.
## Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.