Description
Describe the feature request
Add support for LoRA adapters in onnxruntime-web for inference, i.e. the ability to load and apply .onnx_adapter files.
See: https://onnxruntime.ai/docs/genai/tutorials/finetune.html
Is this planned?
Describe scenario use case
1. Dynamic Model Switching in the Browser
Allow real-time switching between different fine-tuned behaviors (sentiment analysis, instruction tuning, role personas, or domain specializations) by applying different LoRA adapters to a single base ONNX model. This avoids reloading or duplicating the entire model graph.
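To make the switching idea concrete, here is a minimal sketch in plain TypeScript of what a LoRA adapter does mathematically: one shared base weight W stays loaded, each task contributes only two small low-rank factors (B, A), and the effective transform is y = (W + B·A)·x. All matrices, sizes, and task names below are made up for illustration; this is not the onnxruntime-web API (which does not yet expose adapters, hence this request).

```typescript
type Matrix = number[][];

// Naive dense matrix multiply: (m×n) · (n×p) → (m×p).
function matMul(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// Element-wise matrix addition.
function matAdd(a: Matrix, b: Matrix): Matrix {
  return a.map((row, i) => row.map((v, j) => v + b[i][j]));
}

// Shared 2×2 base weight: loaded once, never duplicated per task.
const W: Matrix = [[1, 0], [0, 1]];

// Two tiny rank-1 "adapters" (hypothetical tasks): B is 2×1, A is 1×2.
const adapters = {
  sentiment: { B: [[1], [0]], A: [[0, 1]] },
  summarize: { B: [[0], [1]], A: [[1, 0]] },
};

// Switching tasks only swaps the small factors; W + B·A is
// recombined at switch time instead of loading a second full model.
function applyAdapter(name: keyof typeof adapters): Matrix {
  const { B, A } = adapters[name];
  return matAdd(W, matMul(B, A));
}

const x: Matrix = [[2], [3]]; // column input
console.log(matMul(applyAdapter("sentiment"), x)); // [[5],[3]]
console.log(matMul(applyAdapter("summarize"), x)); // [[2],[5]]
```

The same base W produces different outputs per task, which is exactly the "dynamic model switching" behavior requested above, just at toy scale.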
2. Reduced Memory Footprint for Multi-Model Applications
Loading multiple fine-tuned ONNX models currently requires keeping several large model files in memory.
With LoRA adapter support, only one base model needs to stay loaded, while small LoRA adapters are applied at runtime. This significantly reduces RAM usage in:
- Browser environments
- Mobile or low-memory devices
- Multi-model applications
- Tooling that frequently swaps between fine-tuned variants
This solves the problem of needing multiple full ONNX models to run different fine-tuned behaviors.
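The size argument can be sketched with a quick back-of-the-envelope calculation: a LoRA adapter replaces an update to a d×k weight matrix with two factors B (d×r) and A (r×k), storing r·(d+k) values instead of d·k. The dimensions below (d = k = 4096, rank r = 16) are illustrative assumptions, not numbers from this issue.

```typescript
// Illustrative LoRA parameter-count comparison per weight matrix.
const d = 4096, k = 4096, r = 16;         // assumed hidden sizes and LoRA rank

const fullParams = d * k;                 // full fine-tuned matrix: d·k values
const adapterParams = r * (d + k);        // LoRA factors: r·(d+k) values
const ratio = fullParams / adapterParams; // how much smaller the adapter is

console.log({ fullParams, adapterParams, ratio });
```

At these sizes the adapter holds 131,072 values against 16,777,216 for the full matrix, a 128× reduction per layer, which is why shipping adapters instead of whole models cuts both RAM and download size.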
3. Faster Loading and Improved User Experience
LoRA adapters are small, enabling near-instant switching between fine-tuned behaviors without downloading or initializing an entire model again.
This improves UX in:
- Web-based LLM applications
- Interactive tools
- Apps where tasks change frequently (e.g., summarize → classify → generate)
This results in faster startup, reduced bandwidth use, and smoother runtime transitions.
4. Lower Bandwidth for Edge Devices
Users on limited networks or mobile connections benefit greatly from downloading small LoRA adapters instead of full multi-GB ONNX models.
Supporting LoRA adapters in onnxruntime-web:
- Cuts bandwidth consumption
- Reduces CDN/storage costs
- Enables on-device personalization without heavy downloads
- Makes large LLMs feasible in constrained environments
This is especially valuable for browser-based AI on edge hardware or mobile web apps.