Description
Describe the feature request
Add support for LoRA adapters in onnxruntime-web for inference, i.e. the ability to load and apply .onnx_adapter files.
See: https://onnxruntime.ai/docs/genai/tutorials/finetune.html
Is this planned?
Describe scenario use case
1. Dynamic Model Switching in the Browser
Allow real-time switching between different fine-tuned behaviors (sentiment analysis, instruction tuning, role personas, or domain specializations) by applying different LoRA adapters to a single base ONNX model. This avoids reloading or duplicating the entire model graph.
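To make the switching idea concrete, here is a minimal sketch in plain TypeScript of what a LoRA adapter does mathematically: one shared base weight W stays loaded, each task contributes only two small low-rank factors (B, A), and the effective transform is y = (W + B·A)·x. All matrices, sizes, and task names below are made up for illustration; this is not the onnxruntime-web API (which does not yet expose adapters, hence this request).

```typescript
type Matrix = number[][];

// Naive dense matrix multiply: (m×n) · (n×p) → (m×p).
function matMul(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// Element-wise matrix addition.
function matAdd(a: Matrix, b: Matrix): Matrix {
  return a.map((row, i) => row.map((v, j) => v + b[i][j]));
}

// Shared 2×2 base weight: loaded once, never duplicated per task.
const W: Matrix = [[1, 0], [0, 1]];

// Two tiny rank-1 "adapters" (hypothetical tasks): B is 2×1, A is 1×2.
const adapters = {
  sentiment: { B: [[1], [0]], A: [[0, 1]] },
  summarize: { B: [[0], [1]], A: [[1, 0]] },
};

// Switching tasks only swaps the small factors; W + B·A is
// recombined at switch time instead of loading a second full model.
function applyAdapter(name: keyof typeof adapters): Matrix {
  const { B, A } = adapters[name];
  return matAdd(W, matMul(B, A));
}

const x: Matrix = [[2], [3]]; // column input
console.log(matMul(applyAdapter("sentiment"), x)); // [[5],[3]]
console.log(matMul(applyAdapter("summarize"), x)); // [[2],[5]]
```

The same base W produces different outputs per task, which is exactly the "dynamic model switching" behavior requested above, just at toy scale.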
2. Reduced Memory Footprint for Multi-Model Applications
Loading multiple fine-tuned ONNX models currently requires keeping several large model files in memory.
With LoRA adapter support, only one base model needs to stay loaded, while small LoRA adapters are applied at runtime. This significantly reduces RAM usage in:
- Browser environments
- Mobile or low-memory devices
- Multi-model applications
- Tooling that frequently swaps between fine-tuned variants
This solves the problem of needing multiple full ONNX models to run different fine-tuned behaviors.
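The size argument can be sketched with a quick back-of-the-envelope calculation: a LoRA adapter replaces an update to a d×k weight matrix with two factors B (d×r) and A (r×k), storing r·(d+k) values instead of d·k. The dimensions below (d = k = 4096, rank r = 16) are illustrative assumptions, not numbers from this issue.

```typescript
// Illustrative LoRA parameter-count comparison per weight matrix.
const d = 4096, k = 4096, r = 16;         // assumed hidden sizes and LoRA rank

const fullParams = d * k;                 // full fine-tuned matrix: d·k values
const adapterParams = r * (d + k);        // LoRA factors: r·(d+k) values
const ratio = fullParams / adapterParams; // how much smaller the adapter is

console.log({ fullParams, adapterParams, ratio });
```

At these sizes the adapter holds 131,072 values against 16,777,216 for the full matrix, a 128× reduction per layer, which is why shipping adapters instead of whole models cuts both RAM and download size.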
3. Faster Loading and Improved User Experience
LoRA adapters are small, enabling near-instant switching between fine-tuned behaviors without downloading or initializing an entire model again.
This improves UX in:
- Web-based LLM applications
- Interactive tools
- Apps where tasks change frequently (e.g., summarize → classify → generate)
This results in faster startup, reduced bandwidth use, and smoother runtime transitions.
4. Lower Bandwidth for Edge Devices
Users on limited networks or mobile connections benefit greatly from downloading small LoRA adapters instead of full multi-GB ONNX models.
Supporting LoRA adapters in onnxruntime-web:
- Cuts bandwidth consumption
- Reduces CDN/storage costs
- Enables on-device personalization without heavy downloads
- Makes large LLMs feasible in constrained environments
This is especially valuable for browser-based AI on edge hardware or mobile web apps.