[Feature Request] LoRA adapter support for onnxruntime-web #26726

@fosple

Description

Describe the feature request

Add support for LoRA adapters in onnxruntime-web inference, i.e. the ability to load and apply .onnx_adapter files at runtime.

See: https://onnxruntime.ai/docs/genai/tutorials/finetune.html
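
For background on why .onnx_adapter files are so small: LoRA replaces a full weight update ΔW (d × k values) with two low-rank factors B (d × r) and A (r × k), so only r · (d + k) values are stored per adapted layer. A quick sketch of the arithmetic, where the 4096 × 4096 projection shape and rank 16 are illustrative assumptions, not values from the linked tutorial:

```typescript
// LoRA parameter-count arithmetic: W' = W + B·A, where
// B is d×r and A is r×k. The adapter stores only B and A.
// Hypothetical layer shape and rank (illustrative values):
const d = 4096; // output dimension of the adapted projection
const k = 4096; // input dimension
const r = 16;   // LoRA rank

const fullUpdateParams = d * k;    // what a full fine-tune would touch
const adapterParams = r * (d + k); // what the adapter file stores

console.log(fullUpdateParams);                 // 16777216
console.log(adapterParams);                    // 131072
console.log(fullUpdateParams / adapterParams); // 128
```

At rank 16, the adapter for this layer is 128× smaller than a full weight update, which is what makes shipping many per-task adapters alongside one base model practical.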

Is this planned?

Describe scenario use case

1. Dynamic Model Switching in the Browser

Allow real-time switching between different fine-tuned behaviors (sentiment analysis, instruction tuning, role personas, or domain specializations) by applying different LoRA adapters to a single base ONNX model. This avoids reloading or duplicating the entire model graph.
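
The mechanics of such a switch can be sketched in plain TypeScript (no onnxruntime-web API is implied; the matrix sizes and adapters below are illustrative): applying an adapter only adds a low-rank delta B·A to a shared base weight, so swapping behaviors never rebuilds or reloads the base model.

```typescript
type Matrix = number[][];

// Naive dense matmul, just enough to show the mechanics.
function matmul(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, i) => sum + v * b[i][j], 0))
  );
}

// Apply a LoRA adapter (B: d×r, A: r×k) to a base weight W (d×k):
// W' = W + scale · B·A. The base is never mutated, so switching
// adapters is cheap and reversible.
function applyLora(base: Matrix, B: Matrix, A: Matrix, scale = 1): Matrix {
  const delta = matmul(B, A);
  return base.map((row, i) => row.map((w, j) => w + scale * delta[i][j]));
}

// One shared base weight, two tiny rank-1 "adapters" standing in for
// two fine-tuned behaviors:
const base: Matrix = [[1, 0], [0, 1]];
const sentimentB: Matrix = [[1], [0]];
const sentimentA: Matrix = [[0, 2]];
const personaB: Matrix = [[0], [1]];
const personaA: Matrix = [[3, 0]];

console.log(applyLora(base, sentimentB, sentimentA)); // [[1, 2], [0, 1]]
console.log(applyLora(base, personaB, personaA));     // [[1, 0], [3, 1]]
```

In practice the runtime would fuse the delta into the affected ops rather than materializing full weight copies; the point of the sketch is only that the base weights are shared across all behaviors.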

2. Reduced Memory Footprint for Multi-Model Applications

Loading multiple fine-tuned ONNX models currently requires keeping several large model files in memory.
With LoRA adapter support, only one base model needs to stay loaded, while small LoRA adapters are applied at runtime. This significantly reduces RAM usage in:

  • Browser environments
  • Mobile or low-memory devices
  • Multi-model applications
  • Tooling that frequently swaps between fine-tuned variants

This removes the need to keep multiple full ONNX models resident just to run different fine-tuned behaviors.
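
The savings compound with the number of variants. Using illustrative sizes (a 2 GB base model and 10 MB adapters are assumptions for the sketch, not measurements), the arithmetic looks like:

```typescript
// Resident memory for n fine-tuned behaviors, illustrative sizes in MB.
const baseModelMB = 2048; // hypothetical base ONNX model
const adapterMB = 10;     // hypothetical .onnx_adapter file
const variants = 5;

// Today: each behavior needs its own full model in memory.
const fullModelsMB = variants * baseModelMB; // 10240 MB

// With adapter support: one base plus n small adapters.
const baseWithAdaptersMB = baseModelMB + variants * adapterMB; // 2098 MB

console.log(fullModelsMB, baseWithAdaptersMB);
```

Under these assumptions, five behaviors fit in roughly the footprint of one model instead of five, and each additional behavior costs megabytes rather than gigabytes.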

3. Faster Loading and Improved User Experience

LoRA adapters are small, enabling near-instant switching between fine-tuned behaviors without downloading or re-initializing an entire model.
This improves UX in:

  • Web-based LLM applications
  • Interactive tools
  • Apps where tasks change frequently (e.g., summarize → classify → generate)

This results in faster startup, reduced bandwidth use, and smoother runtime transitions.

4. Lower Bandwidth for Edge Devices

Users on limited networks or mobile connections benefit greatly from downloading small LoRA adapters instead of full multi-GB ONNX models.
Supporting LoRA adapters in onnxruntime-web:

  • Cuts bandwidth consumption
  • Reduces CDN/storage costs
  • Enables on-device personalization without heavy downloads
  • Makes large LLMs feasible in constrained environments

This is especially valuable for browser-based AI on edge hardware or mobile web apps.

Metadata

Assignees

No one assigned

    Labels

    • feature request: request for unsupported feature or enhancement
    • platform:mobile: issues related to ONNX Runtime mobile; typically submitted using template
    • platform:web: issues related to ONNX Runtime web; typically submitted using template

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests