ONNX adapter created by "olive convert-adapters" command cannot work with ONNX model created by "olive auto-opt" #2277

**Describe the bug**
I have a Hugging Face model, "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", and a PEFT adapter. I use the `auto-opt` command with both the HF model and the PEFT adapter as inputs to generate the ONNX model, and the `convert-adapters` command with the PEFT adapter as input to generate the ONNX adapter file. However, the ONNX model and this ONNX adapter do not work together. The runtime error is "RuntimeError: Invalid input name: model.layers.12.self_attn.v_proj.lora_A.weight".

**To Reproduce**
generate a PEFT adapter
```py
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create a LoRA config
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
    lora_dropout=0.0,
)

# Create a PEFT model wrapper
peft_model = get_peft_model(model, lora_config)

# Optionally train the model here; training does not affect the repro of the bug

# Save the LoRA adapter
peft_model.save_pretrained("empty_lora")
```
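To see which tensor names the saved adapter actually contains, the keys of the saved file can be listed. This is a minimal sketch, assuming PEFT wrote `empty_lora/adapter_model.safetensors` (its default when saving with safetensors) and that the `safetensors` package is installed. The prefix stripping is a hypothetical helper based only on the observation that the name in the runtime error (`model.layers.12.self_attn.v_proj.lora_A.weight`) reads like a PEFT key without a `base_model.model.` prefix; it is not a description of what `convert-adapters` does internally.

```python
import os

# Assumption: prefix that PEFT puts in front of saved LoRA tensor names.
PEFT_PREFIX = "base_model.model."


def strip_peft_prefix(name: str) -> str:
    """Hypothetical helper: drop the assumed PEFT wrapper prefix from a tensor name."""
    return name[len(PEFT_PREFIX):] if name.startswith(PEFT_PREFIX) else name


def list_lora_keys(adapter_dir: str) -> list:
    """Return the sorted tensor names stored in the saved PEFT adapter file."""
    from safetensors import safe_open  # imported lazily; only needed for real files

    path = os.path.join(adapter_dir, "adapter_model.safetensors")
    with safe_open(path, framework="np") as f:
        return sorted(f.keys())


if __name__ == "__main__":
    adapter_file = os.path.join("empty_lora", "adapter_model.safetensors")
    if os.path.exists(adapter_file):
        for key in list_lora_keys("empty_lora"):
            print(key, "->", strip_peft_prefix(key))
```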

generate the ONNX model

Note that both the HF model name and the PEFT adapter are inputs. Internally, `auto-opt` uses ModelBuilder and the ExtractAdapter pass, so it generates both an ONNX model with adapter slots and an ONNX adapter file. Only the ONNX model is used for this repro.

```
olive auto-opt ^
    --model_name_or_path "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" ^
    --adapter_path empty_lora ^
    --device cpu ^
    --provider CPUExecutionProvider ^
    --use_model_builder ^
    --output_path basemodel-with-slots ^
    --log_level 0
```

generate the ONNX adapter file

```
olive convert-adapters ^
    --adapter_path empty_lora ^
    --output_path convert_adapter_result ^
    --log_level 0
```

inference

The inference code is mostly taken from the [olive example](https://github.com/microsoft/Olive/blob/rel-0.9.2/examples/getting_started/olive-deepseek-finetune.ipynb) and is pasted below:

```python
import onnxruntime_genai as og
import time

model_folder = "basemodel-with-slots"  # generated by olive auto-opt
# adapter_path = "basemodel-with-slots/adapter_weights.onnx_adapter"  # generated by olive auto-opt; inference works
adapter_path = "convert_adapter_result.onnx_adapter"  # generated by olive convert-adapters; inference fails

# Load the base model and tokenizer
model = og.Model(model_folder)
print(dir(model))
adapters = og.Adapters(model) #Adapter code
adapters.load(adapter_path, "en_medical_reasoning") #Adapter code
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

prompt_template = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
"""

question = """
        A 54-year-old construction worker with a long history of smoking presents with swelling in his upper extremity and face, along with 
        dilated veins in this region. After conducting a CT scan and venogram of the neck, what is the most likely diagnosis for the cause of these symptoms?
"""
prompt = prompt_template.format(question)

# Encode the prompt using the tokenizer
input_tokens = tokenizer.encode(prompt)

# Create params and generator
params = og.GeneratorParams(model)
generator = og.Generator(model, params)
generator.set_active_adapter(adapters, "en_medical_reasoning") #Adapter code

# Append input tokens to the generator
generator.append_tokens(input_tokens)

print("")
print("Output: ", end="", flush=True)

token_times = []

# Stream the output
while True:
    start_time = time.time()
    if generator.is_done():
        break
    generator.generate_next_token()
    end_time = time.time()
    
    # Record the time for this token generation
    token_time = end_time - start_time
    token_times.append(token_time)

    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)

print()

# Calculate and display timing statistics
if token_times:
    total_tokens = len(token_times)
    avg_time = sum(token_times) / total_tokens
    
    print(f"Total tokens generated: {total_tokens}")
    print(f"Average time per token: {avg_time:.4f} seconds")
    print(f"Tokens per second: {total_tokens / sum(token_times):.2f}")

del generator
```
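To check where the two adapter files diverge without the full streaming loop, each one can be tried against the same model in isolation. This is a minimal sketch that reuses only the `onnxruntime_genai` calls from the script above; `try_adapter`, `invalid_input_name`, and the adapter name `"test_adapter"` are hypothetical, and it assumes both adapter files sit at the paths used in this report. Any `RuntimeError` raised between loading the adapter and generating the first token is caught and reported.

```python
import os
import re


def invalid_input_name(message: str):
    """Extract the offending name from an 'Invalid input name: ...' error, if present."""
    match = re.search(r"Invalid input name:\s*(\S+)", message)
    return match.group(1) if match else None


def try_adapter(model_folder: str, adapter_path: str, prompt: str = "Hello"):
    """Load one adapter, generate a single token, and return (ok, error_text)."""
    import onnxruntime_genai as og  # imported lazily so the helper above works without it

    model = og.Model(model_folder)
    tokenizer = og.Tokenizer(model)
    adapters = og.Adapters(model)
    try:
        adapters.load(adapter_path, "test_adapter")
        generator = og.Generator(model, og.GeneratorParams(model))
        generator.set_active_adapter(adapters, "test_adapter")
        generator.append_tokens(tokenizer.encode(prompt))
        generator.generate_next_token()
        return True, ""
    except RuntimeError as err:  # the report shows a RuntimeError for the failing adapter
        return False, str(err)


if __name__ == "__main__":
    model_folder = "basemodel-with-slots"
    candidates = [
        os.path.join(model_folder, "adapter_weights.onnx_adapter"),  # from auto-opt
        "convert_adapter_result.onnx_adapter",  # from convert-adapters
    ]
    if os.path.isdir(model_folder):
        for path in candidates:
            ok, message = try_adapter(model_folder, path)
            print(path, "OK" if ok else f"FAILED ({invalid_input_name(message) or message})")
```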

**Actual behavior**

- The ONNX adapter file generated by `convert-adapters` does not work with the ONNX model generated by `auto-opt`. From inspecting the ONNX model, I believe the root cause is that the adapter input names in the model do not match the parameter names in the adapter file.
- The ONNX adapter file and ONNX model that are both generated by `auto-opt` do work together, but that is not the scenario this issue is about.
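The suspected name mismatch can be checked directly by diffing the LoRA input names of the model against the parameter names in each adapter file. This is a minimal sketch under several assumptions: the model file is `basemodel-with-slots/model.onnx`, LoRA slot inputs contain `lora` in their names, and the installed onnxruntime exposes `AdapterFormat.read_adapter` and `get_parameters` (present in recent releases, as far as I know). The helper names are hypothetical.

```python
import os


def model_lora_inputs(model_path: str) -> set:
    """Names of graph inputs that look like LoRA slots (heuristic: contain 'lora')."""
    import onnx  # imported lazily; only needed when reading a real model

    model = onnx.load(model_path, load_external_data=False)  # skip the large weight file
    return {inp.name for inp in model.graph.input if "lora" in inp.name.lower()}


def adapter_param_names(adapter_path: str) -> set:
    """Parameter names stored in an .onnx_adapter file."""
    import onnxruntime as ort

    adapter = ort.AdapterFormat.read_adapter(adapter_path)
    return set(adapter.get_parameters().keys())


def diff_names(model_names, adapter_names) -> dict:
    """Names present on only one side; adapter-only names have no matching model input."""
    model_names, adapter_names = set(model_names), set(adapter_names)
    return {
        "only_in_model": sorted(model_names - adapter_names),
        "only_in_adapter": sorted(adapter_names - model_names),
    }


if __name__ == "__main__":
    model_file = os.path.join("basemodel-with-slots", "model.onnx")  # assumed file name
    adapter_files = [
        os.path.join("basemodel-with-slots", "adapter_weights.onnx_adapter"),
        "convert_adapter_result.onnx_adapter",
    ]
    if os.path.exists(model_file):
        slots = model_lora_inputs(model_file)
        for path in adapter_files:
            if os.path.exists(path):
                result = diff_names(slots, adapter_param_names(path))
                print(path)
                print("  only in model:  ", len(result["only_in_model"]), result["only_in_model"][:5])
                print("  only in adapter:", len(result["only_in_adapter"]), result["only_in_adapter"][:5])
```

Given the runtime error, `model.layers.12.self_attn.v_proj.lora_A.weight` would be expected under "only in adapter" for the `convert-adapters` file, while the `auto-opt` adapter file should show empty diffs if the names line up.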

**Expected behavior**
The ONNX adapter file generated by `convert-adapters` should work with the ONNX model generated by `auto-opt`.
With this fixed, I would only need to create the ONNX model once with the `auto-opt` command. After each new fine-tuning run, I would only need to convert the PEFT adapter to an ONNX adapter, without regenerating the ONNX model.
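The intended workflow (build the base model once, then convert each new adapter) could be scripted roughly as follows. This is a sketch that reuses only the `olive convert-adapters` flags shown in the repro steps; the adapter folder names, the `converted_adapters` output folder, and the `.onnx_adapter` suffix on the output (inferred from `convert_adapter_result.onnx_adapter` above) are assumptions.

```python
import os
import shutil
import subprocess


def convert_cmd(adapter_dir: str, output_path: str) -> list:
    """Build the same `olive convert-adapters` invocation used in the repro steps."""
    return [
        "olive", "convert-adapters",
        "--adapter_path", adapter_dir,
        "--output_path", output_path,
        "--log_level", "0",
    ]


def convert_all(adapter_dirs, out_dir="converted_adapters") -> list:
    """Convert each fine-tuned PEFT adapter; the auto-opt model is built only once elsewhere."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for adapter_dir in adapter_dirs:
        out = os.path.join(out_dir, os.path.basename(os.path.normpath(adapter_dir)))
        cmd = convert_cmd(adapter_dir, out)
        cmd[0] = shutil.which("olive") or "olive"  # resolve the executable path (helps on Windows)
        subprocess.run(cmd, check=True)
        outputs.append(out + ".onnx_adapter")  # assumed output suffix, matching the repro
    return outputs


if __name__ == "__main__":
    runs = ["lora_run_1", "lora_run_2"]  # hypothetical adapter folders from separate fine-tunes
    if shutil.which("olive") and all(os.path.isdir(d) for d in runs):
        print(convert_all(runs))
```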

**Other information**
 - OS: Windows
 - Olive version: 0.10.1
 - ONNXRuntime package and version: onnxruntime 1.23.2, onnxruntime_genai 0.10.0
 - Transformers package version: [e.g. transformers 4.57.1]


