Add vision genai inference path for multi-file VLM evaluation by jiafatom · Pull Request #2488 · microsoft/Olive

jiafatom · 2026-06-01T17:44:14Z

Describe your changes

Adds _inference_vision_genai method to OnnxEvaluator that enables olive run to evaluate multi-file ONNX vision-language models (e.g., Qwen3-VL) using onnxruntime-genai.

Problem

Vision-language models exported via onnxruntime-genai produce multiple ONNX files (vision.onnx, text.onnx, embedding.onnx) with a genai_config.json. The existing _inference_vision method only supports single-file ONNX models with classification-style forward pass. This prevented using olive run --config for evaluation of autoregressive VLMs.

Solution

Auto-detect genai vision models by checking if genai_config.json contains a vision field
Route to new _inference_vision_genai method which uses og.Model, multimodal_processor, and og.Generator for autoregressive text generation
Follows the same pattern as the existing speech genai inference (_inference_text_genai for Whisper, _inference_text_genai_streaming for Nemotron)
Falls back to existing _inference_vision for single-file ONNX VQA models

Usage

{
    "input_model": {
        "type": "OnnxModel",
        "model_path": "path/to/models",
        "onnx_file_name": "text.onnx"
    }
}

The evaluator will auto-detect genai_config.json in the model directory and use the genai path.

Pull request overview

Adds a genai-based vision inference path to OnnxEvaluator so that olive run can evaluate multi-file ONNX vision-language models (e.g., Qwen3-VL) that ship with a genai_config.json. The dispatcher in _evaluate_onnx_accuracy now auto-detects whether the model is a genai VLM (by inspecting genai_config.json for a vision field) and routes to a new _inference_vision_genai method that drives generation through onnxruntime_genai's multimodal processor, generator, and tokenizer.

Changes:

Extend the vision-metric branch of _evaluate_onnx_accuracy to auto-detect genai VLMs via genai_config.json and route accordingly.
Implement _inference_vision_genai, which builds an og.Model, formats chat-style multimodal prompts, runs autoregressive generation per sample, and returns decoded predictions plus targets.
Preserve existing behavior for single-file VQA ONNX models by falling back to _inference_vision.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Adds _inference_vision_genai method to OnnxEvaluator that uses onnxruntime-genai for vision-language models (e.g., Qwen3-VL) with multi-file ONNX architectures (vision.onnx, text.onnx, embedding.onnx). The method is auto-detected when genai_config.json exists and contains a 'vision' field in the model config. This mirrors the existing auto-detection pattern used for speech models (whisper, nemotron_speech). For single-file ONNX VQA models, the existing _inference_vision path (classification-style single forward pass) is still used. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previously, when a component (e.g., pre_process_data) specified a task type, only that same component's override was applied from the task map. This meant the vision-vqa dataloader override (vision_vqa_dataloader with custom collate_fn for PIL images) was never applied since it was a different component than the one specifying the task. Now, when any component specifies a task type, ALL component overrides from the task_type_components_map are applied. This ensures the custom dataloader with PIL-safe collation is used for vision tasks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Simplify dispatch logic: use single boolean flag instead of duplicated fallback branches - Honor execution_providers parameter: map user-specified EPs to og.Config providers instead of only checking device - Use TemporaryDirectory instead of per-file NamedTemporaryFile to avoid I/O overhead and file leak risk - Add comment clarifying pred/target alignment when image is None Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

onnxruntime-genai uses CPU by default when no provider is appended. CPUExecutionProvider is not a recognized genai provider name, so skip it rather than trying to map it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

onnxruntime-genai uses short provider names (e.g., 'cuda') not ORT-style names ('CUDAExecutionProvider'). Match the pattern used by the existing speech genai methods: only check device field for provider selection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The genai_config.json may specify max_length equal to the full context window (e.g., 262144) which causes near-infinite generation for VQA tasks where answers are typically 1-10 tokens. Cap at 128 tokens. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

max_length in genai is total sequence length (input + output). Vision inputs include image tokens which can be 200+ tokens, so 128 was too small. Use 4096 which accommodates input tokens plus short VQA answers while still preventing runaway generation from 262K context windows. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove unused params (metric, execution_providers) from _inference_vision_genai signature - Remove unused genai_config variable (was loaded but not used) - Document that device drives GPU/CPU selection in genai - Rename local var to genai_cfg to avoid shadowing - Run ruff format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Allow passing a system_prompt parameter in pre_process config to guide model responses (e.g., 'reply with only the option number'). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add options_col param to format multiple-choice options into the question - Extract leading number from model responses (e.g. '1. D' -> '1') - Add debug logging to vision_eval_debug.jsonl in model dir Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Extract _load_genai_config helper to deduplicate config detection - Remove hardcoded number extraction (task-specific, not generic) - Remove debug logging (was dev instrumentation) - Use 'from e' instead of 'from None' in ImportError - Add missing docstring params for options_col and system_prompt Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When options_col is specified in pre_process config, set extract_number=True in the input dict. The evaluator uses this flag to extract the leading number from model responses (e.g. '1. D' -> '1'), which is needed for correct exact_match scoring on multiple-choice benchmarks like AI2D. This is not applied for OCR/ChartQA tasks where numeric predictions are valid. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… tests - Use 'vision' in dict check instead of bool() to handle empty vision objects - Add TestOnnxEvaluatorGenaiVisionDetection test class with 8 tests covering: - _load_genai_config helper (present/missing) - Vision detection logic (with vision, empty vision, no vision, no config) - Dispatch routing (genai vs standard vision path)

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

+    def _load_genai_config(model: ONNXModelHandler) -> Optional[dict]:
+        """Load genai_config.json from the model directory, or return None if not found."""
+        genai_config_path = Path(model.model_path).parent / "genai_config.json"
+        if not genai_config_path.exists():
+            return None
+        import json
+
+        with genai_config_path.open() as f:
+            return json.load(f)


+        import json
+        import re
+        import tempfile
+
+        from PIL import Image
+
+        model_dir = str(Path(model.model_path).parent)


+                    # Ensure PIL Image
+                    if not isinstance(pil_image, Image.Image):
+                        pil_image = Image.open(pil_image).convert("RGB")
+


…andle leak - Wrap genai_config.json parsing in try/except JSONDecodeError with filepath in message - Guard PIL import with ImportError and helpful install message - Use context manager for Image.open() to close file handle promptly

Copilot AI review requested due to automatic review settings June 1, 2026 17:44

Copilot started reviewing on behalf of jiafatom June 1, 2026 17:44 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py

Comment thread olive/evaluator/olive_evaluator.py Outdated

Comment thread olive/evaluator/olive_evaluator.py Outdated

Comment thread olive/evaluator/olive_evaluator.py

github-advanced-security AI found potential problems Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py Fixed

Comment thread olive/evaluator/olive_evaluator.py Fixed

Comment thread olive/evaluator/olive_evaluator.py Fixed

Comment thread olive/evaluator/olive_evaluator.py Fixed

jiafatom requested a review from Copilot June 1, 2026 18:30

Copilot started reviewing on behalf of jiafatom June 1, 2026 18:30 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py

Comment thread olive/evaluator/olive_evaluator.py

jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from 01d07cb to c032caf Compare June 1, 2026 19:48

github-advanced-security AI found potential problems Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py Fixed

github-advanced-security AI found potential problems Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py Fixed

Comment thread olive/evaluator/olive_evaluator.py Fixed

jiafatom requested a review from Copilot June 1, 2026 20:50

Copilot started reviewing on behalf of jiafatom June 1, 2026 20:50 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py Outdated

Comment thread olive/evaluator/olive_evaluator.py Outdated

Comment thread olive/evaluator/olive_evaluator.py

Comment thread olive/evaluator/olive_evaluator.py Outdated

jiafatom requested a review from Copilot June 2, 2026 17:49

Copilot started reviewing on behalf of jiafatom June 2, 2026 17:49 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Comment thread olive/evaluator/olive_evaluator.py

Comment thread olive/evaluator/olive_evaluator.py

jiafatom and others added 14 commits June 2, 2026 18:04

Fix lint: remove unused import, unused loop var, use .values()

398d655

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add system_prompt support for vision VQA evaluation

23fa91b

Allow passing a system_prompt parameter in pre_process config to guide model responses (e.g., 'reply with only the option number'). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-trigger CI: flaky test_mnb_to_qdq failure unrelated to PR changes

f3524f6

jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from 20c01d2 to d9db22c Compare June 2, 2026 18:05

github-advanced-security AI found potential problems Jun 2, 2026

View reviewed changes

jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from d9db22c to fa7a239 Compare June 2, 2026 18:23

jiafatom requested a review from Copilot June 2, 2026 18:25

Copilot started reviewing on behalf of jiafatom June 2, 2026 18:25 View session

Copilot AI reviewed Jun 2, 2026

View reviewed changes

jiafatom added 3 commits June 2, 2026 18:34

Fix formatting: use parenthesized context managers

9e6156e

Fix formatting: collapse ImportError raise to single line

d1b55a7

xiaoyu-work approved these changes Jun 2, 2026

View reviewed changes

xiaoyu-work merged commit 48909b6 into main Jun 2, 2026
12 checks passed

xiaoyu-work deleted the jiafa/add-vision-genai-evaluator branch June 2, 2026 20:31

Conversation

jiafatom commented Jun 1, 2026

Describe your changes

Problem

Solution

Usage

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants