Skip to content

Add vision genai inference path for multi-file VLM evaluation#2488

Merged
xiaoyu-work merged 18 commits into
mainfrom
jiafa/add-vision-genai-evaluator
Jun 2, 2026
Merged

Add vision genai inference path for multi-file VLM evaluation#2488
xiaoyu-work merged 18 commits into
mainfrom
jiafa/add-vision-genai-evaluator

Conversation

@jiafatom
Copy link
Copy Markdown
Contributor

@jiafatom jiafatom commented Jun 1, 2026

Describe your changes

Adds _inference_vision_genai method to OnnxEvaluator that enables olive run to evaluate multi-file ONNX vision-language models (e.g., Qwen3-VL) using onnxruntime-genai.

Problem

Vision-language models exported via onnxruntime-genai produce multiple ONNX files (vision.onnx, text.onnx, embedding.onnx) with a genai_config.json. The existing _inference_vision method only supports single-file ONNX models with classification-style forward pass. This prevented using olive run --config for evaluation of autoregressive VLMs.

Solution

  • Auto-detect genai vision models by checking if genai_config.json contains a vision field
  • Route to new _inference_vision_genai method which uses og.Model, multimodal_processor, and og.Generator for autoregressive text generation
  • Follows the same pattern as the existing speech genai inference (_inference_text_genai for Whisper, _inference_text_genai_streaming for Nemotron)
  • Falls back to existing _inference_vision for single-file ONNX VQA models

Usage

{
    "input_model": {
        "type": "OnnxModel",
        "model_path": "path/to/models",
        "onnx_file_name": "text.onnx"
    }
}

The evaluator will auto-detect genai_config.json in the model directory and use the genai path.

Related

Copilot AI review requested due to automatic review settings June 1, 2026 17:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a genai-based vision inference path to OnnxEvaluator so that olive run can evaluate multi-file ONNX vision-language models (e.g., Qwen3-VL) that ship with a genai_config.json. The dispatcher in _evaluate_onnx_accuracy now auto-detects whether the model is a genai VLM (by inspecting genai_config.json for a vision field) and routes to a new _inference_vision_genai method that drives generation through onnxruntime_genai's multimodal processor, generator, and tokenizer.

Changes:

  • Extend the vision-metric branch of _evaluate_onnx_accuracy to auto-detect genai VLMs via genai_config.json and route accordingly.
  • Implement _inference_vision_genai, which builds an og.Model, formats chat-style multimodal prompts, runs autoregressive generation per sample, and returns decoded predictions plus targets.
  • Preserve existing behavior for single-file VQA ONNX models by falling back to _inference_vision.

Comment thread olive/evaluator/olive_evaluator.py
Comment thread olive/evaluator/olive_evaluator.py Outdated
Comment thread olive/evaluator/olive_evaluator.py Outdated
Comment thread olive/evaluator/olive_evaluator.py
Comment thread olive/evaluator/olive_evaluator.py Fixed
Comment thread olive/evaluator/olive_evaluator.py Fixed
Comment thread olive/evaluator/olive_evaluator.py Fixed
Comment thread olive/evaluator/olive_evaluator.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread olive/evaluator/olive_evaluator.py
Comment thread olive/evaluator/olive_evaluator.py
@jiafatom jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from 01d07cb to c032caf Compare June 1, 2026 19:48
Comment thread olive/evaluator/olive_evaluator.py Fixed
Comment thread olive/evaluator/olive_evaluator.py Fixed
Comment thread olive/evaluator/olive_evaluator.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread olive/evaluator/olive_evaluator.py Outdated
Comment thread olive/evaluator/olive_evaluator.py Outdated
Comment thread olive/evaluator/olive_evaluator.py
Comment thread olive/evaluator/olive_evaluator.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread olive/evaluator/olive_evaluator.py
Comment thread olive/evaluator/olive_evaluator.py
jiafatom and others added 14 commits June 2, 2026 18:04
Adds _inference_vision_genai method to OnnxEvaluator that uses
onnxruntime-genai for vision-language models (e.g., Qwen3-VL) with
multi-file ONNX architectures (vision.onnx, text.onnx, embedding.onnx).

The method is auto-detected when genai_config.json exists and contains
a 'vision' field in the model config. This mirrors the existing
auto-detection pattern used for speech models (whisper, nemotron_speech).

For single-file ONNX VQA models, the existing _inference_vision path
(classification-style single forward pass) is still used.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously, when a component (e.g., pre_process_data) specified a task
type, only that same component's override was applied from the task map.
This meant the vision-vqa dataloader override (vision_vqa_dataloader with
custom collate_fn for PIL images) was never applied since it was a
different component than the one specifying the task.

Now, when any component specifies a task type, ALL component overrides
from the task_type_components_map are applied. This ensures the custom
dataloader with PIL-safe collation is used for vision tasks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Simplify dispatch logic: use single boolean flag instead of
  duplicated fallback branches
- Honor execution_providers parameter: map user-specified EPs to
  og.Config providers instead of only checking device
- Use TemporaryDirectory instead of per-file NamedTemporaryFile to
  avoid I/O overhead and file leak risk
- Add comment clarifying pred/target alignment when image is None

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
onnxruntime-genai uses CPU by default when no provider is appended.
CPUExecutionProvider is not a recognized genai provider name, so skip
it rather than trying to map it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
onnxruntime-genai uses short provider names (e.g., 'cuda') not ORT-style
names ('CUDAExecutionProvider'). Match the pattern used by the existing
speech genai methods: only check device field for provider selection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The genai_config.json may specify max_length equal to the full context
window (e.g., 262144) which causes near-infinite generation for VQA
tasks where answers are typically 1-10 tokens. Cap at 128 tokens.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max_length in genai is total sequence length (input + output). Vision
inputs include image tokens which can be 200+ tokens, so 128 was too
small. Use 4096 which accommodates input tokens plus short VQA answers
while still preventing runaway generation from 262K context windows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused params (metric, execution_providers) from
  _inference_vision_genai signature
- Remove unused genai_config variable (was loaded but not used)
- Document that device drives GPU/CPU selection in genai
- Rename local var to genai_cfg to avoid shadowing
- Run ruff format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow passing a system_prompt parameter in pre_process config to guide
model responses (e.g., 'reply with only the option number').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add options_col param to format multiple-choice options into the question
- Extract leading number from model responses (e.g. '1. D' -> '1')
- Add debug logging to vision_eval_debug.jsonl in model dir

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract _load_genai_config helper to deduplicate config detection
- Remove hardcoded number extraction (task-specific, not generic)
- Remove debug logging (was dev instrumentation)
- Use 'from e' instead of 'from None' in ImportError
- Add missing docstring params for options_col and system_prompt

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When options_col is specified in pre_process config, set extract_number=True
in the input dict. The evaluator uses this flag to extract the leading number
from model responses (e.g. '1. D' -> '1'), which is needed for correct
exact_match scoring on multiple-choice benchmarks like AI2D.

This is not applied for OCR/ChartQA tasks where numeric predictions are valid.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jiafatom jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from 20c01d2 to d9db22c Compare June 2, 2026 18:05
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
Comment thread test/evaluator/test_olive_evaluator.py Fixed
… tests

- Use 'vision' in dict check instead of bool() to handle empty vision objects
- Add TestOnnxEvaluatorGenaiVisionDetection test class with 8 tests covering:
  - _load_genai_config helper (present/missing)
  - Vision detection logic (with vision, empty vision, no vision, no config)
  - Dispatch routing (genai vs standard vision path)
@jiafatom jiafatom force-pushed the jiafa/add-vision-genai-evaluator branch from d9db22c to fa7a239 Compare June 2, 2026 18:23
@jiafatom jiafatom requested a review from Copilot June 2, 2026 18:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread olive/evaluator/olive_evaluator.py Outdated
Comment on lines +586 to +594
def _load_genai_config(model: ONNXModelHandler) -> Optional[dict]:
"""Load genai_config.json from the model directory, or return None if not found."""
genai_config_path = Path(model.model_path).parent / "genai_config.json"
if not genai_config_path.exists():
return None
import json

with genai_config_path.open() as f:
return json.load(f)
Comment on lines +836 to +842
import json
import re
import tempfile

from PIL import Image

model_dir = str(Path(model.model_path).parent)
Comment on lines +887 to +890
# Ensure PIL Image
if not isinstance(pil_image, Image.Image):
pil_image = Image.open(pil_image).convert("RGB")

jiafatom added 3 commits June 2, 2026 18:34
…andle leak

- Wrap genai_config.json parsing in try/except JSONDecodeError with filepath in message
- Guard PIL import with ImportError and helpful install message
- Use context manager for Image.open() to close file handle promptly
@xiaoyu-work xiaoyu-work merged commit 48909b6 into main Jun 2, 2026
12 checks passed
@xiaoyu-work xiaoyu-work deleted the jiafa/add-vision-genai-evaluator branch June 2, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants