Add ONNX Runtime GenAI generation comparison in OnnxDiscrepancyCheck #2487
xadupre wants to merge 9 commits into
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Pull request overview
Extends the OnnxDiscrepancyCheck pass with an optional comparison between the reference HuggingFace model's generate() output and ONNX Runtime GenAI's generation output, reporting (and optionally enforcing a threshold on) the length of the longest matching token prefix. Adds unit tests for the new helper and the _compare_generation method.
Changes:
- Adds `_longest_common_token_sequence` helper and a new `_compare_generation` method that runs both transformers and ORT-GenAI generation.
- Adds `genai_model_path`, `generate_prompt`, `generate_max_new_tokens`, and `min_longest_common_tokens` config parameters, with a failure when the common prefix is below the configured minimum.
- Adds `test/passes/onnx/test_discrepancy_check.py` covering the helper and the generation comparison path with mocks.
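The prefix comparison and threshold described above could look roughly like the sketch below. It is not the code from this PR: the function names `longest_common_prefix_length` and `check_minimum_common_tokens` and the choice of `ValueError` are hypothetical, used only to illustrate the idea of counting matching leading token IDs and failing when the count is below a configured minimum.

```python
from typing import Optional, Sequence


def longest_common_prefix_length(expected: Sequence[int], actual: Sequence[int]) -> int:
    """Count how many leading token IDs the two sequences share."""
    length = 0
    # zip stops at the shorter sequence, so a length mismatch is handled for free.
    for expected_id, actual_id in zip(expected, actual):
        if expected_id != actual_id:
            break
        length += 1
    return length


def check_minimum_common_tokens(common: int, minimum: Optional[int]) -> None:
    """Raise when a minimum is configured and the shared prefix is shorter.

    Hypothetical stand-in for the failure tied to `min_longest_common_tokens`.
    """
    if minimum is not None and common < minimum:
        raise ValueError(
            f"Longest common token prefix is {common}, below the required minimum {minimum}."
        )
```

For example, `longest_common_prefix_length([1, 2, 3, 4], [1, 2, 9, 4])` returns `2`, because the sequences diverge at the third token even though the fourth matches again.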
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| olive/passes/onnx/discrepancy_check.py | Adds GenAI-based token-sequence comparison logic, new config params, and failure threshold. |
| test/passes/onnx/test_discrepancy_check.py | Unit tests for the new helper and _compare_generation, using mocks for og and tokenizers. |
@copilot fix test_compare_generation_returns_common_prefix_length, opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/unittest/mock.py:1396: in patched |
Fixed in c0ea083. I removed the extra |
@copilot test/passes/onnx/test_discrepancy_check.py:98: in test_compare_generation_returns_common_prefix_length |
Fixed in 4e82f66. The tests now mock the function-local |
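As an illustration of mocking a function-local import, here is a minimal sketch using `patch.dict(sys.modules, ...)`. The function `count_sequence_tokens` and the helper `run_with_fake_genai` are hypothetical and are not taken from the PR's test file; the point is only that an `import` executed inside a function body is resolved through `sys.modules` at call time, so a fake module can be substituted even when `onnxruntime-genai` is not installed.

```python
import sys
from unittest.mock import MagicMock, patch


def count_sequence_tokens(prompt_ids):
    # Hypothetical function under test: the import runs at call time,
    # so it picks up whatever is registered in sys.modules.
    import onnxruntime_genai as og

    generator = og.Generator(og.Model("model_dir"), og.GeneratorParams(None))
    generator.append_tokens(prompt_ids)
    return len(generator.get_sequence(0))


def run_with_fake_genai(prompt_ids, fake_output):
    # Build a stand-in module whose Generator returns a fixed token sequence.
    fake_og = MagicMock(name="onnxruntime_genai")
    fake_og.Generator.return_value.get_sequence.return_value = fake_output
    # patch.dict restores sys.modules to its previous state on exit.
    with patch.dict(sys.modules, {"onnxruntime_genai": fake_og}):
        count = count_sequence_tokens(prompt_ids)
    # Verify the prompt token IDs were forwarded unchanged.
    fake_og.Generator.return_value.append_tokens.assert_called_once_with(prompt_ids)
    return count
```

Patching `sys.modules` rather than a module attribute is what makes this work for imports that live inside the function, since there is no module-level name to patch.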
@copilot can you switch to the latest API for onnxruntime-genai?
Done in 38aa991. I switched |
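For context, a generation loop in the newer `onnxruntime-genai` style might look like the sketch below. It is an assumption about how such a loop is typically written, not the code from commit 38aa991: the function name `generate_token_ids` is hypothetical, and the module is passed in as the `og` parameter so the sketch can be exercised without the package installed. The calls used (`Model`, `GeneratorParams`, `set_search_options`, `Generator`, `append_tokens`, `is_done`, `generate_next_token`, `get_sequence`) should be checked against the installed version.

```python
def generate_token_ids(og, model_path, prompt_ids, max_new_tokens=32):
    """Generate with an onnxruntime_genai-like module and return only the new tokens.

    `og` is expected to be the imported onnxruntime_genai module (or a fake in tests).
    """
    model = og.Model(model_path)
    params = og.GeneratorParams(model)
    # max_length covers prompt plus generated tokens, so add the prompt length.
    params.set_search_options(max_length=len(prompt_ids) + max_new_tokens)

    generator = og.Generator(model, params)
    # The prompt is fed through append_tokens instead of params.input_ids.
    generator.append_tokens(prompt_ids)

    while not generator.is_done():
        generator.generate_next_token()

    sequence = list(generator.get_sequence(0))
    # Drop the prompt so the result is comparable with newly generated tokens only.
    return sequence[len(prompt_ids):]
```

Whether the prompt tokens are stripped before comparison is a design choice made here for illustration; the PR text does not say which convention `_compare_generation` uses.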
Describe your changes
Extended `OnnxDiscrepancyCheck` to optionally run ONNX Runtime GenAI generation and compare it with the reference transformers generation using the longest common prefix of token IDs.

Also incorporated follow-up fixes from review feedback:
- … `onnxruntime_genai` import via `patch.dict(sys.modules, ...)`.
- … `with` usage in tests to satisfy linting.
- … `generate_max_new_tokens` default to `32` so GenAI generation comparison runs with a practical default.
- … `onnxruntime-genai` API by using `generator.append_tokens(...)` and removing deprecated `params.input_ids`/`compute_logits()` usage.
- … `onnxruntime-genai` is not installed.
- … `append_tokens(...)` is invoked with the expected prompt token IDs.

Checklist before requesting a review
- `lintrunner -a`

(Optional) Issue link