[ONNX FE] Support com.microsoft::DynamicQuantizeLSTM #36207
Open
mvafin wants to merge 3 commits into
…ain) Implements the DynamicQuantizeLSTM contrib operator from the com.microsoft ONNX domain. The translator dequantizes quantized W and R tensors, normalizes their layout to the standard ONNX LSTM gate ordering, and feeds them into LSTMSequence. Validated by building openvino_onnx_frontend, loading a standalone DynamicQuantizeLSTM model extracted from KittenML/kitten-tts-mini-0.8, and comparing OpenVINO outputs against ONNX Runtime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds ONNX Frontend support for com.microsoft::DynamicQuantizeLSTM (opset 1) by lowering it to ov::op::v5::LSTMSequence, including weight dequantization via ov::decomposition::low_precision_dequantize. The PR also refactors shared recurrent/LSTM utilities, adds regression coverage for both runtime-parameter and constant-weight paths, and updates related documentation.
Changes:
- Add `DynamicQuantizeLSTM` translator under `src/frontends/onnx/frontend/src/op/com.microsoft/`, including scale/zero-point alignment and explicit rejection of peephole input `P`.
- Extract shared recurrent helpers (`normalize_tensor_rank`, `LSTMDimensions`, default optional-input builders) into `utils/recurrent.hpp`/`.cpp` and reuse them from the existing `LSTM` translator.
- Add two new ONNX FE tests + prototxt models covering runtime-input alignment vs constant-weight MarkDequantization patterns; update supported-ops doc and internal agent skill notes.
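The scale/zero-point handling in the first item boils down to affine dequantization, `w = (q - zero_point) * scale`. Below is a minimal standalone sketch of that arithmetic, not the PR's code: the PR goes through `ov::decomposition::low_precision_dequantize` on graph nodes, whereas `dequantize_per_tensor` is a hypothetical helper that assumes signed 8-bit weights and a single per-tensor scale and zero point for one direction.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): affine dequantization of one
// direction's quantized weights. Assumes int8 storage and per-tensor
// quantization parameters.
std::vector<float> dequantize_per_tensor(const std::vector<std::int8_t>& q,
                                         float scale,
                                         std::int8_t zero_point) {
    std::vector<float> out;
    out.reserve(q.size());
    for (const std::int8_t v : q) {
        // Subtract in int so that (v - zero_point) cannot overflow int8.
        const int centered = static_cast<int>(v) - static_cast<int>(zero_point);
        out.push_back(static_cast<float>(centered) * scale);
    }
    return out;
}
```

For example, with `scale = 0.5f` and `zero_point = 1`, the quantized value `127` maps to `(127 - 1) * 0.5 = 63.0`. Per-channel parameters would need the scale and zero point broadcast along the gate axis, which is presumably what the "alignment" in the first item refers to.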
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/frontends/onnx/tests/onnx_import_com_microsoft.in.cpp` | Adds two regression tests for DynamicQuantizeLSTM (runtime inputs vs const weights) sharing common inputs/expected outputs. |
| `src/frontends/onnx/tests/models/com.microsoft/dynamic_quantize_lstm.prototxt` | New prototxt model exercising runtime-provided W/R/scales/zps. |
| `src/frontends/onnx/tests/models/com.microsoft/dynamic_quantize_lstm_const_weights.prototxt` | New prototxt model with W/R/scales/zps as initializers (const weights path). |
| `src/frontends/onnx/frontend/src/utils/recurrent.hpp` | Introduces shared recurrent utilities declarations (normalize_tensor_rank, LSTMDimensions, default optional input builders). |
| `src/frontends/onnx/frontend/src/utils/recurrent.cpp` | Implements the new recurrent utilities and refactors existing default-input construction to use them. |
| `src/frontends/onnx/frontend/src/op/lstm.cpp` | Switches LSTM translator to use the shared recurrent utilities instead of local helpers/duplicated logic. |
| `src/frontends/onnx/frontend/src/op/com.microsoft/dynamic_quantize_lstm.cpp` | New translator implementation for com.microsoft::DynamicQuantizeLSTM. |
| `src/frontends/onnx/docs/supported_ops.md` | Marks DynamicQuantizeLSTM as supported and documents the peephole P limitation. |
| `.github/agents-prototype/skills/add-fe-op/onnx.md` | Updates internal guidance with lessons learned (LPT dequant pattern + axis alignment + testing expectations). |
Comment on lines +31 to +33
    // Runtime dimension values extracted from OV-layout X [batch, seq, input]
    // and R [num_dir, gates*hidden, hidden]. Each member is a rank-1 i32 node.
    struct LSTMDimensions {
Comment on lines +61 to +79
    const auto gate_axis_size = 4 * hidden_size;
    const auto& shape = weights.get_partial_shape();
    const auto dim1_matches = shape[1].is_static() && shape[1].get_length() == gate_axis_size;
    const auto dim2_matches = shape[2].is_static() && shape[2].get_length() == gate_axis_size;

    CHECK_VALID_NODE(node,
                     dim1_matches || dim2_matches,
                     "DynamicQuantizeLSTM input '",
                     input_name,
                     "' must have either axis 1 or axis 2 equal to 4*hidden_size (",
                     gate_axis_size,
                     "). Got shape: ",
                     shape);

    if (!dim1_matches && dim2_matches) {
        return ov::op::util::reorder_axes(weights, {0, 2, 1});
    }
    return weights;
}
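To make the branch logic in the snippet above easier to reason about, here is a minimal plain-data sketch of the same decision. It is an illustration only: `Tensor3` and `put_gate_axis_on_axis1` are hypothetical names, shapes are assumed fully static, and the explicit loop stands in for `ov::op::util::reorder_axes(weights, {0, 2, 1})`.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a rank-3 weight tensor, stored row-major.
struct Tensor3 {
    std::array<std::size_t, 3> shape;
    std::vector<float> data;
};

// Returns the tensor with the 4*hidden_size gate axis on axis 1,
// swapping axes 1 and 2 only when axis 2 alone carries the gate dimension.
Tensor3 put_gate_axis_on_axis1(const Tensor3& w, std::size_t hidden_size) {
    const std::size_t gate_axis_size = 4 * hidden_size;
    const bool dim1_matches = w.shape[1] == gate_axis_size;
    const bool dim2_matches = w.shape[2] == gate_axis_size;
    if (!dim1_matches && !dim2_matches) {
        throw std::invalid_argument("neither axis 1 nor axis 2 equals 4*hidden_size");
    }
    if (dim1_matches) {
        return w;  // already in the expected layout (or ambiguous, see note)
    }
    const std::size_t d0 = w.shape[0], d1 = w.shape[1], d2 = w.shape[2];
    Tensor3 out;
    out.shape = {{d0, d2, d1}};
    out.data.resize(w.data.size());
    for (std::size_t d = 0; d < d0; ++d) {
        for (std::size_t i = 0; i < d1; ++i) {
            for (std::size_t j = 0; j < d2; ++j) {
                // out[d][j][i] = w[d][i][j]
                out.data[(d * d2 + j) * d1 + i] = w.data[(d * d1 + i) * d2 + j];
            }
        }
    }
    return out;
}
```

One property the sketch makes visible: when both axes equal `4*hidden_size` (for W this happens if `input_size == 4*hidden_size`), the input is returned unchanged, so the layout is taken on trust in that case.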
Review-driven improvements on top of the DynamicQuantizeLSTM PR:
- Fix the test that could never pass. ONNX Runtime's DynamicQuantizeLSTM dynamically quantizes the activations (X and the recurrent hidden state) and runs integer matmuls, whereas this translator only dequantizes the W/R weights and runs a float LSTMSequence. The two differ by the activation-quantization noise (~1.5e-3 here), so the original 1e-6 tolerance failed against the ORT-generated expected values. Relax the tolerance to 0.0055 (matching the sibling DynamicQuantizeMatMul test) and document the approximation with a TODO to model activation quantization (OpenVINO CPU/GPU plugins support dynamic quantization).
- Drop the speculative rank-2 (num_directions-omitted) weight handling. The com.microsoft spec defines W/R as rank-3 only; the rank-2 branch was dead code that would also have emitted outputs with an extra num_directions dimension (no squeeze-back, no bidirectional guard, unlike lstm.cpp). Require rank-3 and remove the now-dead original_rank plumbing.
- Reject the unsupported peephole input P instead of silently dropping it.
- Reduce duplication with the standard LSTM translator: extract normalize_tensor_rank and the default optional-input fabrication (dimension extraction + default bias / sequence_lens / initial state) into shared recurrent utils, used by both lstm.cpp and the dynamic translator. Data-bearing edges (X, W, R, provided initial states) stay explicit in each translator so future activation-quantization insertion is unobstructed.
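As a rough illustration of why a float LSTMSequence cannot match ORT to 1e-6, the sketch below runs a float vector through a generic asymmetric uint8 quantize/dequantize round trip and measures the error. This is an assumption-laden toy, not ORT's kernel: `choose_uint8_params`, `quantize_dequantize`, and `max_abs_diff` are hypothetical helpers, and the range/rounding scheme is a common textbook choice that may differ from what ORT does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct QuantParams {
    float scale;
    float zero_point;  // integer-valued, kept in [0, 255]
};

// Picks per-tensor uint8 parameters from the data's min/max. The range is
// forced to include zero so that 0.0f round-trips exactly.
QuantParams choose_uint8_params(const std::vector<float>& x) {
    float lo = 0.0f, hi = 0.0f;
    for (const float v : x) {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    if (hi == lo) {
        return QuantParams{1.0f, 0.0f};  // all-zero input
    }
    const float scale = (hi - lo) / 255.0f;
    const float zp = std::min(255.0f, std::max(0.0f, std::round(-lo / scale)));
    return QuantParams{scale, zp};
}

// Quantizes to the uint8 grid and immediately dequantizes back to float.
std::vector<float> quantize_dequantize(const std::vector<float>& x) {
    const QuantParams p = choose_uint8_params(x);
    std::vector<float> out;
    out.reserve(x.size());
    for (const float v : x) {
        const float q = std::min(255.0f, std::max(0.0f, std::round(v / p.scale) + p.zero_point));
        out.push_back((q - p.zero_point) * p.scale);
    }
    return out;
}

// Largest element-wise absolute difference over the common prefix.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

For inputs spanning [-1, 1] the grid step is 2/255, so a single round trip already perturbs values by up to roughly half a step, several orders of magnitude above 1e-6. How that noise propagates through the matmuls and the recurrence depends on the model, which is why the tolerance here was set empirically.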
Details:
Tickets:
AI Assistance: