[ONNX FE] Support com.microsoft::DynamicQuantizeLSTM #36207

Open
mvafin wants to merge 3 commits into openvinotoolkit:master from mvafin:mvafin/onnx/dynamic-quantize-lstm-improvements

mvafin commented Jun 3, 2026

Details:

  • Adds com.microsoft::DynamicQuantizeLSTM (opset 1) translator to the ONNX Frontend
  • Dequantizes W/R using ov::decomposition::low_precision_dequantize so MarkDequantization can fire and the CPU/GPU quantized kernel runs when weights are graph constants
  • Rejects unsupported peephole input P with a clear error; adds a TODO to support it via LSTMCell unrolling
  • Extracts shared recurrent utilities (normalize_tensor_rank, LSTMDimensions, default optional-input helpers) into utils/recurrent.hpp/.cpp, used by both the new translator and lstm.cpp
  • Adds two tests: runtime-input variant (Unsqueeze alignment path) and graph-constant variant (MarkDequantization path)
  • Updates add-fe-op/onnx.md agent skill with lessons from this implementation
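
The W/R dequantization named in the second bullet reduces, per element, to `(q - zero_point) * scale`. Below is a minimal sketch in plain C++, assuming unsigned 8-bit weights and one scale and zero point per output column. The helper name `dequantize_per_column` and the flat row-major layout are illustrative only; this is not the `ov::decomposition::low_precision_dequantize` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenVINO function): dequantize one direction's
// weight matrix, stored row-major as [rows, cols], using one scale and one
// zero point per column.
std::vector<float> dequantize_per_column(const std::vector<std::uint8_t>& q,
                                         std::size_t rows,
                                         std::size_t cols,
                                         const std::vector<float>& scale,
                                         const std::vector<std::uint8_t>& zero_point) {
    assert(q.size() == rows * cols);
    assert(scale.size() == cols && zero_point.size() == cols);

    std::vector<float> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Subtract in a signed type so values below the zero point go negative.
            const int centered = static_cast<int>(q[r * cols + c]) - static_cast<int>(zero_point[c]);
            out[r * cols + c] = static_cast<float>(centered) * scale[c];
        }
    }
    return out;
}
```

Keeping the subtraction and multiplication as separate, explicit steps mirrors the subtract-then-multiply pattern that a dequantization-marking pass would look for on constant weights.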

Tickets:

AI Assistance:

mlukasze and others added 2 commits June 3, 2026 12:07
…ain)

Implements the DynamicQuantizeLSTM contrib operator from the com.microsoft ONNX domain. The translator dequantizes quantized W and R tensors, normalizes their layout to the standard ONNX LSTM gate ordering, and feeds them into LSTMSequence.

Validated by building openvino_onnx_frontend, loading a standalone DynamicQuantizeLSTM model extracted from KittenML/kitten-tts-mini-0.8, and comparing OpenVINO outputs against ONNX Runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mvafin requested a review from Copilot June 3, 2026 11:39
mvafin requested review from a team as code owners June 3, 2026 11:39
mvafin requested review from tsavina and removed request for a team June 3, 2026 11:39
github-actions bot added labels category: CI (OpenVINO public CI), category: docs (OpenVINO documentation), category: ONNX FE (OpenVINO ONNX FrontEnd) Jun 3, 2026
Copilot AI left a comment

Pull request overview

Adds ONNX Frontend support for com.microsoft::DynamicQuantizeLSTM (opset 1) by lowering it to ov::op::v5::LSTMSequence, including weight dequantization via ov::decomposition::low_precision_dequantize. The PR also refactors shared recurrent/LSTM utilities, adds regression coverage for both runtime-parameter and constant-weight paths, and updates related documentation.

Changes:

  • Add DynamicQuantizeLSTM translator under src/frontends/onnx/frontend/src/op/com.microsoft/, including scale/zero-point alignment and explicit rejection of peephole input P.
  • Extract shared recurrent helpers (normalize_tensor_rank, LSTMDimensions, default optional-input builders) into utils/recurrent.hpp/.cpp and reuse them from the existing LSTM translator.
  • Add two new ONNX FE tests + prototxt models covering runtime-input alignment vs constant-weight MarkDequantization patterns; update supported-ops doc and internal agent skill notes.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/frontends/onnx/tests/onnx_import_com_microsoft.in.cpp | Adds two regression tests for DynamicQuantizeLSTM (runtime inputs vs const weights) sharing common inputs/expected outputs. |
| src/frontends/onnx/tests/models/com.microsoft/dynamic_quantize_lstm.prototxt | New prototxt model exercising runtime-provided W/R/scales/zps. |
| src/frontends/onnx/tests/models/com.microsoft/dynamic_quantize_lstm_const_weights.prototxt | New prototxt model with W/R/scales/zps as initializers (const weights path). |
| src/frontends/onnx/frontend/src/utils/recurrent.hpp | Introduces shared recurrent utilities declarations (normalize_tensor_rank, LSTMDimensions, default optional input builders). |
| src/frontends/onnx/frontend/src/utils/recurrent.cpp | Implements the new recurrent utilities and refactors existing default-input construction to use them. |
| src/frontends/onnx/frontend/src/op/lstm.cpp | Switches LSTM translator to use the shared recurrent utilities instead of local helpers/duplicated logic. |
| src/frontends/onnx/frontend/src/op/com.microsoft/dynamic_quantize_lstm.cpp | New translator implementation for com.microsoft::DynamicQuantizeLSTM. |
| src/frontends/onnx/docs/supported_ops.md | Marks DynamicQuantizeLSTM as supported and documents the peephole P limitation. |
| .github/agents-prototype/skills/add-fe-op/onnx.md | Updates internal guidance with lessons learned (LPT dequant pattern + axis alignment + testing expectations). |

Comment on lines +31 to +33

    // Runtime dimension values extracted from OV-layout X [batch, seq, input]
    // and R [num_dir, gates*hidden, hidden]. Each member is a rank-1 i32 node.
    struct LSTMDimensions {

Comment on lines +61 to +79

        const auto gate_axis_size = 4 * hidden_size;
        const auto& shape = weights.get_partial_shape();
        const auto dim1_matches = shape[1].is_static() && shape[1].get_length() == gate_axis_size;
        const auto dim2_matches = shape[2].is_static() && shape[2].get_length() == gate_axis_size;

        CHECK_VALID_NODE(node,
                         dim1_matches || dim2_matches,
                         "DynamicQuantizeLSTM input '",
                         input_name,
                         "' must have either axis 1 or axis 2 equal to 4*hidden_size (",
                         gate_axis_size,
                         "). Got shape: ",
                         shape);

        if (!dim1_matches && dim2_matches) {
            return ov::op::util::reorder_axes(weights, {0, 2, 1});
        }
        return weights;
    }
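
The gate-axis check in the snippet above can be sketched without OpenVINO types. In this hedged illustration, `Tensor3` and `put_gate_axis_first` are hypothetical names and all shapes are assumed static. Like the snippet, it leaves the tensor untouched whenever axis 1 already equals 4*hidden_size (including the case where both axes match) and swaps axes 1 and 2 only when axis 2 alone matches.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a static rank-3 tensor; not an OpenVINO type.
struct Tensor3 {
    std::array<std::size_t, 3> shape;  // [d0, d1, d2]
    std::vector<float> data;           // row-major storage
};

// Returns a tensor whose axis 1 has length 4 * hidden_size.
// Throws if neither axis 1 nor axis 2 has that length.
Tensor3 put_gate_axis_first(const Tensor3& t, std::size_t hidden_size) {
    assert(t.data.size() == t.shape[0] * t.shape[1] * t.shape[2]);
    const std::size_t gate_axis_size = 4 * hidden_size;
    const bool dim1_matches = t.shape[1] == gate_axis_size;
    const bool dim2_matches = t.shape[2] == gate_axis_size;
    if (!dim1_matches && !dim2_matches) {
        throw std::invalid_argument("axis 1 or axis 2 must equal 4*hidden_size");
    }
    if (dim1_matches) {
        return t;  // already in the expected layout
    }

    // Swap axes 1 and 2: out[d][j][i] = t[d][i][j].
    const std::size_t D = t.shape[0], A = t.shape[1], B = t.shape[2];
    Tensor3 out;
    out.shape = {D, B, A};
    out.data.resize(t.data.size());
    for (std::size_t d = 0; d < D; ++d) {
        for (std::size_t i = 0; i < A; ++i) {
            for (std::size_t j = 0; j < B; ++j) {
                out.data[(d * B + j) * A + i] = t.data[(d * A + i) * B + j];
            }
        }
    }
    return out;
}
```
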
Review-driven improvements on top of the DynamicQuantizeLSTM PR:

- Fix the test that could never pass. ONNX Runtime's DynamicQuantizeLSTM
  dynamically quantizes the activations (X and the recurrent hidden state)
  and runs integer matmuls, whereas this translator only dequantizes the
  W/R weights and runs a float LSTMSequence. The two differ by the
  activation-quantization noise (~1.5e-3 here), so the original 1e-6
  tolerance failed against the ORT-generated expected values. Relax the
  tolerance to 0.0055 (matching the sibling DynamicQuantizeMatMul test) and
  document the approximation with a TODO to model activation quantization
  (OpenVINO CPU/GPU plugins support dynamic quantization).

- Drop the speculative rank-2 (num_directions-omitted) weight handling.
  The com.microsoft spec defines W/R as rank-3 only; the rank-2 branch was
  dead code that would also have emitted outputs with an extra
  num_directions dimension (no squeeze-back, no bidirectional guard, unlike
  lstm.cpp). Require rank-3 and remove the now-dead original_rank plumbing.

- Reject the unsupported peephole input P instead of silently dropping it.

- Reduce duplication with the standard LSTM translator: extract
  normalize_tensor_rank and the default optional-input fabrication
  (dimension extraction + default bias / sequence_lens / initial state)
  into shared recurrent utils, used by both lstm.cpp and the dynamic
  translator. Data-bearing edges (X, W, R, provided initial states) stay
  explicit in each translator so future activation-quantization insertion
  is unobstructed.
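
The tolerance change in the first item follows from activation quantization being lossy. The sketch below illustrates that round-trip error in plain C++, assuming an asymmetric unsigned 8-bit scheme whose range is widened to include zero. The exact scheme used inside ONNX Runtime's DynamicQuantizeLSTM is an assumption here, and the names `QuantParams`, `choose_params`, `fake_quantize`, and `max_round_trip_error` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantParams {
    float scale;
    std::uint8_t zero_point;
};

// Picks per-tensor parameters from the data at run time ("dynamic" quantization).
// Assumption: the range is widened to include 0 so that 0.0f maps to an exact code.
QuantParams choose_params(const std::vector<float>& x) {
    assert(!x.empty());
    const float lo = std::min(0.0f, *std::min_element(x.begin(), x.end()));
    const float hi = std::max(0.0f, *std::max_element(x.begin(), x.end()));
    const float scale = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
    const float zp = std::min(255.0f, std::max(0.0f, std::round(-lo / scale)));
    return {scale, static_cast<std::uint8_t>(zp)};
}

// Quantizes one value to the uint8 grid and maps it straight back to float.
float fake_quantize(float v, const QuantParams& p) {
    float q = std::round(v / p.scale) + static_cast<float>(p.zero_point);
    q = std::min(255.0f, std::max(0.0f, q));
    return (q - static_cast<float>(p.zero_point)) * p.scale;
}

// Largest absolute difference between the inputs and their quantized round trip.
float max_round_trip_error(const std::vector<float>& x) {
    const QuantParams p = choose_params(x);
    float worst = 0.0f;
    for (float v : x) {
        worst = std::max(worst, std::fabs(v - fake_quantize(v, p)));
    }
    return worst;
}
```

A float-only LSTMSequence skips this rounding step on X and on the hidden state, so its outputs can differ from an integer-matmul reference by an amount on the order of the quantization step, which is far larger than 1e-6.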
Labels

category: CI OpenVINO public CI category: docs OpenVINO documentation category: ONNX FE OpenVINO ONNX FrontEnd

3 participants