Wzx dev deform att#5

Draft
zaixing-wang wants to merge 136 commits into master from wzx_dev_deform_att

Conversation

@zaixing-wang
Owner

Details:

  • item1
  • ...

Tickets:

  • ticket-id

@zaixing-wang zaixing-wang force-pushed the wzx_dev_deform_att branch 4 times, most recently from be512dc to 4ec8009 Compare July 28, 2025 11:58
akodanka and others added 30 commits December 18, 2025 11:42
…lkit#32572)

### Details:
- Add ov::FdGetterType and ov::hint::fd_getter property
- Extend load_mmap_object() with fd-based overload (Linux only)
- Integrate fd_getter through NPUW deserialization flow
- Windows implementation throws unsupported exception

### Tickets:
 - EISW-179643

Signed-off-by: Anoob Anto Kodankandath <anoob.anto.kodankandath@intel.com>
…ision (openvinotoolkit#32739)

### Details:
In LLMs, certain patterns in subgraphs, such as rotary embeddings
calculation, involve math operations over integer numbers (position
indices) with results stored in floating-point precision. For such
operations, we must not unconditionally apply BF16 arithmetic: with only
8 bits of significand precision, BF16 can represent every integer exactly
only within a narrow range, [-256, 256]. Therefore, these subgraphs must be maintained in
FP32 precision to preserve accuracy. In this PR, we modify the BF16
markup procedure to keep these subgraphs in FP32 during inference.
The second pattern is gathering quantized embeddings with subsequent RMS
operation. It turned out that this subgraph has to be performed in
`fp32` to preserve accuracy.
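
To illustrate why integer position indices lose accuracy in BF16, here is a minimal sketch that emulates BF16 rounding by keeping only the upper 16 bits of a float32 value (round to nearest, ties to even). The helper `to_bf16` is hypothetical and is not part of OpenVINO; it does not handle NaN.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value and return it as a Python float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that BF16 drops.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Small indices survive the round trip; larger ones collapse onto neighbours.
for position in (100, 1000, 1001, 4097):
    print(position, "->", to_bf16(float(position)))
```

Once neighbouring position indices round to the same value, the rotary embedding computed from them can no longer tell those positions apart, which is the motivation for keeping such subgraphs in FP32.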

### Tickets:
 - CVS-173763

---------

Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
Co-authored-by: Arseniy Obolenskiy <gooddoog@student.su>
…penvinotoolkit#33282)

### Details:
- *Implement accuracy mode that runs inference on reference and target
devices in parallel and compares outputs*
 - *Validate the reference and target data using config file's metric*
 - *Support `--reference_device` and `--target_device` CLI arguments*
 - *Enable parallel model compilation for both devices*
- *Create _REFERENCE and _TARGET output directories to dump output when
output_dir is present in config file*

### Tickets:
 - *EISW-121857*
…inotoolkit#33306)

### Details:
 - https://spdx.org/licenses/
 - Bump `setuptools` lower bound to use the new license standard
 - More details in the ticket

### Tickets:
 - CVS-178400

---------

Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
…nvinotoolkit#32255)

### Details:
 - Specification of MOE internal operation
 - Internal ops are used mainly for fusion transformations and
 optimizations; they will not appear in the public IR of the converted model
 
 Describes MOE used in PR: 
 - openvinotoolkit#32183

### Tickets:
 - 171911

---------

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Mateusz Mikolajczyk <mateusz.mikolajczyk@intel.com>
### Details:
This change refactors memory reference (MemRef) management to use opaque
handles and adds new API functions for MemRef creation, manipulation, and
destruction. It also updates error codes and function signatures to
support these changes.

**API changes for MemRef management:**

- Introduced a new opaque handle type
`npu_mlir_runtime_mem_ref_handle_t` for MemRefs, replacing the previous
struct-based approach.
- Added new API functions: `npuMLIRRuntimeCreateMemRef`,
`npuMLIRRuntimeDestroyMemRef`, `npuMLIRRuntimeSetMemRef`, and
`npuMLIRRuntimeParseMemRef` for creating, destroying, setting, and
parsing MemRef handles, respectively.


### Tickets:
 - *C174100*
)

### Details:
- Apply micro_gemm to improve prefill performance, so that experts can be
executed in parallel instead of serially
 - test result:
 
<img width="788" height="203" alt="image"
src="https://github.com/user-attachments/assets/b04d61ba-4375-421e-ae68-5bb038006600"
/>

- Fix a random accuracy issue when running with bs > 1, and optimize 2nd
token performance for bs > 1

<img width="536" height="195" alt="image"
src="https://github.com/user-attachments/assets/b0d433d7-d544-4b7c-a02b-0ec034ffc240"
/>


### Tickets:
 - *CVS-176930*
 - *CVS-178252*

---------

Co-authored-by: Chen Peter <peter.chen@intel.com>
…inotoolkit#33167)

### Description of the issue(symptom, root-cause, how it was resolved)
- The text_to_speech_generation_optimum pipeline is much slower on GPU
than on CPU.
- The random_uniform kernel is slow to build, and it is rebuilt on every
inference call because it does not support shape-agnostic execution.

#### Reproduction step and snapshot (if applicable. Do not attach for
customer model)
 - reproduce steps in the ticket

#### Checklist
 - [x] Is it a proper fix? (not a workaround)
 - [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this
scenario? Which test did you review?

### Tickets:
 - 177080
This doc covers Linux debugging only. Windows will come in a separate PR as
it's much more complicated.

### Tickets:
 - CVS-159139

---------

Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
Co-authored-by: Alicja Miloszewska <alicja.miloszewska@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lkit#33156)

### Details:
- The skip config gets multiple fallbacks for platform identification.
- Some tests can set their own desired platform, while the skip config
is populated before any test is run. This means that it needs to know
what's the expected platform, in order to populate the list correctly.
- This implementation uses `IE_NPU_TESTS_PLATFORM` as the first source
for platform identification, as it's a mandatory envvar to run
`ov_npu_func_tests`. It then falls back to using
`IE_NPU_TESTS_DEVICE_NAME`. If neither envvar provides a usable platform,
the final fallback is the list returned by the NPU Plugin
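
The fallback order described above can be sketched as below. This is a simplified illustration, not the actual skip config code: `resolve_platforms` is a hypothetical helper, any non-empty value is assumed to be "usable", and the envvar value is used as-is because the source does not show how a device name is mapped to a platform. The platform strings in the usage lines are placeholders.

```python
import os


def resolve_platforms(plugin_platforms, env=None):
    """Return the platforms the skip config should be populated for."""
    env = os.environ if env is None else env
    # Try the envvars in priority order.
    for name in ("IE_NPU_TESTS_PLATFORM", "IE_NPU_TESTS_DEVICE_NAME"):
        value = env.get(name, "").strip()
        if value:  # assumption: any non-empty value counts as usable
            return [value]
    # Final fallback: whatever the plugin reports.
    return list(plugin_platforms)


print(resolve_platforms(["PLATFORM_A"], env={"IE_NPU_TESTS_PLATFORM": "PLATFORM_B"}))
print(resolve_platforms(["PLATFORM_A"], env={}))
```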

### Tickets:
 - E 189860
…erations (openvinotoolkit#33023)

### Details:

Subgraph from silero_vad.onnx
<img width="1776" height="701" alt="image"
src="https://github.com/user-attachments/assets/e9b5fc84-fb19-4022-a561-90880be60684"
/>

**Problem**

ONNX models exported from PyTorch frequently contain Unsqueeze
operations before LSTM nodes. These operations add extra dimensions to
tensors, resulting in rank-4 or rank-5 inputs to LSTM nodes. However,
the ONNX LSTM specification strictly requires rank-3 inputs with shape
[seq_length, batch_size, input_size].

Why this happens:
- PyTorch models use various tensor shapes during training
- During ONNX export, shape mismatches are "fixed" by inserting Unsqueeze
nodes
- These Unsqueeze operations add dimensions of size 1 to match the
expected shapes
- The resulting LSTM inputs have rank > 3, violating the ONNX LSTM
specification

**Real-world impact:**

- Models like silero_vad.onnx contain 4 LSTM nodes, all preceded by
Unsqueeze operations
- Without this fix, such LSTM models fail to convert to OpenVINO IR

**Solution**
This fix adds automatic rank reduction in the ONNX Frontend LSTM
converter (src/frontends/onnx/frontend/src/op/lstm.cpp). The
implementation uses a two-strategy approach:

1. Squeeze Strategy (optimal path):
   - Used when all extra leading dimensions equal 1
   - Example: [1, 1, seq, batch, input] → [seq, batch, input]
   - Zero-cost operation that only changes metadata, no data movement
   - Applies to most real-world models (including silero_vad.onnx)
2. Reshape Strategy (fallback path):
   - Used when extra dimensions are > 1 or have dynamic shapes
   - Example: [2, 3, seq, batch, input] → [6, batch, input] (flattens
   leading dimensions)
   - Handles edge cases and dynamic shapes
   - Uses dynamic shape calculation at runtime

**Implementation details:**

- New function reduce_tensor_rank() analyzes input tensor rank and shape
- Automatically selects the optimal strategy based on dimension values
- Applied to all LSTM inputs: X (data), initial_h (hidden state),
initial_c (cell state)
- Transparent to users: no model modifications required

Code structure:
```
// Analyze input shape
if (input_rank <= target_rank) {
    return input;  // No reduction needed
}

// Check if all extra dimensions equal 1
if (all_extra_dims_are_one) {
    // Use Squeeze - optimal path
    return Squeeze(input, axes);
} else {
    // Use Reshape - fallback path
    return Reshape(input, new_shape);
}
```
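
The strategy selection above can also be tried out on plain shape lists. The sketch below is a hypothetical illustration, not the converter code: it works on static shapes only, and the way the Reshape fallback folds the extra dimensions (multiplying them into the first retained dimension) is an assumption, since the source does not show the exact rule.

```python
from math import prod


def reduce_shape_rank(shape, target_rank=3):
    """Return (strategy, new_shape) for a static input shape."""
    extra = len(shape) - target_rank
    if extra <= 0:
        return "none", list(shape)  # no reduction needed
    leading = shape[:extra]
    if all(dim == 1 for dim in leading):
        # Squeeze path: drop the size-1 leading dimensions.
        return "squeeze", list(shape[extra:])
    # Reshape path (assumed rule): fold the extra dims into the first kept dim.
    folded = prod(shape[: extra + 1])
    return "reshape", [folded] + list(shape[extra + 1:])


print(reduce_shape_rank([10, 2, 16]))
print(reduce_shape_rank([1, 1, 10, 2, 16]))
print(reduce_shape_rank([2, 3, 10, 2, 16]))
```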

**Performance:**
- Squeeze path has zero runtime overhead (metadata-only operation)
- Reshape path adds minimal overhead only for edge cases
- No impact on models that already have rank-3 inputs


### Tickets:
 - 162986
…envinotoolkit#33315)

### Details:
- This change fixes an issue in the GPU oneDNN convolution post-op
optimization logic where FakeQuantize shift folding could be applied
more than once in the `eltw_and_bin`, `bin_and_eltw`, `eltw_and_scale`
path.
- Specifically, the fix ensures that shift folding is performed exactly
once during the initial compilation phase and is not re-applied when a
compiled model is loaded from cache.
- The logic now correctly distinguishes between first-time optimization
and cache-imported execution, preventing repeated modification of the
folded shift constant.

### Issues:
- When using GPU oneDNN convolution with cache enabled, the
`eltw_and_bin` post-op optimization path could re-apply FakeQuantize
shift folding after cache import.
- As a result, the shift value was folded twice, leading to incorrect
constant values and significant accuracy regressions in INT8 workloads.
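
One way to picture the "fold exactly once" guarantee is a flag that travels with the cached object. The sketch below is purely illustrative: `ConvPostOps`, `shift_folded`, and folding the shift into a bias term are all hypothetical and do not reflect the actual GPU plugin data structures.

```python
from dataclasses import dataclass


@dataclass
class ConvPostOps:
    bias: float
    fq_shift: float
    shift_folded: bool = False  # hypothetical flag stored with the cached model


def fold_shift(post_ops: ConvPostOps) -> ConvPostOps:
    """Fold the FakeQuantize shift into the bias at most once."""
    if post_ops.shift_folded:
        return post_ops  # already folded, e.g. the object was loaded from cache
    post_ops.bias += post_ops.fq_shift
    post_ops.shift_folded = True
    return post_ops


ops = fold_shift(ConvPostOps(bias=1.0, fq_shift=0.5))  # initial compilation
ops = fold_shift(ops)  # simulated cache import: must not fold again
print(ops.bias)
```

Without the guard, the second call would add the shift again, which is the kind of double folding the issue list describes.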

### Checklist:
- [x] Is it a proper fix? (Not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this
scenario?

### Tickets:
 - 176505
### Details:
 - Fix docs to align with actual behavior

### Tickets:
 - CVS-176708
### Details:
Implement Flash Attention algorithm on NPU:
- Tiled processing: Break K/V sequences into manageable chunks
- CPU orchestration: Host manages tile iteration and data flow
- NPU acceleration: NPU performs compute-intensive SDPA operations on
tiles
- Incremental accumulation: Combine results across tiles with numerical
stability
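
The tiling and incremental accumulation steps can be sketched in plain Python for a single query vector. This is a generic online-softmax illustration, not the NPU implementation: the function names are hypothetical, the usual 1/sqrt(d) scaling is omitted for brevity, and the per-tile work that the NPU would do is modelled as ordinary list arithmetic.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    # Plain softmax attention over the whole K/V sequence at once.
    scores = [dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def attention_tiled(q, keys, values, tile_size):
    running_max, running_sum = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile_size):  # host-side tile loop
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [dot(q, k) for k in k_tile]  # per-tile compute
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum for stability.
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[i] for w, v in zip(weights, v_tile))
               for i, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
print(attention_reference(q, keys, values))
print(attention_tiled(q, keys, values, tile_size=2))
```

Both calls should print the same vector up to floating-point rounding, regardless of the tile size.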

### Tickets:
 - *[EISW-194855](https://jira.devtools.intel.com/browse/EISW-194855)*

---------

Signed-off-by: intelgaoxiong <xiong.gao@intel.com>
Co-authored-by: Eugene Smirnov <eugene.smirnov@intel.com>
…ts (openvinotoolkit#33246)

### Details:
- *ConvertFullyConnectedToFullyConnectedCompressed callback for 3d
weights*

### Tickets:
 - *CVS-177976*
### Details:
- Add cmake option "ENABLE_INTEL_NPU_COMPILER" to enable/disable the
download of the library. Default value is ON.
- Library is downloaded from the openvino storage, but the original
source is [NPU Compiler
Release](https://github.com/openvinotoolkit/npu_compiler/releases).
- While the compiler will be included by default in openvino packages,
it will not be used yet by default by the NPU plugin.
- The new compiler library inside plugin can only be used through
NPU_COMPILER_TYPE=PLUGIN.
- New tests will gradually be migrated to use the compiler-in-plugin flow
(outside of this PR, since part of the scripts are outside of this
repository).
- Logic in NPU Plugin to automatically select the suitable compiler type
will be added later as well.

### Tickets:
 - *CVS-174281*

---------

Signed-off-by: Kang, Wenjing <wenjing.kang@intel.com>
Co-authored-by: Kang, Wenjing <wenjing.kang@intel.com>
### Details:
- updated `IncreasePositionIdsPrecisionForGPTOSS` to use f32 precision
for rope layers of the gpt-oss model.

### Tickets:
 - 177962
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.2 to
9.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest/releases">pytest's
releases</a>.</em></p>
<blockquote>
<h2>9.0.1</h2>
<h1>pytest 9.0.1 (2025-11-12)</h1>
<h2>Bug fixes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13895">#13895</a>:
Restore support for skipping tests via <code>raise
unittest.SkipTest</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13896">#13896</a>:
The terminal progress plugin added in pytest 9.0 is now automatically
disabled when iTerm2 is detected, it generated desktop notifications
instead of the desired functionality.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13904">#13904</a>:
Fixed the TOML type of the verbosity settings in the API reference from
number to string.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13910">#13910</a>:
Fixed <!-- raw HTML omitted -->UserWarning: Do not expect
file_or_dir<!-- raw HTML omitted --> on some earlier Python 3.12 and
3.13 point versions.</li>
</ul>
<h2>Packaging updates and notes for downstreams</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<h2>Contributor-facing changes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13891">#13891</a>,
<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13942">#13942</a>:
The CI/CD part of the release automation is now capable of
creating GitHub Releases without having a Git checkout on
disk -- by <code>bluetech</code> and <code>webknjaz</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<h2>9.0.0</h2>
<h1>pytest 9.0.0 (2025-11-05)</h1>
<h2>New features</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/1367">#1367</a>:
<strong>Support for subtests</strong> has been added.</p>
<p><code>subtests &lt;subtests&gt;</code> are an alternative to
parametrization, useful in situations where the parametrization values
are not all known at collection time.</p>
<p>Example:</p>
<pre lang="python"><code>def contains_docstring(p: Path) -&gt; bool:
    &quot;&quot;&quot;Return True if the given Python file contains a top-level docstring.&quot;&quot;&quot;
    ...

def test_py_files_contain_docstring(subtests: pytest.Subtests) -&gt; None:
    for path in Path.cwd().glob(&quot;*.py&quot;):
        with subtests.test(path=str(path)):
            assert contains_docstring(path)
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/pytest-dev/pytest/commit/d1b64aa60b9e1a0fcfaf03af7ebeb185f1024a87"><code>d1b64aa</code></a>
Prepare release version 9.0.1</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/0a497c7b213ea950821319fd80dce219b0033f32"><code>0a497c7</code></a>
regendoc: remove CI environment variables (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13950">#13950</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13951">#13951</a>)</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/a9f7e6ed579b8844e302067b7f05122b82993355"><code>a9f7e6e</code></a>
🧪 Run <code>gh release</code> w/o Git in CI/CD (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13942">#13942</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13947">#13947</a>)</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/2682a6607304f1f5bb5a2140340003cdf5121bc4"><code>2682a66</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13944">#13944</a>
from pytest-dev/patchback/backports/9.0.x/bef7d34f1...</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/a999997e36c53d189ecded3369bf35bfe2be96ad"><code>a999997</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13941">#13941</a>
from nicoddemus/min-pre-commit-version</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/4bd63a0ead81d740aa767a4384d3b0b4c18f2ef2"><code>4bd63a0</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13935">#13935</a>
from pytest-dev/patchback/backports/9.0.x/ce8b8a7b4...</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/15f93b332c1c3ec9c200c0ad3d55af5a2158e0db"><code>15f93b3</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>
from webknjaz/maintenance/tox-pep517-env-setuptools...</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/0fa11ae3f79d06dc9e2f1f7c81ade4a1126d9ef3"><code>0fa11ae</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13927">#13927</a>
from pytest-dev/patchback/backports/9.0.x/3d8075743...</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/fa454700133c7b2cc960cba3b1cd09cc048c25a0"><code>fa45470</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13926">#13926</a>
from pytest-dev/patchback/backports/9.0.x/d587e0cf8...</li>
<li><a
href="https://github.com/pytest-dev/pytest/commit/b4e3973505a2b7a2caa17ccc392d91a6ad73e122"><code>b4e3973</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13922">#13922</a>
from bluetech/fix-argparse-userwarning</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.1">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…t#33100)

### Description of the issue
- Refactor the `onednn::layout_to_memory_desc()` function to improve
readability and maintainability.

#### The code and line that caused this issue
-
https://github.com/openvinotoolkit/openvino/blob/6b0c6a3cf36e0396a1d5e32af65d979a625ffcef/src/plugins/intel_gpu/src/graph/impls/onednn/utils.hpp#L48
#### Checklist
 - [x] Is it a proper fix? (not a workaround)
 - [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this
scenario? Which test did you review?

### Tickets:
 - *175118*
### Details:
 - Use std::filesystem::path in `VisualizeTree` public class
 - Update related code to use path instead of string

### Tickets:
 - N/A

---------

Signed-off-by: Pawel Raasz <pawel.raasz@intel.com>
…olkit#33308)

### Details:
- *Pass sccache Azure Blob Storage connection string using files instead
of environment variables*


### Tickets:
 - *CVS-175241*
…ms (openvinotoolkit#32652)

Changes:
- Build OpenVINO with `-fprofile-update=atomic` and `--coverage` to
avoid negative counters.
- Remove `ov_coverage` CMake target from the workflow (coverage is
generated directly via `lcov`).
- Capture coverage only from `build/` with `lcov --base-directory` set
to repo root.
- Exclude generated protobuf files, all `*/tests/*`, and `thirdparty/*`
from the report (as in coverage.cmake)
- Add `--ignore-errors mismatch,unused` to make lcov/genhtml robust.
- Generate short coverage summary in GHA.
- Generate artifact with full archived lcov/genhtml report.
…nvinotoolkit#32094)

### Details:
- integrate KleidiAI fp16 matmul and packing ukernels and let
GEMM/GemmCopyB executors pick fp16/fp32 paths
- enable MHA pattern execution on fp16 for arm64

### Tickets:
 - 169101
### Details:
 - *Update npu zero ext*
 - *Use correct stype*
 - *Initialize variables*

### Tickets:
 - *N/A*

---------

Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>