Wzx dev deform att#5
Draft
zaixing-wang wants to merge 136 commits into
be512dc to
4ec8009
…lkit#32572)

### Details:
- Add `ov::FdGetterType` and `ov::hint::fd_getter` property
- Extend `load_mmap_object()` with fd-based overload (Linux only)
- Integrate `fd_getter` through NPUW deserialization flow
- Windows implementation throws unsupported exception

### Tickets:
- EISW-179643

Signed-off-by: Anoob Anto Kodankandath <anoob.anto.kodankandath@intel.com>
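The fd-based mapping idea can be illustrated outside of OpenVINO. The following is a minimal Python sketch, not the `load_mmap_object()` implementation: `map_file_from_fd` and `demo` are hypothetical names, and the temporary file stands in for a model blob whose descriptor the caller already owns.

```python
import mmap
import os
import tempfile


def map_file_from_fd(fd: int) -> mmap.mmap:
    """Map the whole file behind an already-open descriptor as read-only memory."""
    size = os.fstat(fd).st_size
    return mmap.mmap(fd, size, access=mmap.ACCESS_READ)


def demo() -> bytes:
    # Hypothetical stand-in for an "fd getter": the caller opens the file
    # and hands over only the descriptor, never the path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"weights-blob")
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        with map_file_from_fd(fd) as view:
            return bytes(view[:])
    finally:
        os.close(fd)
        os.remove(path)
```

The point of the design is that the consumer never needs a filesystem path, only a descriptor supplied by the application.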
…ision (openvinotoolkit#32739)

### Details:
In LLMs, certain subgraph patterns, such as the rotary embeddings calculation, perform math operations on integers (position indices) and store the results in floating-point precision. BF16 arithmetic must not be applied to such operations unconditionally, because BF16 can represent integers exactly only within a narrow range: [-256, 256]. These subgraphs must therefore be kept in FP32 to preserve accuracy. This PR modifies the BF16 markup procedure to keep these subgraphs in FP32 during inference.

The second pattern is gathering quantized embeddings followed by an RMS operation. This subgraph also has to be executed in `fp32` to preserve accuracy.

### Tickets:
- CVS-173763

---------

Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
Co-authored-by: Arseniy Obolenskiy <gooddoog@student.su>
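The precision loss on position indices can be reproduced with a few lines of Python. This is a minimal sketch that emulates FP32-to-BF16 conversion by rounding the upper 16 bits to nearest-even; `round_to_bf16` is a hypothetical helper, it ignores NaN handling, and it is not the conversion routine used by the plugin.

```python
import struct


def round_to_bf16(x: float) -> float:
    """Round an FP32 value to the nearest BF16 value and return it as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits that BF16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Small position indices survive, larger ones collapse onto a coarse grid.
small = round_to_bf16(100.0)
large = round_to_bf16(1001.0)
larger = round_to_bf16(4097.0)
```

Here `small` stays at 100.0, while 1001.0 and 4097.0 are rounded to 1000.0 and 4096.0, so neighbouring positions become indistinguishable in BF16.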
…openvinotoolkit#33094) Reverts openvinotoolkit#33030 Co-authored-by: Andrey Babushkin <andrey.babushkin@intel.com>
…penvinotoolkit#33282)

### Details:
- *Implement accuracy mode that runs inference on reference and target devices in parallel and compares outputs*
- *Validate the reference and target data using config file's metric*
- *Support `--reference_device` and `--target_device` CLI arguments*
- *Enable parallel model compilation for both devices*
- *Create _REFERENCE and _TARGET output directories to dump output when output_dir is present in config file*

### Tickets:
- *EISW-121857*
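The control flow of such an accuracy mode can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: `compare_devices` and `max_abs_diff` are hypothetical names, the two callables stand in for inference on the reference and target devices, and maximum absolute difference stands in for whatever metric the config file selects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Infer = Callable[[Sequence[float]], List[float]]


def max_abs_diff(reference: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical metric: largest element-wise absolute difference."""
    return max(abs(r - t) for r, t in zip(reference, target))


def compare_devices(infer_reference: Infer, infer_target: Infer,
                    inputs: Sequence[float], threshold: float) -> Tuple[float, bool]:
    """Run both inferences concurrently, then score the target against the reference."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ref_future = pool.submit(infer_reference, inputs)
        tgt_future = pool.submit(infer_target, inputs)
        reference, target = ref_future.result(), tgt_future.result()
    score = max_abs_diff(reference, target)
    return score, score <= threshold
```

Submitting both jobs before waiting on either result is what lets the two devices work at the same time.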
…inotoolkit#33306)

### Details:
- https://spdx.org/licenses/
- Bump `setuptools` lower bound to use the new license standard
- More details in the ticket

### Tickets:
- CVS-178400

---------

Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
…nvinotoolkit#32255)

### Details:
- Specification of MOE internal operation
- Internal ops are used mainly for fusion transformations and optimizations, they will not appear in the converted model public IR

Describes MOE used in PR:
- openvinotoolkit#32183

### Tickets:
- 171911

---------

Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Mateusz Mikolajczyk <mateusz.mikolajczyk@intel.com>
### Details:
Refactors memory reference (MemRef) management to use opaque handles and adds new API functions for MemRef creation, manipulation, and destruction. It also updates error codes and function signatures to support these changes.

**API changes for MemRef management:**
- Introduced a new opaque handle type `npu_mlir_runtime_mem_ref_handle_t` for MemRefs, replacing the previous struct-based approach.
- Added new API functions: `npuMLIRRuntimeCreateMemRef`, `npuMLIRRuntimeDestroyMemRef`, `npuMLIRRuntimeSetMemRef`, and `npuMLIRRuntimeParseMemRef` for creating, destroying, setting, and parsing MemRef handles, respectively.

### Tickets:
- *C174100*
)

### Details:
- Apply micro_gemm to improve prefill performance by letting experts execute in parallel instead of serially
- Test result: <img width="788" height="203" alt="image" src="https://github.com/user-attachments/assets/b04d61ba-4375-421e-ae68-5bb038006600" />
- Fix a random accuracy issue when running with bs > 1, and optimize 2nd token performance for bs > 1 <img width="536" height="195" alt="image" src="https://github.com/user-attachments/assets/b0d433d7-d544-4b7c-a02b-0ec034ffc240" />

### Tickets:
- *CVS-176930*
- *CVS-178252*

---------

Co-authored-by: Chen Peter <peter.chen@intel.com>
…inotoolkit#33167)

### Description of the issue(symptom, root-cause, how it was resolved)
- The text_to_speech_generation_optimum pipeline is much slower on GPU than on CPU.
- The random_uniform kernel is slow to build, and it is rebuilt on every inference call because it does not support shape-agnostic execution.

#### Reproduction step and snapshot (if applicable. Do not attach for customer model)
- Reproduction steps are in the ticket

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- 177080
This doc covers Linux debugging only. Windows will come in a separate PR, as it is much more complicated.

### Tickets:
- CVS-159139

---------

Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
Co-authored-by: Alicja Miloszewska <alicja.miloszewska@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lkit#33156)

### Details:
- The skip config gets multiple fallbacks for platform identification.
- Some tests can set their own desired platform, while the skip config is populated before any test is run. It therefore needs to know the expected platform in advance in order to populate the list correctly.
- This implementation uses `IE_NPU_TESTS_PLATFORM` as the first source for platform identification, as it is a mandatory envvar for running `ov_npu_func_tests`. It then falls back to `IE_NPU_TESTS_DEVICE_NAME`. If neither envvar provides a usable platform, the final fallback is the list returned by the NPU Plugin.

### Tickets:
- E 189860
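The fallback order described above can be sketched as a small resolver. This is an illustration only: `resolve_platforms` is a hypothetical name, "usable" is assumed to mean "set and non-empty", and the plugin's device list is passed in as a plain sequence rather than queried from the NPU Plugin.

```python
from typing import List, Mapping, Sequence


def resolve_platforms(env: Mapping[str, str],
                      plugin_devices: Sequence[str]) -> List[str]:
    """Pick the platform(s) the skip config should be populated for."""
    # Environment variables are consulted in priority order.
    for name in ("IE_NPU_TESTS_PLATFORM", "IE_NPU_TESTS_DEVICE_NAME"):
        value = env.get(name, "").strip()
        if value:  # assumption: any non-empty value counts as usable
            return [value]
    # Final fallback: whatever the plugin reports.
    return list(plugin_devices)
```

In real use, `env` would be `os.environ`; passing it as a parameter keeps the resolver easy to exercise without touching the process environment.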
…erations (openvinotoolkit#33023)

### Details:
Subgraph from silero_vad.onnx
<img width="1776" height="701" alt="image" src="https://github.com/user-attachments/assets/e9b5fc84-fb19-4022-a561-90880be60684" />

**Problem**
ONNX models exported from PyTorch frequently contain Unsqueeze operations before LSTM nodes. These operations add extra dimensions to tensors, resulting in rank-4 or rank-5 inputs to LSTM nodes. However, the ONNX LSTM specification strictly requires rank-3 inputs with shape [seq_length, batch_size, input_size].

Why this happens:
- PyTorch models use various tensor shapes during training
- During ONNX export, shape mismatches are "fixed" by inserting Unsqueeze nodes
- These Unsqueeze operations add dimensions with size 1 to match expected shapes
- The resulting LSTM inputs have rank > 3, violating the ONNX LSTM specification

**Real-world impact:**
- Models like silero_vad.onnx contain 4 LSTM nodes, all with Unsqueeze operations before them
- Without this fix, such LSTM models fail to convert to OpenVINO IR

**Solution**
This fix adds automatic rank reduction in the ONNX Frontend LSTM converter (src/frontends/onnx/frontend/src/op/lstm.cpp). The implementation uses a two-strategy approach:

1. Squeeze Strategy (optimal path):
   - Used when all extra leading dimensions equal 1
   - Example: [1, 1, seq, batch, input] → [seq, batch, input]
   - Zero-cost operation that only changes metadata, no data movement
   - Applies to most real-world models (including silero_vad.onnx)
2. Reshape Strategy (fallback path):
   - Used when extra dimensions are > 1 or have dynamic shapes
   - Example: [2, 3, seq, batch, input] → [6 * seq, batch, input] (flattens leading dimensions)
   - Handles edge cases and dynamic shapes
   - Uses dynamic shape calculation at runtime

**Implementation details:**
- New function reduce_tensor_rank() analyzes input tensor rank and shape
- Automatically selects optimal strategy based on dimension values
- Applied to all LSTM inputs: X (data), initial_h (hidden state), initial_c (cell state)
- Transparent to users - no model modifications required

Code structure:
```
// Analyze input shape
if (input_rank <= target_rank) {
    return input;  // No reduction needed
}

// Check if all extra dimensions equal 1
if (all_extra_dims_are_one) {
    // Use Squeeze - optimal path
    return Squeeze(input, axes);
} else {
    // Use Reshape - fallback path
    return Reshape(input, new_shape);
}
```

**Performance:**
- Squeeze path has zero runtime overhead (metadata-only operation)
- Reshape path adds minimal overhead only for edge cases
- No impact on models that already have rank-3 inputs

### Tickets:
- 162986
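The strategy selection can be tried at the shape level without any framework. The sketch below is not `reduce_tensor_rank()` itself: `plan_rank_reduction` is a hypothetical name, `None` marks a dynamic dimension, and it assumes the fallback folds the extra leading dimensions into the first kept dimension so that the element count is preserved (using `-1` when that product is not known statically).

```python
from math import prod
from typing import List, Optional, Sequence, Tuple

Dim = Optional[int]  # None stands for a dynamic dimension


def plan_rank_reduction(shape: Sequence[Dim],
                        target_rank: int = 3) -> Tuple[str, List[int]]:
    """Return the chosen strategy and the resulting shape."""
    extra = len(shape) - target_rank
    if extra <= 0:
        return "none", list(shape)  # already rank 3 or lower
    leading, kept = shape[:extra], shape[extra:]
    if all(d == 1 for d in leading):
        # Metadata-only path: drop the unit dimensions.
        return "squeeze", list(kept)
    # Fallback: fold the leading dims into the first kept dim.
    folded = shape[:extra + 1]
    first = -1 if any(d is None for d in folded) else prod(folded)
    return "reshape", [first] + list(kept[1:])
```

For instance, `plan_rank_reduction([1, 1, 10, 2, 8])` selects the squeeze path, while `[2, 3, 10, 2, 8]` falls back to a reshape with a folded leading dimension of 60.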
…envinotoolkit#33315)

### Details:
- This change fixes an issue in the GPU oneDNN convolution post-op optimization logic where FakeQuantize shift folding could be applied more than once in the `eltw_and_bin`, `bin_and_eltw`, `eltw_and_scale` path.
- Specifically, the fix ensures that shift folding is performed exactly once during the initial compilation phase and is not re-applied when a compiled model is loaded from cache.
- The logic now correctly distinguishes between first-time optimization and cache-imported execution, preventing repeated modification of the folded shift constant.

### Issues:
- When using GPU oneDNN convolution with cache enabled, the `eltw_and_bin` post-op optimization path could re-apply FakeQuantize shift folding after cache import.
- As a result, the shift value was folded twice, leading to incorrect constant values and significant accuracy regressions in INT8 workloads.

### Checklist:
- [x] Is it a proper fix? (Not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this scenario?

### Tickets:
- 176505
### Tickets:
- *EISW-197365*

### Details:
- Fix docs to align with actual behavior

### Tickets:
- CVS-176708
### Details:
Implement Flash Attention algorithm on NPU:
- Tiled processing: Break K/V sequences into manageable chunks
- CPU orchestration: Host manages tile iteration and data flow
- NPU acceleration: NPU performs compute-intensive SDPA operations on tiles
- Incremental accumulation: Combine results across tiles with numerical stability

### Tickets:
- *[EISW-194855](https://jira.devtools.intel.com/browse/EISW-194855)*

---------

Signed-off-by: intelgaoxiong <xiong.gao@intel.com>
Co-authored-by: Eugene Smirnov <eugene.smirnov@intel.com>
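The tiling and incremental accumulation steps can be sketched in plain Python for a single query vector. This is a host-only illustration of the usual online-softmax formulation, not the NPU implementation: `attention_tiled` and `attention_reference` are hypothetical names, and everything here runs on the CPU.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q: Vector, keys: Sequence[Vector],
                        values: Sequence[Vector]) -> List[float]:
    """Plain softmax(q.K^T / sqrt(d)).V over the whole sequence at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * _dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def attention_tiled(q: Vector, keys: Sequence[Vector],
                    values: Sequence[Vector], tile_size: int) -> List[float]:
    """Same result, computed one K/V tile at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf          # largest score seen so far
    denom = 0.0                      # running softmax denominator
    acc = [0.0] * len(values[0])     # running weighted sum of V rows
    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [scale * _dot(q, k) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Subtracting the running maximum before every exponent is what keeps the accumulation numerically stable, and the correction factor makes the per-tile partial sums consistent with each other.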
…ts (openvinotoolkit#33246)

### Details:
- *ConvertFullyConnectedToFullyConnectedCompressed callback for 3d weights*

### Tickets:
- *CVS-177976*
### Details:
- Add cmake option "ENABLE_INTEL_NPU_COMPILER" to enable/disable the download of the library. Default value is ON.
- Library is downloaded from the openvino storage, but the original source is [NPU Compiler Release](https://github.com/openvinotoolkit/npu_compiler/releases).
- While the compiler will be included by default in openvino packages, it will not be used yet by default by the NPU plugin.
- The new compiler library inside plugin can only be used through NPU_COMPILER_TYPE=PLUGIN.
- New tests will gradually be migrated to use compiler-in-plugin flow (outside of this PR since part of the scripts are outside of this repository).
- Logic in NPU Plugin to automatically select the suitable compiler type will be added later as well.

### Tickets:
- *CVS-174281*

---------

Signed-off-by: Kang, Wenjing <wenjing.kang@intel.com>
Co-authored-by: Kang, Wenjing <wenjing.kang@intel.com>
### Details:
- Updated `IncreasePositionIdsPrecisionForGPTOSS` to use f32 precision for rope layers of the gpt-oss model.

### Tickets:
- 177962
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.2 to 9.0.1.

<details>
<summary>Release notes</summary>

*Sourced from [pytest's releases](https://github.com/pytest-dev/pytest/releases).*

> ## 9.0.1
> # pytest 9.0.1 (2025-11-12)
> ## Bug fixes
> - [#13895](https://redirect.github.com/pytest-dev/pytest/issues/13895): Restore support for skipping tests via `raise unittest.SkipTest`.
> - [#13896](https://redirect.github.com/pytest-dev/pytest/issues/13896): The terminal progress plugin added in pytest 9.0 is now automatically disabled when iTerm2 is detected, it generated desktop notifications instead of the desired functionality.
> - [#13904](https://redirect.github.com/pytest-dev/pytest/issues/13904): Fixed the TOML type of the verbosity settings in the API reference from number to string.
> - [#13910](https://redirect.github.com/pytest-dev/pytest/issues/13910): Fixed `UserWarning: Do not expect file_or_dir` on some earlier Python 3.12 and 3.13 point versions.
>
> ## Packaging updates and notes for downstreams
> - [#13933](https://redirect.github.com/pytest-dev/pytest/issues/13933): The tox configuration has been adjusted to make sure the desired version string can be passed into its `package_env` through the `SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST` environment variable as a part of the release process -- by `webknjaz`.
>
> ## Contributor-facing changes
> - [#13891](https://redirect.github.com/pytest-dev/pytest/issues/13891), [#13942](https://redirect.github.com/pytest-dev/pytest/issues/13942): The CI/CD part of the release automation is now capable of creating GitHub Releases without having a Git checkout on disk -- by `bluetech` and `webknjaz`.
> - [#13933](https://redirect.github.com/pytest-dev/pytest/issues/13933): The tox configuration has been adjusted to make sure the desired version string can be passed into its `package_env` through the `SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST` environment variable as a part of the release process -- by `webknjaz`.
>
> ## 9.0.0
> # pytest 9.0.0 (2025-11-05)
> ## New features
> - [#1367](https://redirect.github.com/pytest-dev/pytest/issues/1367): **Support for subtests** has been added.
>
>   Subtests are an alternative to parametrization, useful in situations where the parametrization values are not all known at collection time.
>
>   Example:
>
>   ```python
>   from pathlib import Path
>
>   import pytest
>
>
>   def contains_docstring(p: Path) -> bool:
>       """Return True if the given Python file contains a top-level docstring."""
>       ...
>
>
>   def test_py_files_contain_docstring(subtests: pytest.Subtests) -> None:
>       for path in Path.cwd().glob("*.py"):
>           with subtests.test(path=str(path)):
>               assert contains_docstring(path)
>   ```

... (truncated)

</details>

<details>
<summary>Commits</summary>

- [`d1b64aa`](https://github.com/pytest-dev/pytest/commit/d1b64aa60b9e1a0fcfaf03af7ebeb185f1024a87) Prepare release version 9.0.1
- [`0a497c7`](https://github.com/pytest-dev/pytest/commit/0a497c7b213ea950821319fd80dce219b0033f32) regendoc: remove CI environment variables ([#13950](https://redirect.github.com/pytest-dev/pytest/issues/13950)) ([#13951](https://redirect.github.com/pytest-dev/pytest/issues/13951))
- [`a9f7e6e`](https://github.com/pytest-dev/pytest/commit/a9f7e6ed579b8844e302067b7f05122b82993355) 🧪 Run `gh release` w/o Git in CI/CD ([#13942](https://redirect.github.com/pytest-dev/pytest/issues/13942)) ([#13947](https://redirect.github.com/pytest-dev/pytest/issues/13947))
- [`2682a66`](https://github.com/pytest-dev/pytest/commit/2682a6607304f1f5bb5a2140340003cdf5121bc4) Merge pull request [#13944](https://redirect.github.com/pytest-dev/pytest/issues/13944) from pytest-dev/patchback/backports/9.0.x/bef7d34f1...
- [`a999997`](https://github.com/pytest-dev/pytest/commit/a999997e36c53d189ecded3369bf35bfe2be96ad) Merge pull request [#13941](https://redirect.github.com/pytest-dev/pytest/issues/13941) from nicoddemus/min-pre-commit-version
- [`4bd63a0`](https://github.com/pytest-dev/pytest/commit/4bd63a0ead81d740aa767a4384d3b0b4c18f2ef2) Merge pull request [#13935](https://redirect.github.com/pytest-dev/pytest/issues/13935) from pytest-dev/patchback/backports/9.0.x/ce8b8a7b4...
- [`15f93b3`](https://github.com/pytest-dev/pytest/commit/15f93b332c1c3ec9c200c0ad3d55af5a2158e0db) Merge pull request [#13933](https://redirect.github.com/pytest-dev/pytest/issues/13933) from webknjaz/maintenance/tox-pep517-env-setuptools...
- [`0fa11ae`](https://github.com/pytest-dev/pytest/commit/0fa11ae3f79d06dc9e2f1f7c81ade4a1126d9ef3) Merge pull request [#13927](https://redirect.github.com/pytest-dev/pytest/issues/13927) from pytest-dev/patchback/backports/9.0.x/3d8075743...
- [`fa45470`](https://github.com/pytest-dev/pytest/commit/fa454700133c7b2cc960cba3b1cd09cc048c25a0) Merge pull request [#13926](https://redirect.github.com/pytest-dev/pytest/issues/13926) from pytest-dev/patchback/backports/9.0.x/d587e0cf8...
- [`b4e3973`](https://github.com/pytest-dev/pytest/commit/b4e3973505a2b7a2caa17ccc392d91a6ad73e122) Merge pull request [#13922](https://redirect.github.com/pytest-dev/pytest/issues/13922) from bluetech/fix-argparse-userwarning
- Additional commits viewable in [compare view](https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.1)

</details>

> **Note**
> Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…t#33100)

### Description of the issue
- Refactor the `onednn::layout_to_memory_desc()` function to improve readability and maintainability.

#### The code and line that caused this issue
- https://github.com/openvinotoolkit/openvino/blob/6b0c6a3cf36e0396a1d5e32af65d979a625ffcef/src/plugins/intel_gpu/src/graph/impls/onednn/utils.hpp#L48

#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing test that can be extended to cover this scenario? Which test did you review?

### Tickets:
- *175118*
### Details:
- Use `std::filesystem::path` in the `VisualizeTree` public class
- Update related code to use path instead of string

### Tickets:
- N/A

---------

Signed-off-by: Pawel Raasz <pawel.raasz@intel.com>
…olkit#33308)

### Details:
- *Pass sccache Azure Blob Storage connection string using files instead of environment variables*

### Tickets:
- *CVS-175241*
…ms (openvinotoolkit#32652)

Changes:
- Build OpenVINO with `-fprofile-update=atomic` and `--coverage` to avoid negative counters.
- Remove `ov_coverage` CMake target from the workflow (coverage is generated directly via `lcov`).
- Capture coverage only from `build/` with `lcov --base-directory` set to repo root.
- Exclude generated protobuf files, all `*/tests/*`, and `thirdparty/*` from the report (as in coverage.cmake).
- Add `--ignore-errors mismatch,unused` to make lcov/genhtml robust.
- Generate short coverage summary in GHA.
- Generate artifact with full archived lcov/genhtml report.
…nvinotoolkit#32094)

### Details:
- Integrate KleidiAI fp16 matmul and packing ukernels and let GEMM/GemmCopyB executors pick fp16/fp32 paths
- Enable MHA pattern execution on fp16 for arm64

### Tickets:
- 169101
### Details:
- *Update npu zero ext*
- *Use correct stype*
- *Initialize variables*

### Tickets:
- *N/A*

---------

Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>