[CPU] The Gather JIT kernel improvement. Part 1. #13821
OleksiiZaderykhinCapgemini wants to merge 3 commits into
…e, 8bit/16bit/32bit, StaticShapes/DynamicShapes cases were too coupled. The same registers were used for different cases. Now these cases are split and a register pool is used. This is done as the second part of the Gather JIT kernel improvement to cover all execution cases.
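The register-pool idea mentioned in this commit message could look roughly like the sketch below. It is only an illustration under assumptions: `RegisterPool` and `ScopedReg` are hypothetical names, registers are modelled as plain indices, and nothing here is taken from the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: names and layout are assumptions, not the PR's classes.
class RegisterPool {
public:
    explicit RegisterPool(std::size_t count) : inUse_(count, false) {}

    // Hands out the lowest free register index; throws if none is left.
    std::size_t acquire() {
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                return i;
            }
        }
        throw std::runtime_error("no free registers in the pool");
    }

    // Returns a register to the pool so another code path can reuse it.
    void release(std::size_t idx) {
        assert(idx < inUse_.size() && inUse_[idx]);
        inUse_[idx] = false;
    }

    std::size_t freeCount() const {
        std::size_t n = 0;
        for (bool used : inUse_) {
            if (!used) ++n;
        }
        return n;
    }

private:
    std::vector<bool> inUse_;
};

// RAII handle: the register is released when the handle leaves scope,
// so each case (e.g. Blocked/Short) only holds registers while it needs them.
class ScopedReg {
public:
    explicit ScopedReg(RegisterPool& pool) : pool_(pool), idx_(pool.acquire()) {}
    ~ScopedReg() { pool_.release(idx_); }
    ScopedReg(const ScopedReg&) = delete;
    ScopedReg& operator=(const ScopedReg&) = delete;

    std::size_t index() const { return idx_; }

private:
    RegisterPool& pool_;
    std::size_t idx_;
};
```

With handles scoped per case, two cases no longer need to agree on which fixed register holds which value; each one acquires what it needs and gives it back on exit.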
…de. The jitGatherKernelBase<isa>::isSameParams() method was improved. The description of the supported cases of the Gather JIT kernel was updated.
…tor.
The Blocked/Short case for static shapes for avx2 for 8-bit and 16-bit data types was implemented.
The Blocked/Short case for dynamic shapes for avx2 and avx512 for 8-bit, 16-bit, and 32-bit data types was implemented.
The implementation map for the JIT kernel for the Gather operation:
- avx512
  - dynamic shapes
    - Elementwise case
      - Short subcase: Implemented
      - Long subcase: Implemented
    - Blocked case
      - Short subcase: Implemented in this PR
      - Long subcase: Not implemented
  - static shapes
    - Elementwise case
      - Short subcase: Implemented
      - Long subcase: Implemented
    - Blocked case
      - Short subcase: Implemented
      - Long subcase: Not implemented
- avx2
  - dynamic shapes
    - Elementwise case
      - Short subcase: Implemented
      - Long subcase: Implemented
    - Blocked case
      - Short subcase: Implemented in this PR
      - Long subcase: Not implemented
  - static shapes
    - Elementwise case
      - Short subcase: Implemented
      - Long subcase: Implemented
    - Blocked case
      - Short subcase: Implemented in this PR for 8-bit and 16-bit data types; the 32-bit case was implemented earlier
      - Long subcase: Not implemented
- SSE4.1: Not implemented
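The map above can be condensed into a single predicate. The sketch below is a reading of that map, not the kernel's real dispatch logic: the enum and function names are hypothetical, and it describes the state once this PR's cases are counted as implemented.

```cpp
#include <cassert>

// Hypothetical names for illustration; they are not taken from the PR's code.
enum class Isa { sse41, avx2, avx512 };
enum class Shapes { Static, Dynamic };
enum class Case { Elementwise, Blocked };
enum class Subcase { Short, Long };

// Returns true when the implementation map lists the combination as implemented.
// The shapes argument does not change the answer in the current map, so it is
// left unnamed; it is kept in the signature to mirror the map's structure.
constexpr bool jitGatherSupported(Isa isa, Shapes /*shapes*/, Case c, Subcase sub) {
    if (isa == Isa::sse41) {
        return false;  // SSE4.1: not implemented
    }
    if (c == Case::Elementwise) {
        return true;   // Short and Long subcases, static and dynamic shapes
    }
    return sub == Subcase::Short;  // Blocked: only the Short subcase
}

static_assert(jitGatherSupported(Isa::avx2, Shapes::Dynamic, Case::Blocked, Subcase::Short),
              "Blocked/Short for dynamic shapes on avx2 is listed as implemented");
```

Read this way, the only remaining gaps on avx2 and avx512 are the Blocked/Long combinations, which is why the coverage tests quoted below stay commented out.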
// Currently the Blocked/Short case is implemented in the JIT kernel and no longer uses the reference implementation.
// But the Blocked/Long case has not been implemented in the JIT kernel yet.
// These tests must be uncommented after it is implemented.
/*
std::map<std::string, std::string> // Additional config
> GatherCoverageLayerTestCPUParams;

class GatherCoverageLayerTestCPU : public testing::WithParamInterface<GatherCoverageLayerTestCPUParams>,
Can it be unified with GatherLayerTestCPU?
There are already 2 PRs for Gather with a lot of refactoring, but the initial task is not solved yet. Please focus on the main task. Each PR requires degradation/improvement validation. Some cases require manual validation, so we need to reduce the number of such PRs. If this refactoring is really needed to solve the issue, move it into the first PR, but the second one should contain the final solution.
@nshchego OK then, I am closing this PR and will prepare a PR with the final solution instead.
This PR uses and depends on the StackAllocator from PR #13790.
This PR is a continuation of PR #13070 and is based on it.