[CPU] Implementation of jit_kernel_base#12991
Conversation
|
@dmitry-gorokhov @ceciliapeng2011 Please, take a look at this review and assign reviwers |
b87358f to
62ebd99
Compare
62ebd99 to
be8b3f1
Compare
c79e5b3 to
ca9f793
Compare
| inline void uni_vgatherdps(const Xbyak::Xmm &xmm_val, | ||
| const Xbyak::Reg64 ®_addr, | ||
| const Xbyak::Xmm &xmm_index, | ||
| const int &scale, |
There was a problem hiding this comment.
Better to set default values for scale and disp and do nothing for that values. Also need to check them for acceptable values.
There was a problem hiding this comment.
uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.
| const size_t kDataTypeSize = sizeof(float); | ||
| if (is_valid_isa(x64::avx512_core)) { | ||
| assert(reg_mask.isOPMASK()); | ||
| vgatherdps(xmm_val, ptr[reg_addr + xmm_index * scale + disp]); |
There was a problem hiding this comment.
Why the mask is not used here?
There was a problem hiding this comment.
uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.
| assert(reg_mask.isYMM()); | ||
| Xbyak::Ymm ymm_mask{reg_mask.getIdx()}; | ||
| vgatherdps(xmm_val, ptr[reg_addr + xmm_index * scale + disp], ymm_mask); | ||
| } else { |
There was a problem hiding this comment.
vgatherdps was added since AVX2. For AVX it's emulated the same way as for SSE.
There was a problem hiding this comment.
uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.
| for (int i = 0; i < static_cast<int>(kSimdWidth); i++) { | ||
| Xbyak::Label gather_end; | ||
| uni_vpextrd(mask, xmm_mask, i); | ||
| cmp(mask, 0xFFFFFFFF); |
There was a problem hiding this comment.
Actually need to check just most significant bit.
There was a problem hiding this comment.
uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.
| } | ||
| } | ||
|
|
||
| inline void uni_vscatterdps(const Xbyak::Reg64& reg_addr, |
There was a problem hiding this comment.
The same remarks as for the gather.
Also please move both of this function into PR where they are used.
There was a problem hiding this comment.
uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.
| return x64::is_subset(isa, max_cpu_isa_) && x64::mayiuse(isa); | ||
| } | ||
|
|
||
| inline void uni_vgatherdps(const Xbyak::Xmm &xmm_val, |
There was a problem hiding this comment.
How do you think of @dmitry-gorokhov 's suggestion in openvinotoolkit/oneDNN#145 to use jit-emitter instead for gather and scatter?
@lohika-denis-kotov @nshchego
There was a problem hiding this comment.
uni_vgatherdps() is not a part of this PR now
52309f8 to
7f94f4f
Compare
3856b51 to
6e3e679
Compare
6e3e679 to
cd6d806
Compare
| #include <math.h> | ||
| #include <dnnl_extension_utils.h> |
There was a problem hiding this comment.
Do we really need those two header here?
| #include <math.h> | ||
| #include <dnnl_extension_utils.h> |
There was a problem hiding this comment.
Could you please just check headers that are not used within the PR?
| RegistersPool::Ptr registers_pool_; | ||
| std::unique_ptr<StackAllocator> stack_allocator_; |
There was a problem hiding this comment.
As far as I remember we discussed placing those two entities on the jit_base level.
There was a problem hiding this comment.
Please, take one more look at this file and confirm that we really need to do this.
This change will additionally require to make stack allocator and registers pool members protected or to create two additional protected methods to create/destroy these objects.
| assert(xmm_index.getKind() == xmm_mask.getKind()); | ||
|
|
||
| std::vector<Xbyak::Reg> not_available_reg{reg_addr}; | ||
| const Xbyak::Reg64 idx = register_pool()->getInplaceFree<Xbyak::Reg64>(not_available_reg); |
There was a problem hiding this comment.
The idea is to add factory methods to the jit_base level, so that we can hide the whole register management mechanism to decouple user code from the utility entities.
There was a problem hiding this comment.
What do you mean by decouple user code from the utility entities? You will still need to include registers pool header everywhere you are using register_pool(), no matter if it's located in jit_kernel_base or jit_base. Registers pool lifetime is already encapsulated in jit_kernel_base, so what's the point?
| } | ||
|
|
||
| template<typename TReg> | ||
| inline TReg getInplaceFree(std::vector<Xbyak::Reg>& not_available) const { |
There was a problem hiding this comment.
In my perception, RegisterPool should remain the bare minimum of the interface functionality in spite of simplicity, but the interface should be complete (as it is so far). Such utility code should be placed in jit_base.
There was a problem hiding this comment.
Agree. But getInplaceFree() is no longer a part of this PR.
| namespace ov { | ||
| namespace intel_cpu { | ||
|
|
||
| using namespace dnnl::impl; |
There was a problem hiding this comment.
Please don't use 'using namespace' in header files.
| template<x64::cpu_isa_t isa, ov::element::Type_t element_type> | ||
| class jit_kernel_traits { | ||
| public: | ||
| using Vmm = typename conditional3<isa == x64::sse41, |
There was a problem hiding this comment.
Please add case for AVX as well
There was a problem hiding this comment.
Regarding using Vmm definition, there are about 90 occurrences of it in our code base. My understanding is that is more convenient to just use something like using Vmm = typename x64::cpu_isa_traits<isa>::Vmm instead.
SIMD_WIDTH is commonly used name/definition in our code base and I believe it's hard to find kernel which doesn't use something like it. However, in this header this name depends on ov::element::Type_t, which makes this definition too specific.
So, I decided to exclude this file from PR.
VCMPPS_LE and VCMPPS_GT constants were moved to jit_base. More constants like these will be added as needed.
|
|
||
| static constexpr unsigned VCMPPS_LE = 0x02; | ||
| static constexpr unsigned VCMPPS_GT = 0x0e; | ||
| static constexpr unsigned SIMD_WIDTH = x64::cpu_isa_traits<isa>::vlen / sizeof(typename ov::element_type_traits<element_type>::value_type); |
There was a problem hiding this comment.
SIMD_WIDTH little bit confuses. The SIMD width is constant. Better to call it like ELEMENTS_PER_VECTOR.
There was a problem hiding this comment.
This file is no longer a part of this PR
| namespace x64 = dnnl::impl::cpu::x64; | ||
|
|
||
| template<x64::cpu_isa_t isa, ov::element::Type_t element_type> | ||
| class jit_kernel_traits { |
There was a problem hiding this comment.
Could you add some usage examples?
There was a problem hiding this comment.
This file is no longer a part of this PR
| } | ||
|
|
||
| protected: | ||
| x64::cpu_isa_t max_cpu_isa_; |
There was a problem hiding this comment.
Please don't use such style "name_".
There was a problem hiding this comment.
Please, let me know which style has to be used here. Should I add m_ prefix or just use camel case? I believe you would expect camel case, but in that case there will be no indication of class member which is too unusual to me, so I would like to have your clarification.
|
This PR will be closed in 2 weeks in case of no activity. |
|
This PR was closed because it has been stalled for 2 week with no activity. |
Description
Implementation of jit_kernel_base which makes implementation of all kernels more consistent and allows to share common functionality between kernels.
This PR depends on [CPU] StackAllocator for JIT kernels
Usage