[CPU] Implementation of jit_kernel_base#12991

Closed
lohika-denis-kotov wants to merge 4 commits into openvinotoolkit:master from lohika-denis-kotov:feature/jit_kernel_base_class

Contributor

@lohika-denis-kotov lohika-denis-kotov commented Sep 9, 2022

Description

Implementation of jit_kernel_base, which makes the implementation of all kernels more consistent and allows common functionality to be shared between kernels.

This PR depends on [CPU] StackAllocator for JIT kernels

Usage

struct jit_refine_anchors_conf {
    ...
};

struct jit_refine_anchors_call_args {
    ...
};

using jit_refine_anchors_kernel = jit_kernel_base<jit_refine_anchors_conf, jit_refine_anchors_call_args>;

template <x64::cpu_isa_t isa>
class jit_refine_anchors_kernel_fp32 : public jit_refine_anchors_kernel {
 public:
    DECLARE_CPU_JIT_AUX_FUNCTIONS(jit_refine_anchors_kernel_fp32)
    ...
    using Vmm = typename jit_kernel_traits<isa, ov::element::Type_t::f32>::Vmm;
    static constexpr unsigned VCMPPS_LE = jit_kernel_traits<isa, ov::element::Type_t::f32>::VCMPPS_LE;
    static constexpr unsigned VCMPPS_GT = jit_kernel_traits<isa, ov::element::Type_t::f32>::VCMPPS_GT;
    static constexpr unsigned SIMD_WIDTH = jit_kernel_traits<isa, ov::element::Type_t::f32>::SIMD_WIDTH;
    ...
    jit_refine_anchors_kernel_fp32(const jit_refine_anchors_conf &jqp)
        : jit_refine_anchors_kernel(isa, jqp) {}
    ...
    void generate() override;
    ...
};
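
To make the pattern in the usage snippet easier to follow without pulling in Xbyak or oneDNN, here is a minimal plain-C++ sketch. Every name in it (kernel_base_sketch, scale_conf, scale_call_args, scale_kernel, run_scale) is hypothetical, and the "kernel" is an ordinary virtual call rather than generated machine code, so the real jit_kernel_base in this PR may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for jit_kernel_base<Conf, CallArgs>: it stores the
// configuration and fixes the call-argument type for every derived kernel.
template <typename Conf, typename CallArgs>
class kernel_base_sketch {
public:
    using conf_t = Conf;
    using call_args_t = CallArgs;

    explicit kernel_base_sketch(const Conf& conf) : conf_(conf) {}
    virtual ~kernel_base_sketch() = default;

    // A real JIT kernel would jump into generated code here.
    virtual void operator()(const CallArgs& args) const = 0;

protected:
    const Conf& conf() const { return conf_; }

private:
    Conf conf_;
};

// Hypothetical configuration and per-call arguments for a toy "scale" kernel.
struct scale_conf { float factor; };
struct scale_call_args { const float* src; float* dst; std::size_t count; };

using scale_kernel_base = kernel_base_sketch<scale_conf, scale_call_args>;

class scale_kernel : public scale_kernel_base {
public:
    explicit scale_kernel(const scale_conf& conf) : scale_kernel_base(conf) {}

    void operator()(const scale_call_args& args) const override {
        assert(args.count == 0 || (args.src != nullptr && args.dst != nullptr));
        for (std::size_t i = 0; i < args.count; ++i)
            args.dst[i] = args.src[i] * conf().factor;
    }
};

// Convenience wrapper: build the kernel once, then call it with call args.
inline std::vector<float> run_scale(const std::vector<float>& src, float factor) {
    std::vector<float> dst(src.size());
    scale_kernel kernel(scale_conf{factor});
    kernel(scale_call_args{src.data(), dst.data(), src.size()});
    return dst;
}
```

The point of the sketch is only the split between a configuration fixed at construction time and arguments passed on every call, which is what the two template parameters in the usage above appear to express.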

@lohika-denis-kotov lohika-denis-kotov requested review from a team as code owners September 9, 2022 09:12
@lohika-denis-kotov
Contributor Author

@dmitry-gorokhov @ceciliapeng2011 Please take a look at this PR and assign reviewers.

Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_kernel_base.hpp Outdated
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_kernel_base.hpp Outdated
@lohika-denis-kotov lohika-denis-kotov force-pushed the feature/jit_kernel_base_class branch from b87358f to 62ebd99 Compare September 15, 2022 10:23
@lohika-denis-kotov lohika-denis-kotov changed the title Implementation of jit_kernel_base [CPU] Implementation of jit_kernel_base Sep 15, 2022
@maxnick maxnick added the category: CPU OpenVINO CPU plugin label Sep 16, 2022
@maxnick maxnick self-assigned this Sep 16, 2022
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_kernel_base.hpp Outdated
inline void uni_vgatherdps(const Xbyak::Xmm &xmm_val,
const Xbyak::Reg64 &reg_addr,
const Xbyak::Xmm &xmm_index,
const int &scale,
Contributor

Better to set default values for scale and disp and do nothing for those values. Also, they need to be checked for acceptable values.
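
One possible reading of this request, sketched in plain C++: x86 SIB addressing can only encode scale factors 1, 2, 4 and 8, and a displacement has to fit in a signed 32-bit value. The names gather_params, is_valid_sib_scale, is_valid_disp and are_valid are hypothetical and are not part of Xbyak or of this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// x86 SIB addressing only encodes the scale factors 1, 2, 4 and 8.
constexpr bool is_valid_sib_scale(int scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

// Displacements in x86 memory operands must fit in a signed 32-bit value.
constexpr bool is_valid_disp(std::int64_t disp) {
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical parameter bundle with "do nothing" defaults:
// scale = 1 and disp = 0 leave base + index unchanged.
struct gather_params {
    int scale = 1;
    std::int64_t disp = 0;
};

inline bool are_valid(const gather_params& p) {
    return is_valid_sib_scale(p.scale) && is_valid_disp(p.disp);
}
```

A helper such as uni_vgatherdps could then assert are_valid(...) before emitting the instruction, so that an unsupported scale is caught when the kernel is generated rather than at run time.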

uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.

const size_t kDataTypeSize = sizeof(float);
if (is_valid_isa(x64::avx512_core)) {
assert(reg_mask.isOPMASK());
vgatherdps(xmm_val, ptr[reg_addr + xmm_index * scale + disp]);
Contributor

Why is the mask not used here?

uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.

assert(reg_mask.isYMM());
Xbyak::Ymm ymm_mask{reg_mask.getIdx()};
vgatherdps(xmm_val, ptr[reg_addr + xmm_index * scale + disp], ymm_mask);
} else {
Contributor

What about AVX?

vgatherdps is available starting with AVX2. For AVX it is emulated the same way as for SSE.
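
As an illustration of what such an emulation has to compute, here is a scalar model in plain C++. It is a sketch under assumptions, not the code from this PR: gather_ps_emulated is a hypothetical name, lanes are selected by the most significant bit of each mask element as in the AVX2 form of the instruction, and the clearing of the mask register that the hardware instruction performs is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of a masked 32-bit float gather. For every lane whose mask
// element has its most significant bit set, load a float from
// base + index * scale + disp. Other lanes keep their previous dst value.
template <std::size_t Lanes>
void gather_ps_emulated(std::array<float, Lanes>& dst,
                        const void* base,
                        const std::array<std::int32_t, Lanes>& index,
                        const std::array<std::uint32_t, Lanes>& mask,
                        int scale = 1,
                        std::int32_t disp = 0) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const auto* bytes = static_cast<const unsigned char*>(base);
    for (std::size_t lane = 0; lane < Lanes; ++lane) {
        if ((mask[lane] & 0x80000000u) == 0)
            continue;  // masked-off lane: leave dst untouched
        const std::ptrdiff_t offset =
            static_cast<std::ptrdiff_t>(index[lane]) * scale + disp;
        // memcpy avoids alignment and aliasing assumptions on the source.
        std::memcpy(&dst[lane], bytes + offset, sizeof(float));
    }
}
```

A JIT version would unroll the same per-lane loop into extract, compare, load and insert instructions, which is roughly the shape of the fallback branch quoted in this thread.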

uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.

for (int i = 0; i < static_cast<int>(kSimdWidth); i++) {
Xbyak::Label gather_end;
uni_vpextrd(mask, xmm_mask, i);
cmp(mask, 0xFFFFFFFF);
Contributor

Actually, only the most significant bit needs to be checked.
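
The difference between the two checks can be shown with two small predicates. The function names are hypothetical; the first mirrors the `cmp(mask, 0xFFFFFFFF)` in the quoted code, the second tests only the top bit, as the AVX2 gather instructions do for their vector mask.

```cpp
#include <cassert>
#include <cstdint>

// Strict check, as in the quoted code: only an all-ones element selects a lane.
constexpr bool lane_selected_all_ones(std::uint32_t m) {
    return m == 0xFFFFFFFFu;
}

// Check suggested in the review: only the most significant bit matters.
constexpr bool lane_selected_msb(std::uint32_t m) {
    return (m >> 31) != 0;
}
```

The two agree for masks produced by vector compares (all ones or all zeros) but differ for values such as 0x80000000, the bit pattern of -0.0f, which the hardware gather would treat as selected.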

uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.

}
}

inline void uni_vscatterdps(const Xbyak::Reg64& reg_addr,
Contributor

The same remarks as for the gather.

Also, please move both of these functions into the PR where they are used.

uni_vgatherdps and uni_vscatterdps were excluded from this PR as you requested.

Comment thread src/plugins/intel_cpu/src/nodes/kernels/registers_pool.hpp
Comment thread src/common/util/include/openvino/util/common_util.hpp Outdated
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_kernel_base.hpp Outdated
return x64::is_subset(isa, max_cpu_isa_) && x64::mayiuse(isa);
}

inline void uni_vgatherdps(const Xbyak::Xmm &xmm_val,
Contributor

@ceciliapeng2011 ceciliapeng2011 Oct 8, 2022

What do you think of @dmitry-gorokhov's suggestion in openvinotoolkit/oneDNN#145 to use a jit-emitter instead for gather and scatter?
@lohika-denis-kotov @nshchego

uni_vgatherdps() is no longer a part of this PR.

@lohika-denis-kotov lohika-denis-kotov changed the title [CPU] Implementation of jit_kernel_base [WIP][CPU] Implementation of jit_kernel_base Oct 8, 2022
@lohika-denis-kotov lohika-denis-kotov force-pushed the feature/jit_kernel_base_class branch 2 times, most recently from 52309f8 to 7f94f4f Compare October 11, 2022 10:36
@lohika-denis-kotov lohika-denis-kotov force-pushed the feature/jit_kernel_base_class branch 2 times, most recently from 3856b51 to 6e3e679 Compare October 11, 2022 11:52
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_base.hpp Outdated
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_base.hpp Outdated
@lohika-denis-kotov lohika-denis-kotov force-pushed the feature/jit_kernel_base_class branch from 6e3e679 to cd6d806 Compare October 11, 2022 19:40
@lohika-denis-kotov lohika-denis-kotov changed the title [WIP][CPU] Implementation of jit_kernel_base [CPU] Implementation of jit_kernel_base Oct 11, 2022
@t-jankowski t-jankowski removed request for a team October 12, 2022 08:18
Comment thread src/plugins/intel_cpu/src/nodes/kernels/jit_kernel_base.hpp Outdated
Comment on lines +10 to +11
#include <math.h>
#include <dnnl_extension_utils.h>
Contributor

Do we really need those two headers here?

removed

Comment on lines +10 to +11
#include <math.h>
#include <dnnl_extension_utils.h>
Contributor

Could you please check for any headers that are not used within the PR?

done

Comment on lines +182 to +56
RegistersPool::Ptr registers_pool_;
std::unique_ptr<StackAllocator> stack_allocator_;
Contributor

As far as I remember we discussed placing those two entities on the jit_base level.

Please take one more look at this file and confirm that we really need to do this.
This change would additionally require either making the stack allocator and registers pool members protected, or adding two protected methods to create and destroy these objects.

assert(xmm_index.getKind() == xmm_mask.getKind());

std::vector<Xbyak::Reg> not_available_reg{reg_addr};
const Xbyak::Reg64 idx = register_pool()->getInplaceFree<Xbyak::Reg64>(not_available_reg);
Contributor

The idea is to add factory methods to the jit_base level, so that we can hide the whole register management mechanism to decouple user code from the utility entities.
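
A rough sketch of what "factory methods at the base level" could mean, written in plain C++ without Xbyak. All types here (registers_pool_sketch, reg_handle, jit_base_sketch, demo_kernel) are hypothetical stand-ins and do not reflect the actual RegistersPool or jit_base interfaces; registers are modelled as plain indices.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical pool that tracks which of N register indices are free.
class registers_pool_sketch {
public:
    explicit registers_pool_sketch(std::size_t count) : free_(count, true) {}

    std::size_t acquire() {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i]) { free_[i] = false; return i; }
        }
        throw std::runtime_error("no free registers");
    }
    void release(std::size_t idx) { free_.at(idx) = true; }
    std::size_t free_count() const {
        std::size_t n = 0;
        for (bool f : free_) n += f ? 1 : 0;
        return n;
    }

private:
    std::vector<bool> free_;
};

// RAII handle: returns its register index to the pool on destruction.
class reg_handle {
public:
    explicit reg_handle(registers_pool_sketch& pool)
        : pool_(&pool), idx_(pool.acquire()) {}
    reg_handle(const reg_handle&) = delete;
    reg_handle& operator=(const reg_handle&) = delete;
    reg_handle(reg_handle&& other) noexcept
        : pool_(other.pool_), idx_(other.idx_) { other.pool_ = nullptr; }
    ~reg_handle() { if (pool_) pool_->release(idx_); }
    std::size_t index() const { return idx_; }

private:
    registers_pool_sketch* pool_;
    std::size_t idx_;
};

// Hypothetical base: owns the pool privately, exposes only a factory method.
class jit_base_sketch {
public:
    explicit jit_base_sketch(std::size_t reg_count) : pool_(reg_count) {}
    std::size_t free_registers() const { return pool_.free_count(); }

protected:
    reg_handle make_reg() { return reg_handle(pool_); }

private:
    registers_pool_sketch pool_;
};

// Derived kernel code asks for registers without touching the pool itself.
class demo_kernel : public jit_base_sketch {
public:
    demo_kernel() : jit_base_sketch(4) {}
    std::size_t free_while_two_held() {
        reg_handle a = make_reg();
        reg_handle b = make_reg();
        assert(a.index() != b.index());
        return free_registers();
    }
};
```

In this arrangement the derived kernel only sees make_reg() and the handle type. Whether that is enough decoupling to justify moving the members is exactly what the reply below questions.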

What do you mean by decoupling user code from the utility entities? You will still need to include the registers pool header everywhere you use register_pool(), no matter whether it is located in jit_kernel_base or jit_base. The registers pool lifetime is already encapsulated in jit_kernel_base, so what's the point?

}

template<typename TReg>
inline TReg getInplaceFree(std::vector<Xbyak::Reg>& not_available) const {
Contributor

@maxnick maxnick Oct 18, 2022

In my view, RegisterPool should keep its interface to the bare minimum while remaining complete (as it is so far). Utility code like this should be placed in jit_base.

Agree. But getInplaceFree() is no longer a part of this PR.

namespace ov {
namespace intel_cpu {

using namespace dnnl::impl;
Contributor

Please don't use 'using namespace' in header files.
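
A common alternative is a namespace alias scoped inside the plugin's own namespace, in the style of the `namespace x64 = dnnl::impl::cpu::x64;` line that appears elsewhere in this PR. The sketch below uses made-up namespaces (vendor::impl::cpu, my_plugin) so that it compiles on its own.

```cpp
#include <cassert>

// Hypothetical deep library namespace, standing in for something like
// dnnl::impl::cpu::x64.
namespace vendor { namespace impl { namespace cpu {
inline int vector_bytes() { return 64; }
}}}  // namespace vendor::impl::cpu

namespace my_plugin {

// The alias introduces exactly one short name and stays inside my_plugin,
// so files that include this header do not get vendor::impl names injected
// into their own lookup scope.
namespace cpu = vendor::impl::cpu;

inline int elements_per_vector() {
    return cpu::vector_bytes() / static_cast<int>(sizeof(float));
}

}  // namespace my_plugin
```

Unlike a using-directive, the alias cannot silently change which overload or type an unrelated translation unit resolves to.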

done

template<x64::cpu_isa_t isa, ov::element::Type_t element_type>
class jit_kernel_traits {
public:
using Vmm = typename conditional3<isa == x64::sse41,
Contributor

Please add case for AVX as well

Regarding the using Vmm definition, there are about 90 occurrences of it in our code base. My understanding is that it is more convenient to just use something like using Vmm = typename x64::cpu_isa_traits<isa>::Vmm instead.

SIMD_WIDTH is a commonly used name in our code base, and I believe it is hard to find a kernel which doesn't use something like it. However, in this header the name depends on ov::element::Type_t, which makes the definition too specific.

So, I decided to exclude this file from the PR.
The VCMPPS_LE and VCMPPS_GT constants were moved to jit_base. More constants like these will be added as needed.

static constexpr unsigned VCMPPS_LE = 0x02;
static constexpr unsigned VCMPPS_GT = 0x0e;
static constexpr unsigned SIMD_WIDTH = x64::cpu_isa_traits<isa>::vlen / sizeof(typename ov::element_type_traits<element_type>::value_type);
Contributor

SIMD_WIDTH is a little confusing. The SIMD width is constant. Better to call it something like ELEMENTS_PER_VECTOR.
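
The quantity under discussion is the vector length in bytes divided by the element size, so it changes with the element type even when the register width stays the same. A small sketch follows, using a hypothetical isa_sketch enum in place of x64::cpu_isa_t and the usual register sizes of 16, 32 and 64 bytes for xmm, ymm and zmm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ISA tag; the real code uses x64::cpu_isa_t.
enum class isa_sketch { sse41, avx2, avx512 };

// Vector register size in bytes: xmm = 16, ymm = 32, zmm = 64.
constexpr std::size_t vlen_bytes(isa_sketch isa) {
    return isa == isa_sketch::sse41 ? 16 : (isa == isa_sketch::avx2 ? 32 : 64);
}

// Number of elements of type T that fit into one vector register.
template <typename T>
constexpr std::size_t elements_per_vector(isa_sketch isa) {
    return vlen_bytes(isa) / sizeof(T);
}
```

For example, an AVX2 register holds 8 float elements but 32 one-byte elements, which is why a name like ELEMENTS_PER_VECTOR describes the value more precisely than SIMD_WIDTH.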

This file is no longer a part of this PR

namespace x64 = dnnl::impl::cpu::x64;

template<x64::cpu_isa_t isa, ov::element::Type_t element_type>
class jit_kernel_traits {
Contributor

Could you add some usage examples?

This file is no longer a part of this PR

}

protected:
x64::cpu_isa_t max_cpu_isa_;
Contributor

Please don't use the trailing-underscore style ("name_").

Please let me know which style has to be used here. Should I add an m_ prefix or just use camel case? I believe you would expect camel case, but in that case there will be no indication that a name is a class member, which is unusual to me, so I would like to have your clarification.

@akladiev
Collaborator

This PR will be closed in 2 weeks in case of no activity.

@akladiev
Collaborator

This PR was closed because it has been stalled for 2 weeks with no activity.

Labels

category: CPU OpenVINO CPU plugin Stale

8 participants