GC-GPU integration by dchigarev · Pull Request #169 · slyalin/openvino

dchigarev · 2024-09-06T12:27:31Z

This PR adds an integration with graph-compiler's GPU pipeline. The integration passes GPU buffers (USM pointers / cl_mem), the cl queue and cl events as is to GC for execution.

A set of sanity tests was also added to test the integration.

How to build and run tests

Build LLVM with IMEX patches:

git clone https://github.com/intel/graph-compiler.git
cd graph-compiler
./scripts/compile.sh --dev --llvm --imex
export LLVM_INST_PATH=$(pwd)/externals/llvm-project/build

Build OV from this branch:

git clone https://github.com/dchigarev/openvino.git
cd openvino && git checkout gc-gpu
mkdir build && cd build
# -DENABLE_INTEL_GPU=ON enables GPU capabilities of graph compiler
cmake .. -G Ninja \
	-DLLVM_DIR=$LLVM_INST_PATH/lib/cmake/llvm \
	-DMLIR_DIR=$LLVM_INST_PATH/lib/cmake/mlir \
	-DENABLE_GRAPH_COMPILER=ON \
	-DENABLE_INTEL_GPU=ON \
	-DENABLE_TESTS=ON

Run sanity tests:

OV_MLIR_MODE=GC_GPU ./bin/intel64/Release/ov_gpu_func_tests --gtest_filter=MLIRExecution.*

Run benchmark_app:

OV_MLIR_MODE=GC_GPU ./bin/intel64/Debug/benchmark_app -m ./src/plugins/intel_gpu/tests/functional/mlir_op/models/matmul_64_128_f16.xml -d GPU -use_device_mem -ip f16 -infer_precision f16 -niter 100 -hint none -nstreams 1 -nthreads 1

What was changed and how it works

1. Common `MLIREvaluate` class was split into two

There are now two classes: MLIREvaluate (generic evaluation) and MLIREvaluateGcGPU. They both implement the interface of MLIREvaluateBase and an actual instance is created based on mlir_mode parameter in MLIREvaluateBase::create().

This was done because these two evaluation classes operate on different objects in order to lower and invoke the received MLIR module. Generic MLIREvaluate operates on mlir::EvaluationEngine and mlir::Module, while MLIREvaluateGcGPU operates on gc-specific runtime objects (mlir::gc::OclModuleBuilder, mlir::gc::OclModule, etc.).
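A minimal sketch of the split described above, assuming a mode enum and a `name()` method that exist only for illustration. The real `MLIREvaluateBase::create()` signature and the way mlir_mode is represented are not shown in this description, so `MlirMode` and everything inside the classes are hypothetical stand-ins.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical mode enum; the PR selects the implementation from an mlir_mode parameter.
enum class MlirMode { Default, GcGpu };

// Common interface. name() is illustrative only and not part of the PR.
struct MLIREvaluateBase {
    virtual ~MLIREvaluateBase() = default;
    virtual std::string name() const = 0;
    static std::unique_ptr<MLIREvaluateBase> create(MlirMode mode);
};

// Generic evaluation path.
struct MLIREvaluate : MLIREvaluateBase {
    std::string name() const override { return "generic"; }
};

// GC GPU evaluation path, which would own gc-specific runtime objects.
struct MLIREvaluateGcGPU : MLIREvaluateBase {
    std::string name() const override { return "gc-gpu"; }
};

// The factory is the only place that knows which concrete class to build.
std::unique_ptr<MLIREvaluateBase> MLIREvaluateBase::create(MlirMode mode) {
    switch (mode) {
    case MlirMode::Default:
        return std::make_unique<MLIREvaluate>();
    case MlirMode::GcGpu:
        return std::make_unique<MLIREvaluateGcGPU>();
    }
    throw std::invalid_argument("unknown MLIR mode");
}
```

Callers hold only a `MLIREvaluateBase` pointer, so the gc-specific types do not leak into code that only needs generic evaluation.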

2. Context/device information is now forwarded to `MLIREvaluateBase::create()`

We need context + device information for the gc-gpu-runtime in order to build a module. That's why we now extract ocl_context and cl_device_id from RemoteContextImpl in TransformationsPipeline and forward it all the way to MLIREvaluateBase::create() using ov::EvaluationContext map.
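The forwarding can be pictured roughly as below. This is a sketch under assumptions: `EvaluationContext` is modelled as a string-keyed map of `std::any`, the key names are made up, and `void*` stands in for the real `cl_context` / `cl_device_id` handles so that the example builds without OpenCL headers.

```cpp
#include <any>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for ov::EvaluationContext (assumed here to behave like a string-keyed map).
using EvaluationContext = std::map<std::string, std::any>;

// What the GPU evaluator needs in order to build a module (opaque handles in this sketch).
struct GpuBuildInfo {
    void* context;
    void* device;
};

// Producer side: roughly what the transformations pipeline would do.
// The key names are hypothetical.
EvaluationContext make_lowering_context(void* ocl_context, void* device_id) {
    EvaluationContext ctx;
    ctx["ocl_context"] = ocl_context;
    ctx["cl_device_id"] = device_id;
    return ctx;
}

// Consumer side: roughly what the evaluator factory would do before building.
GpuBuildInfo read_gpu_build_info(const EvaluationContext& ctx) {
    auto context_it = ctx.find("ocl_context");
    auto device_it = ctx.find("cl_device_id");
    if (context_it == ctx.end() || device_it == ctx.end()) {
        throw std::runtime_error("GPU context information is missing");
    }
    return {std::any_cast<void*>(context_it->second),
            std::any_cast<void*>(device_it->second)};
}
```

Passing the handles through a generic map keeps the transformation entry points free of OpenCL types, at the cost of a runtime lookup and cast on the consumer side.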

3. Separation between `MLIREvaluate::invoke` and `MLIREvaluate::invoke_packed`

A new invocation method was added to MLIREvaluateBase interface (::invoke()). In comparison with ::invoke_packed() that accepts memref arguments in the MemrefDescriptor format, ::invoke() takes tensor vectors as is.

GC-GPU runtime expects arguments to be in a non-packed format (pointers only) if all memrefs in the compiled mlir module have static shapes. Otherwise it expects "packed" format (MemrefDescriptors).

A query method was added to determine which method of MLIREvaluate to call.
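The selection rule above could look roughly like this. It is only a sketch: the name of the actual query method is not given here, and `kDynamicDim`, `InvokeKind` and `select_invoke` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sentinel marking a dimension that is unknown at compile time.
constexpr int64_t kDynamicDim = -1;

using Shape = std::vector<int64_t>;

// True when no memref in the module has a dynamic dimension.
bool all_shapes_static(const std::vector<Shape>& shapes) {
    return std::all_of(shapes.begin(), shapes.end(), [](const Shape& shape) {
        return std::none_of(shape.begin(), shape.end(),
                            [](int64_t dim) { return dim == kDynamicDim; });
    });
}

enum class InvokeKind { Plain, Packed };

// Static shapes: pass pointers only (invoke).
// Any dynamic shape: pass memref descriptors (invoke_packed).
InvokeKind select_invoke(const std::vector<Shape>& memref_shapes) {
    return all_shapes_static(memref_shapes) ? InvokeKind::Plain : InvokeKind::Packed;
}
```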

(@AndreyPavlenko may provide more info on why we need this separation)

4. Actual OCL implementations of `cldnn::stream/buffer/event` are now exposed to `intel_gpu/src/plugin/ops/mlir_op.cpp`

Base classes of stream/buffer/event do not have a method to get a handle of the actual underlying object (cl_queue/cl_mem/cl_event). In order to obtain these handles and pass them to the gc-gpu runtime, the instances of these abstract objects are dynamically cast to their presumed implementations (ocl::gpu_buffer / ocl_stream / ocl_base_event). To do that we have to expose the declarations of these ocl-specific implementations to ops/mlir_op.cpp by modifying its include directories. Are we okay with this?

^--- this was replaced with the one below

4. `cldnn::stream/buffer/event/device` are now able to return an underlying ocl handle

In order to get an actual cl object and pass it to the graph compiler's GPU runtime, the void* get_handle() method was added to cldnn::stream/buffer/device/event interfaces. The method is supposed to return a pointer to an opencl handle from C-api (cl_mem, cl_command_queue, cl_device_id, ...) since gc-gpu runtime takes these instead of c++ wrappers.
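A rough sketch of the idea, with made-up types so it compiles without OpenCL headers: `FakeClMem` stands in for the object behind `cl_mem`, and `memory` / `ocl_buffer` are simplified stand-ins for the cldnn interface and its OCL implementation. The sketch assumes `get_handle()` returns the handle value itself; the exact pointer semantics in the PR may differ.

```cpp
// Hypothetical stand-in for the opaque object behind a C-API handle such as cl_mem.
struct FakeClMem {
    int id;
};
using fake_cl_mem = FakeClMem*;

// Simplified abstract interface: callers see no OCL-specific types here.
struct memory {
    virtual ~memory() = default;
    // Returns the underlying C-API handle, type-erased.
    virtual void* get_handle() const = 0;
};

// Simplified OCL implementation that owns the actual handle.
class ocl_buffer : public memory {
public:
    explicit ocl_buffer(fake_cl_mem mem) : mem_(mem) {}
    void* get_handle() const override { return mem_; }

private:
    fake_cl_mem mem_;
};

// Consumer side: recover the typed handle without a dynamic_cast to ocl_buffer.
fake_cl_mem to_cl_mem(const memory& buffer) {
    return static_cast<fake_cl_mem>(buffer.get_handle());
}
```

Compared with the dynamic_cast approach from the superseded item 4, the consumer only needs the abstract header, so the OCL implementation headers do not have to be exposed to ops/mlir_op.cpp.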

5. `cldnn::stream::create_base_event(...)` can now take a pointer to `cl_event`

(in order to propagate cl_event returned from gc-gpu runtime to cldnn::event that is returned from MLIROp::evaluate())

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

Co-authored-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com>

dchigarev · 2024-09-06T12:52:19Z

+          shape(module_input_shape.begin(), module_input_shape.end()) {
+        if (shape.size() != tensor.get_shape().size()) {
+            // validate that the shape difference is due to trailing '1's
+            for (size_t i = 0; i < shape.size(); ++i) {
+                if (shape[i] != tensor.get_shape()[i]) {
+                    OPENVINO_THROW("Mismatch in shape sizes");
+                }
+            }
+            for (size_t i = shape.size(); i < tensor.get_shape().size(); ++i) {
+                if (tensor.get_shape()[i] != 1) {
+                    OPENVINO_THROW("Mismatch in shape sizes");
+                }
+            }
+        }
+        strides.resize(shape.size());


This is needed because GPU memory formats hold at least 4 dimensions, so tensors can carry extra trailing dims (<64x128x1x1> instead of <64x128>). This code compares the input tensors' dimensions with the input dimensions of an MLIR module and trims the extra dims.
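As a standalone illustration of the check, here is a sketch with a hypothetical helper name and plain `std::vector` shapes instead of `ov::Shape`. Unlike the quoted diff, it also rejects a tensor whose rank is smaller than the module's, which is an addition made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Dims = std::vector<std::size_t>;

// Returns the shape to use for the memref: the module's shape, provided the
// tensor shape only differs from it by trailing dimensions equal to 1.
Dims trim_to_module_rank(const Dims& module_shape, const Dims& tensor_shape) {
    if (tensor_shape.size() < module_shape.size()) {
        throw std::invalid_argument("tensor rank is smaller than module rank");
    }
    // Leading dimensions must match exactly.
    for (std::size_t i = 0; i < module_shape.size(); ++i) {
        if (module_shape[i] != tensor_shape[i]) {
            throw std::invalid_argument("mismatch in shape sizes");
        }
    }
    // Everything past the module rank must be a padding '1'.
    for (std::size_t i = module_shape.size(); i < tensor_shape.size(); ++i) {
        if (tensor_shape[i] != 1) {
            throw std::invalid_argument("mismatch in shape sizes");
        }
    }
    return module_shape;
}
```

For example, a module input of <64x128> paired with a tensor of <64x128x1x1> yields <64x128>, while <64x128x2x1> is rejected.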

dchigarev · 2024-09-06T13:59:01Z

@@ -38,21 +51,71 @@ void CreateMLIRSubgraphOp(ProgramBuilder& p, const std::shared_ptr<ov::op::mlir:



We probably don't need this synchronization anymore since we pass the same queue to GC and submit our kernels to it.

dchigarev · 2024-09-30T09:13:16Z


-void TRANSFORMATIONS_API transformMLIR(std::shared_ptr<ov::Model> model);
+void TRANSFORMATIONS_API transformMLIR(std::shared_ptr<ov::Model> model,
+                                       std::shared_ptr<ov::EvaluationContext> loweringContext);


loweringContext stores ocl_context for mlir_op::gpu

dchigarev · 2024-10-01T10:58:48Z


-    OpenCL(bool out_of_order_queue = true)
-    {
+    OpenCL(bool out_of_order_queue = true) {


this was fixed by openvino's linter

dchigarev · 2024-10-01T11:01:22Z

@@ -23,8 +23,7 @@ struct OpenCL {
    bool _supports_usm;


moved this class from tests/unit_tests/utils to tests/common/utils in order to reuse it in sanity tests for the GPU integration

dchigarev · 2024-10-01T11:03:48Z

        _queue = cl::CommandQueue(_context, _device, props);
    }

+    OpenCL(cl_context context, bool out_of_order_queue = true)


it's more convenient to construct this object from cl_context in sanity tests for GPU integration, since we can simply request the context from compiled model and construct this class

dchigarev · 2024-10-01T12:26:49Z

@vladimir-paramuzov @slyalin @kurapov-peter @AndreyPavlenko

I think this PR is now in a state where it can be reviewed

slyalin · 2024-10-07T08:08:01Z

Should we merge #167 before merging this PR? Are both PRs ready to be merged? If they don't have obvious breaking changes, it is more convenient to continue development in the main mlir branch. I have a merged version of the mlir branch and the master branch from the main openvino repository. So, to spare you from fighting merge conflicts on your side, I would recommend merging the two mentioned PRs, and then I will redo the merge with the master openvino branch on my side. @kurapov-peter, @AndreyPavlenko, @dchigarev?

kurapov-peter · 2024-10-07T10:18:12Z

#167 isn't ready. It still contains experimental code that needs to be cleaned up and points to a fork. @niuxiaog, could you please prepare it for the merge?

dchigarev · 2024-10-08T14:23:03Z

> Are both PRs ready to be merged? If they don't have obvious breaking changes, it is more convenient to continue development in the main mlir branch.

I think this PR is already in a state where it can be merged. There's one more question though regarding gpu-runtime headers exposure that I would like to discuss.

The question is whether it's okay to include GPU runtime headers in the openvino_intel_gpu_plugin target in order to access the definitions of the OCL-specific engine/buffer/stream implementations and extract the actual OCL handles from them. We may also need these headers in transformation_pipeline.cpp in order to extract a device id from the context. If exposing these headers is not okay, what alternatives do we have for extracting OCL handles? @vladimir-paramuzov

kurapov-peter

Looks good to me. Would TPP need anything from evaluation context btw?

slyalin · 2024-10-14T13:48:12Z

@vladimir-paramuzov, please approve explicitly and we will merge.

dchigarev · 2024-10-15T10:55:40Z

@slyalin I believe we've got all the approvals we needed

dchigarev and others added 3 commits September 6, 2024 09:25

Initial gc-gpu integration

04a2dc0

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

disable gpu mode bu default

a0f683b

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

Draft of forwarding cl queue

4b2b653

Co-authored-by: Andrey Pavlenko <andrey.a.pavlenko@gmail.com> Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

github-actions bot added the category: build, category: transformations, and category: GPU labels Sep 6, 2024

dchigarev commented Sep 6, 2024

Comment thread src/plugins/intel_gpu/src/plugin/ops/mlir_op.cpp Outdated

dchigarev commented Sep 6, 2024

Comment thread src/plugins/intel_gpu/src/plugin/ops/mlir_op.cpp

dchigarev commented Sep 6, 2024

dchigarev mentioned this pull request Sep 6, 2024

OV GPU integration intel/graph-compiler#207

Closed

dchigarev added 5 commits September 11, 2024 12:26

Add tests with cl buffers

7319299

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

add f16 tests to support dpas

118615f

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

allign with new gc

ee502bb

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

Align integration with new GC runtime

a01a222

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

Forward device information to mlir_op at model::compile() time

d41b867

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

github-actions bot added the category: Core, category: CPP API, category: CPU, and category: inference labels Sep 30, 2024

dchigarev commented Sep 30, 2024

dchigarev mentioned this pull request Sep 30, 2024

Implemented GPU OpenCL runtime intel/graph-compiler#343

Merged

do not 'wait()' before 'mlir::gpu_op'

de413d4

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

dchigarev commented Oct 1, 2024

AndreyPavlenko reviewed Oct 1, 2024

Comment thread cmake/graph-compiler.cmake Outdated

dchigarev marked this pull request as ready for review October 1, 2024 12:21

fix naming and put few 'vector::reserve()'

e86c699

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

AndreyPavlenko reviewed Oct 1, 2024

Comment thread cmake/graph-compiler.cmake Outdated

AndreyPavlenko reviewed Oct 1, 2024

Comment thread src/common/transformations/src/transformations/mlir/mlir_op.hpp

dchigarev added 2 commits October 8, 2024 09:57

return cl_event to OV properly

abe86dd

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

pass tensor vectors as is to MLIREvaluate::invoke()'

42cbdc0

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

AndreyPavlenko reviewed Oct 8, 2024

Comment thread src/common/transformations/src/transformations/mlir/mlir_op.cpp Outdated

address review comments

717f7ca

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

vladimir-paramuzov reviewed Oct 9, 2024

Comment thread src/inference/include/openvino/runtime/intel_gpu/remote_properties.hpp Outdated

Comment thread src/plugins/intel_gpu/CMakeLists.txt Outdated

dchigarev added 3 commits October 9, 2024 09:41

move mlir-related properties to dev_api

8c890d6

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

return handles from ocl impls

1e05af2

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

fix graph-compiler.cmake

901f2e3

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

github-actions Bot removed the category: CPP API label Oct 9, 2024

dchigarev added 2 commits October 10, 2024 08:33

fix cmake

a0981ac

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

create event from ocl handle

7b244ca

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

kurapov-peter approved these changes Oct 11, 2024

dchigarev requested review from AndreyPavlenko and vladimir-paramuzov October 11, 2024 14:57

AndreyPavlenko reviewed Oct 11, 2024

Comment thread cmake/graph-compiler.cmake

AndreyPavlenko reviewed Oct 11, 2024

Comment thread cmake/graph-compiler.cmake Outdated

AndreyPavlenko reviewed Oct 11, 2024

Comment thread src/common/transformations/src/transformations/mlir/mlir_op.cpp Outdated

AndreyPavlenko approved these changes Oct 11, 2024

apply review suggestions

5c2284f

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

vladimir-paramuzov reviewed Oct 15, 2024

Comment thread src/plugins/intel_gpu/src/plugin/transformations_pipeline.cpp Outdated

assume there's one device per cl_context

f0ecd16

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>

vladimir-paramuzov approved these changes Oct 15, 2024

slyalin merged commit 6d28d16 into slyalin:mlir Oct 15, 2024
