Description
Let me preface this issue by saying that I have known about this repo for a long time but never had the opportunity or need to play with it until recently. Now that I have had the chance, I want to thank you for this marvelous piece of tech, which is as impressive as I thought it was back in the day :)
Rationale
My need is really specific, but I think it is a growing one as more and more people play with AI models. Basically, I'm working with an ONNX model doing image detection. To speed up this process, especially the preparation of the input (which needs to be in a specific format with normalized RGB values), I thought of using ComputeSharp, and it works great.
As I try to optimize the process further, I want to remove the copy I currently need to do from the GPU to the CPU (and from the CPU back to the GPU for the tensor). My model can run on the GPU, and the ONNX Runtime already allows creating tensor values on the GPU. Exposing the underlying pointer/device of Buffer would make it possible to combine multiple GPU-based solutions by sharing their memory, removing the round trip through the CPU that is currently needed to pass the output of one as the input of the other.
Proposed API
If I understand things correctly, something like this should do the trick?
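public class Buffer<T>
{
    // Identifier of the GPU device that owns the buffer.
    public int DeviceId { get; }
    // Address of the underlying GPU memory backing the buffer.
    public IntPtr MemoryAddress { get; }
}
I'm wondering whether these are already available behind the allocation or d3D12Resource fields.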
Drawbacks
People could shoot themselves in the foot beautifully if this is not used properly.
Alternatives
I guess I could try using reflection to extract those private fields and see whether I can use them.
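For reference, the reflection route could look roughly like the sketch below. It is generic .NET reflection only: `PrivateFieldReader` and `ReadPrivateField` are hypothetical names made up for illustration, and the field names passed in (for example `d3D12Resource`) are just the ones I guessed above, so they may not match what the library actually declares.

```csharp
using System;
using System.Reflection;

public static class PrivateFieldReader
{
    // Reads a non-public instance field by name, or returns null if no such
    // field exists. The type hierarchy is walked explicitly because GetField
    // does not return private fields declared on base classes.
    public static object? ReadPrivateField(object target, string fieldName)
    {
        for (Type? type = target.GetType(); type != null; type = type.BaseType)
        {
            FieldInfo? field = type.GetField(
                fieldName,
                BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);

            if (field != null)
            {
                // Struct values come back boxed; pointer-typed fields come
                // back wrapped in System.Reflection.Pointer.
                return field.GetValue(target);
            }
        }

        return null;
    }
}
```

With a buffer instance at hand, the call would be something like `PrivateFieldReader.ReadPrivateField(buffer, "d3D12Resource")`. Anything obtained this way depends on internal details that can change between versions, which is why a public API would be preferable.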
Other thoughts
Again, thank you for this project. Now that I've used it, I'm starting to see applications everywhere ^^"