Implementation in CuPy with stream= and dl_device=cpu choices #152

@seberg

I implemented DLPack v1 for CuPy (see cupy/cupy#8683), and made two choices that are important for other implementations and maybe for the spec:

  1. We chose to export the CUDA managed device when possible, even if dl_device=(CPU, 0) was requested. I.e. we promise that the data can be used on the CPU, but CuPy currently still gives you the actual (compatible) device!
    • Note: NumPy is OK with this in the case of CUDA managed memory, but it may not yet be OK with it for future or other similar devices. (I.e. NumPy may need to trust the producer in this case, or we keep it a bit fuzzy and assume the consumer should know the device, possibly based on version.)
  2. If the user passes dl_device=(CPU, 0) together with stream=..., I think we had discussed that the stream semantics must relate to the device that the data is on. CuPy supports this:
    • stream=None (or nothing passed) synchronizes the device-to-host copy (i.e. waits until the data is available on the CPU).
    • stream=consumer_stream does not synchronize. The user could in theory keep working with the data on consumer_stream (e.g. with another cudaMemcpyAsync), or synchronize themselves (e.g. if multiple copies are needed).
    • REASON: Synchronizing in the second case would achieve nothing that stream=None doesn't already achieve. It would effectively do the same as stream=None and additionally synchronize consumer_stream. (But that stream does not need to be synchronized!)
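
To make the two choices above easier to compare against other implementations, here is a minimal pure-Python sketch of the decision logic. It is not CuPy's code: the helper names `exported_device` and `producer_synchronizes_copy` are hypothetical, the device id 0 is an assumption, and only the enum values are taken from `dlpack.h`.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    # Subset of the DLDeviceType enum values from dlpack.h.
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAManaged = 13


def exported_device(is_managed):
    """Device reported when the consumer requested dl_device=(kDLCPU, 0).

    Hypothetical helper modelling choice 1; device id 0 is an assumption.
    """
    if is_managed:
        # Managed memory is usable from the CPU, so the actual
        # (compatible) device is reported instead of kDLCPU.
        return (DLDeviceType.kDLCUDAManaged, 0)
    # Otherwise the data is copied to host memory and exported as CPU.
    return (DLDeviceType.kDLCPU, 0)


def producer_synchronizes_copy(stream=None):
    """Whether the producer waits for the device-to-host copy.

    Hypothetical helper modelling choice 2: only stream=None makes the
    producer synchronize; with a consumer stream, ordering is left to
    the consumer.
    """
    return stream is None


if __name__ == "__main__":
    print(exported_device(is_managed=True))
    print(exported_device(is_managed=False))
    print(producer_synchronizes_copy())
    print(producer_synchronizes_copy(stream=1234))  # placeholder stream handle
```

The sketch only covers the dl_device=(CPU, 0) request path described in this issue; what a stream argument should mean for data that is already managed is not modelled here.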

CC @leofang.
