
cublas backend of MatMul does not work with stream parallelism #618

@roastduck

Description


We should run cuBLAS in an appropriate stream, and this further requires creating a separate cuBLAS handle for each stream. Since we cache the cuBLAS handle in GPUContext, we should make that cache work across multiple streams.
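A minimal sketch of one way the cache could look (the method and member names here are hypothetical, not the project's actual GPUContext API): keep one handle per stream in a map, and bind each handle to its stream with `cublasSetStream` so that matmul calls issued through that handle are enqueued in the caller's stream rather than the default stream.

```cuda
// Hypothetical sketch of a per-stream cuBLAS handle cache.
// Assumes single-threaded access to GPUContext; a real implementation
// would need a mutex if contexts are shared across host threads.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <unordered_map>

class GPUContext {
public:
    // Return a cuBLAS handle bound to `stream`, creating and caching
    // one per stream on first use.
    cublasHandle_t cublas(cudaStream_t stream) {
        auto it = handles_.find(stream);
        if (it == handles_.end()) {
            cublasHandle_t h;
            cublasCreate(&h);
            cublasSetStream(h, stream);  // subsequent cuBLAS calls on h run in `stream`
            it = handles_.emplace(stream, h).first;
        }
        return it->second;
    }

    ~GPUContext() {
        for (auto &kv : handles_)
            cublasDestroy(kv.second);
    }

private:
    // cudaStream_t is a pointer type, so std::hash works out of the box.
    std::unordered_map<cudaStream_t, cublasHandle_t> handles_;
};
```

With this, a MatMul lowered onto stream `s` would call `ctx.cublas(s)` instead of a single cached handle, so two MatMuls on different streams no longer serialize on (or race over) one shared handle.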

Metadata


Assignees

No one assigned

Labels

bug: Something isn't working

Projects

No projects

Milestone

No milestone


Development

No branches or pull requests
