Opening this as a thread for discussion, and I'm willing to make the changes myself if you're interested. I'm testing the `cl` module on the following systems:
- Linux system running a pair of Nvidia Titan Xp GPUs with a 12-core Intel CPU + Nvidia's OpenCL drivers installed
- MacPro with an AMD Radeon RX580 with 2x12-core Intel CPUs + macOS 10.14.3 with native Apple OpenCL drivers installed
- MacBook Pro with 2xAMD GPUs, 4-core Intel CPU
While the Macs are executing OpenCL functions in sub-ms times, the Nvidia drivers are relatively slow when allocating memory objects on the GPU, in some cases taking on the order of hundreds of ms to complete (while actual operations on those allocated objects are damn snappy).
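Since "dirty schedulers" were introduced around Erlang 17.3 and are enabled by default as of ERTS 9.0 (OTP 20), I was wondering if you'd be open to updating the NIF exports so that all the `cl` NIF functions run on the dirty schedulers when the ERTS being compiled for supports them. Calls that block for hundreds of ms on a normal scheduler can stall other Erlang processes, which is the main motivation here.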
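To make the proposal concrete, here is a rough sketch of what the export table could look like. It is not taken from the existing source: `create_buffer_nif`, the module name `cl_example`, and the `CL_NIF_FLAGS` macro are all hypothetical names for illustration. It also assumes the `erl_nif.h` being compiled against already has the four-field `ErlNifFunc` (the `flags` member), and that NIF API version 2.12 corresponds to ERTS 9.0; both are worth double-checking against the headers.

```c
#include <erl_nif.h>

/* Hypothetical placeholder standing in for a slow driver call,
 * e.g. allocating a memory object on the GPU. */
static ERL_NIF_TERM create_buffer_nif(ErlNifEnv *env, int argc,
                                      const ERL_NIF_TERM argv[])
{
    (void)argc;
    (void)argv;
    return enif_make_atom(env, "ok");
}

/* Assumption: NIF API 2.12 is the version shipped with ERTS 9.0, where
 * dirty schedulers are available by default. Older headers fall back to
 * a normal (non-dirty) NIF. */
#if ERL_NIF_MAJOR_VERSION > 2 || \
    (ERL_NIF_MAJOR_VERSION == 2 && ERL_NIF_MINOR_VERSION >= 12)
#define CL_NIF_FLAGS ERL_NIF_DIRTY_JOB_IO_BOUND
#else
#define CL_NIF_FLAGS 0
#endif

/* Each entry: Erlang function name, arity, C function, scheduler flags. */
static ErlNifFunc nif_funcs[] = {
    {"create_buffer", 0, create_buffer_nif, CL_NIF_FLAGS}
};

ERL_NIF_INIT(cl_example, nif_funcs, NULL, NULL, NULL, NULL)
```

I picked `ERL_NIF_DIRTY_JOB_IO_BOUND` here because the time is spent waiting on the driver; `ERL_NIF_DIRTY_JOB_CPU_BOUND` would be the alternative if the calls turn out to be compute-heavy. Gating on the header version keeps builds against older ERTS releases working as they do today, though it would not pick up the 17.3 to 19 emulators that were built with dirty schedulers explicitly enabled.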
BTW, really incredible undertaking here - before I found your repo I was building out my own OpenCL NIF, so much respect for the fact that you completed a full implementation!