CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#…
#96
| Job | Run time |
|---|---|
| | 1m 56s |