Parallel batch recommendation fails on CUDA-enabled systems with a CUDA initialization error (`CUBLAS_STATUS_NOT_INITIALIZED`) in the worker process:
Traceback (most recent call last):
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/util/parallel.py", line 130, in _mp_invoke_worker
    return __work_func(model, *args)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/lenskit/batch/_recommend.py", line 19, in _recommend_user
    res = algo.recommend(user, n, candidates)
  File "/home/MICHAELEKSTRAND/LensKit/lenskit-implicit/lenskit_implicit/implicit.py", line 69, in recommend
    recs, scores = self.delegate.recommend(uid, matrix, N=i_n)
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 87, in recommend
    ids, scores = self.knn.topk(
  File "/home/MICHAELEKSTRAND/mambaforge/envs/lkimp/lib/python3.10/site-packages/implicit/gpu/matrix_factorization_base.py", line 122, in knn
    self._knn = implicit.gpu.KnnQuery()
  File "_cuda.pyx", line 47, in implicit.gpu._cuda.KnnQuery.__cinit__
RuntimeError: cublas error: CUBLAS_STATUS_NOT_INITIALIZED (/tmp/pip-req-build-b0ax806a/implicit/gpu/knn.cu:87)
Tagging @benfred in case he has any insight here.
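One hypothesis that may be worth checking (not confirmed for this failure): the worker processes are forked from a parent that has already initialized CUDA, and a CUDA context generally cannot be reused in a forked child. If that is the cause, starting workers with the `spawn` method should give each one a fresh interpreter that initializes CUDA on its own. The sketch below shows how to request a spawn-based pool with the standard library; `spawn_context` and `make_executor` are hypothetical helper names, not LensKit API, and the pool in the traceback is created inside `lenskit/util/parallel.py`, so this is a standalone illustration and not a drop-in patch.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def spawn_context():
    """Return a multiprocessing context that starts workers as fresh interpreters.

    With "spawn", a child does not inherit the parent's memory image, so it
    does not inherit any CUDA state the parent may already have set up.
    """
    return mp.get_context("spawn")


def make_executor(n_workers):
    """Hypothetical helper: build a process pool that uses the spawn context.

    Worker processes are only launched once work is submitted to the pool.
    """
    return ProcessPoolExecutor(max_workers=n_workers, mp_context=spawn_context())
```

To try it, submit a module-level function from under an `if __name__ == "__main__":` guard, for example `with make_executor(2) as pool: pool.map(work, items)`. Spawned workers re-import the main module, so both the guard and a picklable top-level function are required.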
Things to test