I see benchmarks in your repo, but running the Leduc CFR example on either CPU or GPU doesn't give anything close to that performance. On a 9950X I get around 30 it/s.
Please provide the benchmarking code for the cfrx side if the example isn't the full picture (for example, maybe it doesn't use vmap extensively, I don't know).
MCCFR gives 190 it/s on GPU and 9.4k it/s on CPU.
The fact that it runs faster on CPU is a strong sign that something is wrong :(
The GPU is a 5090, btw.
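To make the numbers comparable, here is a minimal sketch of the kind of timing harness I have in mind. It is not cfrx code: `measure_its` and `dummy_step` are hypothetical names, and `dummy_step` is just a stand-in for one solver iteration. The assumptions are that warm-up iterations (where JIT compilation would happen) should be excluded, and that on GPU the clock should only stop after the device has finished, for example by passing `jax.block_until_ready` as `sync`.

```python
import time


def measure_its(step, state, n_warmup=3, n_iters=20, sync=lambda x: x):
    """Return (iterations per second, final state) for repeated calls of `step`."""
    # Warm-up calls are not timed, so one-time costs such as JIT
    # compilation do not distort the steady-state rate.
    for _ in range(n_warmup):
        state = step(state)
    sync(state)  # make sure warm-up work has finished before timing

    start = time.perf_counter()
    for _ in range(n_iters):
        state = step(state)
    sync(state)  # wait for any asynchronous work before stopping the clock
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

    return n_iters / elapsed, state


def dummy_step(x):
    # Hypothetical stand-in for one solver iteration; replace with the real step.
    return x + 1


rate, final_state = measure_its(dummy_step, 0)
print(f"{rate:.1f} it/s")
```

If the numbers in the repo were measured differently (including compilation, or counting vmapped games as separate iterations), that alone could explain part of the gap.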