Current optimization logs show that the GPUs are idle 70% of the time during a GEAK run, while agents wait on LLM calls and CPU-bound work. In addition, the number of parallel agents GEAK can launch is currently tied to the number of available GPUs. Implement a GPU scheduler that shares GPUs among parallel runs, so that more agents can run simultaneously and GPU resources are used more efficiently.
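One possible shape for such a scheduler is sketched below in Python. It is a minimal sketch that rests on two assumptions: an agent holds a GPU only during its GPU-bound steps (for example compiling and benchmarking a kernel), and each GPU admits a fixed number of concurrent leases. `GPUScheduler`, `slots_per_gpu`, `lease()` and `run_agent` are hypothetical names, not part of GEAK, and the LLM and GPU phases are simulated with `time.sleep`.

```python
import threading
import time
from contextlib import contextmanager
from concurrent.futures import ThreadPoolExecutor


class GPUScheduler:
    """Hypothetical slot-based scheduler (not GEAK's API): each GPU admits up
    to `slots_per_gpu` concurrent leases; callers block when all slots are busy."""

    def __init__(self, gpu_ids, slots_per_gpu=1):
        if slots_per_gpu < 1:
            raise ValueError("slots_per_gpu must be >= 1")
        self._capacity = slots_per_gpu
        self._in_use = {gpu: 0 for gpu in gpu_ids}
        self._cond = threading.Condition()
        self.peak_in_use = 0  # highest number of simultaneous leases observed

    def acquire(self):
        """Block until some GPU has a free slot, then return that GPU's id."""
        with self._cond:
            while True:
                free = [g for g, n in self._in_use.items() if n < self._capacity]
                if free:
                    # Pick the least-loaded GPU to spread work evenly.
                    gpu = min(free, key=lambda g: self._in_use[g])
                    self._in_use[gpu] += 1
                    self.peak_in_use = max(self.peak_in_use, sum(self._in_use.values()))
                    return gpu
                self._cond.wait()

    def release(self, gpu):
        """Return one slot on `gpu` and wake waiting callers."""
        with self._cond:
            if self._in_use[gpu] <= 0:
                raise RuntimeError(f"GPU {gpu} released more times than acquired")
            self._in_use[gpu] -= 1
            self._cond.notify_all()  # waiters re-check for a free slot

    @contextmanager
    def lease(self):
        """Hold a GPU slot only for the duration of the `with` block."""
        gpu = self.acquire()
        try:
            yield gpu
        finally:
            self.release(gpu)


def run_agent(scheduler, agent_id, log):
    # Simulated LLM/CPU phase: no GPU is held while the agent waits here.
    time.sleep(0.01)
    # Simulated GPU phase: hold a lease only while compiling/benchmarking.
    with scheduler.lease() as gpu:
        log.append((agent_id, gpu))
        time.sleep(0.005)


if __name__ == "__main__":
    sched = GPUScheduler(gpu_ids=[0, 1], slots_per_gpu=2)
    log = []
    # Eight agents share two GPUs; at most four GPU phases overlap.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_agent, sched, i, log) for i in range(8)]
    for f in futures:
        f.result()  # surface any exception raised inside an agent
    print(len(log), sched.peak_in_use <= 4)  # prints: 8 True
```

In this sketch the agent count is decoupled from the GPU count: agents spend their LLM and CPU time without holding a GPU, and `slots_per_gpu` controls how far each GPU is oversubscribed. Running several benchmarks on one GPU at once may distort timing measurements, so `slots_per_gpu=1` (exclusive access during the GPU phase only) may be the safer default when latency numbers drive optimization decisions.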