An iteration inside QSpaceLanczos is much faster than one inside QSpaceHessian, although their speeds should match.
This is likely because QSpaceHessian recomputes the full Lambda propagator at every iteration. Since the propagator is exactly the same for all q-point calculations, it could be computed once and reused.
Also, in the mode basis the Lambda propagator is probably diagonal, so there is no need to build the full matrix.
These two changes will likely close most of the speed gap.
Moreover, the Hessian is not parallelized right now.
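
The caching and diagonal-storage ideas above can be sketched as follows. This is only an illustration, not the QSpaceHessian code: `toy_lambda_entry`, `diagonal_lambda` and `apply_lambda` are hypothetical names, and the entry formula is a placeholder rather than the actual Lambda propagator expression. It also assumes that the propagator really is diagonal in the mode basis, which still has to be checked.

```python
from functools import lru_cache


def toy_lambda_entry(w_mu, w_nu):
    # Placeholder for the propagator entry of the mode pair (mu, nu).
    # NOT the real SSCHA formula; it only stands in for "some function
    # of the two mode frequencies".
    return 1.0 / (w_mu + w_nu)


@lru_cache(maxsize=None)
def diagonal_lambda(freqs):
    # Store only the diagonal, one entry per mode pair, instead of a
    # dense (n*n) x (n*n) matrix. The cache ensures it is built once
    # per distinct tuple of frequencies and reused afterwards.
    return tuple(toy_lambda_entry(a, b) for a in freqs for b in freqs)


def apply_lambda(freqs, vec):
    # Applying a diagonal operator is an element-wise product, so the
    # cost is O(n^2) per application instead of O(n^4) for a dense
    # matrix-vector product.
    diag = diagonal_lambda(tuple(freqs))
    if len(vec) != len(diag):
        raise ValueError("vec must have one entry per mode pair")
    return [d * v for d, v in zip(diag, vec)]
```

Calling `apply_lambda` repeatedly with the same frequencies, as successive iterations or q-point calculations would, rebuilds nothing after the first call; `diagonal_lambda.cache_info()` can be used to confirm that later calls are cache hits.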