Is it possible to use this with Llama 2? I'm interested in improving inference speed, so the accuracy loss doesn't matter right now.