I'm interested in potentially replacing llama-cpp-python with easy-llama in my project, and have some questions about feature parity:
- Are all parameters in the `params` dict below available?
https://github.com/oobabooga/text-generation-webui/blob/096272f49e55357a364ed9016357b97829dae0fd/modules/llamacpp_model.py#L88
- Is it possible to get the logits after a given input, as done here?
https://github.com/oobabooga/text-generation-webui/blob/096272f49e55357a364ed9016357b97829dae0fd/modules/llamacpp_model.py#L134
- Similar to the previous question, but more nuanced: is there a way to get the logits for every token position in an input at once? In llama-cpp-python, this is done by passing `logits_all=True` when loading the model, which reduces performance but makes all logits available as a matrix through `model.eval_logits`. A while ago, I used this feature to measure the perplexity of llama.cpp quants with the code here:
https://github.com/oobabooga/text-generation-webui/blob/096272f49e55357a364ed9016357b97829dae0fd/modules/llamacpp_hf.py#L133
- I have a llamacpp_HF wrapper that connects llama.cpp to the HF text generation functions. At its core, all it does is update `model.n_tokens` to do prefix matching and evaluate new tokens by calling `model.eval` with a list containing only the new tokens. Can that be done with easy-llama? See:
https://github.com/oobabooga/text-generation-webui/blob/096272f49e55357a364ed9016357b97829dae0fd/modules/llamacpp_hf.py#L118
- Is speculative decoding implemented? There is a PR to add it here: https://github.com/oobabooga/text-generation-webui/pull/6669/files. Having it in easy-llama would be great, especially if it could be enabled simply by passing new kwargs to its model loading and/or generation functions. I believe doing that for my llamacpp_HF wrapper would be very hard, so I don't have high hopes for that.
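To make the prefix-matching question concrete, the logic can be sketched independently of any binding. This is a minimal, hypothetical sketch: `PrefixCache` and its `eval_fn` callback are made-up names standing in for a model whose state covers its first `n_tokens` tokens. They are not easy-llama or llama-cpp-python APIs.

```python
from typing import Callable, List, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixCache:
    """Hypothetical stand-in for a model whose KV state covers tokens[:n_tokens]."""

    def __init__(self, eval_fn: Callable[[List[int]], None]) -> None:
        self.tokens: List[int] = []
        self.n_tokens = 0
        self.eval_fn = eval_fn  # hypothetical: evaluates new tokens on top of the cache

    def feed(self, input_ids: Sequence[int]) -> List[int]:
        """Evaluate input_ids, reusing the cached prefix; return the tokens evaluated."""
        if not input_ids:
            raise ValueError("input_ids must not be empty")
        keep = common_prefix_len(self.tokens[: self.n_tokens], input_ids)
        # If the whole input is already cached, re-evaluate its last token so
        # fresh logits exist for the final position.
        keep = min(keep, len(input_ids) - 1)
        self.n_tokens = keep  # rewind: drop state past the shared prefix
        new_tokens = list(input_ids[keep:])
        self.eval_fn(new_tokens)
        self.tokens = list(input_ids)
        self.n_tokens = len(input_ids)
        return new_tokens


# Usage: record which tokens would actually be evaluated.
calls: List[List[int]] = []
cache = PrefixCache(calls.append)
cache.feed([1, 2, 3])        # nothing cached: evaluates [1, 2, 3]
cache.feed([1, 2, 3, 4, 5])  # prefix reused: evaluates [4, 5]
cache.feed([1, 2, 9])        # rewinds to 2 tokens: evaluates [9]
cache.feed([1, 2, 9])        # fully cached: re-evaluates [9]
```

Forcing at least one token to be evaluated is a design choice: it guarantees fresh logits for the last position even when the whole input was already cached.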
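For context on the perplexity use case: given a logits matrix with one row per evaluated position (roughly what llama-cpp-python exposes through `model.eval_logits` when `logits_all=True`), perplexity is the exponential of the mean negative log-likelihood of each next token. The following is a minimal stdlib-only sketch that assumes the matrix has already been converted to a list of rows. `perplexity` is an illustrative helper and is not part of either library.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def perplexity(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Perplexity of `tokens` given one logits row per input position.

    Row i holds the logits produced after reading tokens[0..i], i.e. the
    prediction for tokens[i + 1]. The first token is never scored.
    """
    if len(logits) != len(tokens):
        raise ValueError("expected one logits row per token")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    nll = 0.0
    for i in range(len(tokens) - 1):
        nll -= log_softmax(logits[i])[tokens[i + 1]]
    return math.exp(nll / (len(tokens) - 1))


# Usage: uniform logits over a 4-token vocabulary give a perplexity of about 4.0.
uniform = [[0.0] * 4 for _ in range(3)]
ppl = perplexity(uniform, [0, 1, 2])
```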
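For reference, this is the greedy variant of speculative decoding, sketched with toy callables. A cheap draft model proposes up to `k` tokens, the target model verifies them, the longest matching prefix is kept, and the target's own token is used at the first mismatch. The output is identical to plain greedy decoding with the target. `target`, `draft`, and `speculative_greedy` are hypothetical stand-ins, not easy-llama or llama-cpp-python APIs, and the sampling-based rejection scheme is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical "model": returns the greedy next token for a context.
NextToken = Callable[[Sequence[int]], int]


def greedy(model: NextToken, prompt: Sequence[int], max_new: int) -> List[int]:
    """Plain greedy decoding, used as the reference output."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_greedy(target: NextToken, draft: NextToken, prompt: Sequence[int],
                       max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively (cheap).
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, goal - len(out))):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target checks each proposed position. A real engine would do
        #    this in one batched forward pass over out + proposal.
        for t in proposal:
            expected = target(out)
            out.append(expected)  # accepted token, or the target's correction
            if expected != t:
                break  # first mismatch: discard the remaining draft tokens
        else:
            # Every draft token was accepted: the target's next token comes free.
            if len(out) < goal:
                out.append(target(out))
    return out[len(prompt):]


# Toy models: the target counts mod 10; the draft is wrong at every third position.
def target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: Sequence[int]) -> int:
    return 0 if len(ctx) % 3 == 0 else (ctx[-1] + 1) % 10


result = speculative_greedy(target, draft, [0], 12)
```

The speedup comes from verifying several draft tokens per target pass. This toy version calls `target` once per position, so it only demonstrates correctness, not performance.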
If you are interested, a PR replacing llama-cpp-python with easy-llama in my repository would be very welcome once wheels are available. It would also be a way to test the library. But I can also try to make the change myself.