Currently it is difficult to verify the runtime configuration for models. It appears you can set configuration either through docker model configure or via a .../_configure/ endpoint, but as far as I can tell there is no way to see what the runtime parameters actually are.
I was able to view them with something like the following, but it would be more ideal to retrieve this information via the CLI or the API.
~ ❯ tr '\0' ' ' < /proc/3352229/cmdline ; echo
/app/bin/com.docker.llama-server -ngl 999 --metrics --model /models/bundles/sha256/8d4c5d3f8f32429577d8e9403454a03bf8784f29f600fd09427240a2c4f78c3c/model/model.gguf --host inference-runner-0.sock --ctx-size 4096 --jinja

Given that settings such as MoE layer placement, flash attention, and KV-cache quantization often need to be tuned for a given set of hardware, it would be nice if this were better exposed.
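For reference, the workaround above can be wrapped into a small reusable helper. This is a sketch, not an official tool: it assumes a Linux host with /proc mounted, and the pgrep pattern (com.docker.llama-server) is taken from the output above and may differ across setups. /proc/<pid>/cmdline stores arguments separated by NUL bytes, so tr converts them to spaces for display.

```shell
# Print the full command line of a process, given its PID.
# /proc/<pid>/cmdline is NUL-separated, so translate NULs to spaces.
show_cmdline() {
  tr '\0' ' ' < "/proc/$1/cmdline"
  echo
}

# Example (assumption: the runner process matches this name):
#   pid=$(pgrep -f com.docker.llama-server) && show_cmdline "$pid"

# Demonstrate on the current shell itself:
show_cmdline "$$"
```

This still requires access to the host's process table (e.g. from inside the runner's namespace), which is exactly why a CLI or API surface for the effective runtime parameters would be preferable.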