@ServeurpersoCom
Collaborator

Make sure to read the contributing guidelines before submitting a PR

Two KV overrides working:

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081 \
  --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf \
  --override-kv "tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
... etc...
print_info: file size   = 958.64 MiB (8.04 BPW)
validate_override: Using metadata override ( bool) 'tokenizer.ggml.add_bos_token' = false
validate_override: Using metadata override ( bool) 'tokenizer.ggml.add_eos_token' = false
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
... etc...

Help message:

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081 \
  --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf \
  --override-kv "tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=INVALID:blah"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
string_parse_kv_override: invalid type for KV override 'tokenizer.ggml.add_eos_token=INVALID:blah'
error while handling argument "--override-kv": error: Invalid type for KV override: tokenizer.ggml.add_eos_token=INVALID:blah


usage:
--override-kv KEY=TYPE:VALUE,...        advanced option to override model metadata by key. use comma-separated
                                        list of overrides.
                                        types: int, float, bool, str. example: --override-kv
                                        tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false


to show complete usage, run with -h
(root|~/llama.cpp.pascal)
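
For reference, the KEY=TYPE:VALUE form from the usage text can be split roughly as follows. This is a minimal sketch and not the real `string_parse_kv_override`: the struct `kv_override_parts` and the function `split_kv_override` are hypothetical names, and the accepted type list is copied from the help text above.

```cpp
#include <string>

// Hypothetical container for one override; not a llama.cpp type.
struct kv_override_parts {
    std::string key;
    std::string type;
    std::string value;
};

// Split "KEY=TYPE:VALUE" at the first '=' and the first ':' after it.
// Returns false when the shape is wrong or TYPE is not one of the
// types listed in the help text (int, float, bool, str).
static bool split_kv_override(const std::string & s, kv_override_parts & out) {
    const std::string::size_type eq = s.find('=');
    if (eq == std::string::npos || eq == 0) {
        return false; // missing '=' or empty key
    }
    const std::string::size_type colon = s.find(':', eq + 1);
    if (colon == std::string::npos) {
        return false; // missing ':' between type and value
    }
    const std::string type = s.substr(eq + 1, colon - eq - 1);
    if (type != "int" && type != "float" && type != "bool" && type != "str") {
        return false; // e.g. "INVALID:blah" is rejected here
    }
    out.key   = s.substr(0, eq);
    out.type  = type;
    out.value = s.substr(colon + 1); // later ':' characters stay in the value
    return true;
}
```

With this sketch, `tokenizer.ggml.add_eos_token=INVALID:blah` is rejected on the type check, which corresponds to the "invalid type for KV override" error in the log.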

Fixes #18040

@ggerganov
Member

I think it's OK, but would like @ngxson to confirm.

Collaborator

@ngxson ngxson left a comment

we will probably need to handle the case key=str:a,b, where the string value is "a,b"

but this can be resolved later; there is a chance that no one actually uses it

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
@personalmountains
Contributor

Thanks for this PR, but I feel like it just introduces more problems. It will require further changes to escape the comma, it doesn't fix the other options, and it breaks backward compatibility. I can come up with a PR today that allows multiple parameters again, it's already done locally. Would that make sense or am I misunderstanding something?

@ServeurpersoCom
Collaborator Author

> Thanks for this PR, but I feel like it just introduces more problems. It will require further changes to escape the comma, it doesn't fix the other options, and it breaks backward compatibility. I can come up with a PR today that allows multiple parameters again, it's already done locally. Would that make sense or am I misunderstanding something?

Thanks for the feedback! I understand your concern about backward compatibility.
However, I think the comma-separated approach is cleaner for several reasons:

  • Consistency: It matches --override-tensor (-ot), which already uses commas
  • Router compatibility: Multi-argument support doesn't work with the router/preset system (which uses std::map)
  • Simplicity: Users only need to change a few characters in their startup scripts

That said, I'm open to discussion. If the maintainers prefer keeping backward compatibility with multi-arguments, I can adjust the PR. But IMO, consistency across the codebase is more valuable than preserving a pattern that's broken in router mode anyway.
What do @ggerganov and @ngxson think?

@ngxson
Collaborator

ngxson commented Dec 15, 2025

@personalmountains I'm quite against supporting repeated args because as explained in the OP, the environment variable does not accept it.

The arg.cpp and preset.cpp system should be designed to treat CLI arg, ini preset and env var the same way. I think having a dedicated logic to escape the comma won't hurt anyway.

@ngxson
Collaborator

ngxson commented Dec 15, 2025

also, the design of presets using std::map was for a reason: some args don't want to be specified twice (which can potentially lead to undefined behavior). it also allows easier merging of multiple presets in the future, where one preset can extend another

so IMO we should not make repeated args the default experience. other config formats like JSON or YAML don't allow repeated keys, so why would we?
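
To illustrate the point about std::map, here is a minimal sketch of one preset extending another. The alias `preset` and the function `merge_presets` are hypothetical and are not taken from preset.cpp; the option names are only example keys.

```cpp
#include <map>
#include <string>

// Hypothetical alias: one value per option name, so a repeated key
// cannot be represented and the last assignment simply wins.
using preset = std::map<std::string, std::string>;

// Merge `child` over `base`: options set in the child take priority,
// everything else is inherited from the base preset.
static preset merge_presets(const preset & base, const preset & child) {
    preset merged = child;
    // std::map::insert leaves existing keys untouched, so child values survive.
    merged.insert(base.begin(), base.end());
    return merged;
}
```

Because each key holds exactly one string, several overrides for the same option have to travel inside that single value, which is what the comma-separated form provides.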

@personalmountains
Contributor

Fair enough.

@personalmountains
Contributor

Sorry, maybe also add a noisy warning when options are discarded? I'm just concerned about the silent backward compatibility break. This can break a model in subtle ways by failing to override certain keys. I found this by chance, really.

Member

@ggerganov ggerganov left a comment

I agree with @ngxson. It's not ideal that the change is not backwards compatible, but the prospect of improving the new config system is a good tradeoff IMO.

@ServeurpersoCom
Collaborator Author

I propose we add a warning directly in the parser for all duplicated arguments. This way everyone gets into good habits, and it allows for progressive migration toward cleaner config patterns.

@ngxson
Collaborator

ngxson commented Dec 15, 2025

Yes, that sounds good. We should also remove all help messages that mention repeated args, so new users won't follow the same pattern

@ServeurpersoCom
Collaborator Author

For --override-kv, comma escaping could be supported for edge cases where a comma appears in the value itself:

diff --git a/common/arg.cpp b/common/arg.cpp
index 002c3c168..1ecad7e9c 100644
--- a/common/arg.cpp
+++ b/common/arg.cpp
@@ -2196,7 +2196,32 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         "advanced option to override model metadata by key. to specify multiple overrides, either use comma-separated or repeat this argument.\n"
         "types: int, float, bool, str. example: --override-kv tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false",
         [](common_params & params, const std::string & value) {
-            for (const auto & kv_override : string_split<std::string>(value, ',')) {
+            std::vector<std::string> kv_overrides;
+
+            std::string current;
+            bool escaping = false;
+
+            for (const char c : value) {
+                if (escaping) {
+                    current.push_back(c);
+                    escaping = false;
+                } else if (c == '\\') {
+                    escaping = true;
+                } else if (c == ',') {
+                    kv_overrides.push_back(current);
+                    current.clear();
+                } else {
+                    current.push_back(c);
+                }
+            }
+
+            if (escaping) {
+                current.push_back('\\');
+            }
+
+            kv_overrides.push_back(current);
+
+            for (const auto & kv_override : kv_overrides) {
                 if (!string_parse_kv_override(kv_override.c_str(), params.kv_overrides)) {
                     throw std::runtime_error(string_format("error: Invalid type for KV override: %s\n", kv_override.c_str()));
                 }
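
The same loop can be pulled out into a standalone function to check the edge cases. This is a sketch that mirrors the diff above; the function name `split_escaped_commas` is hypothetical and does not exist in common/.

```cpp
#include <string>
#include <vector>

// Split `input` on unescaped commas. A backslash makes the next
// character literal, so "\," yields a comma inside the current item.
static std::vector<std::string> split_escaped_commas(const std::string & input) {
    std::vector<std::string> parts;
    std::string current;
    bool escaping = false;

    for (const char c : input) {
        if (escaping) {
            current.push_back(c);   // escaped character is kept as-is
            escaping = false;
        } else if (c == '\\') {
            escaping = true;        // drop the backslash, escape the next char
        } else if (c == ',') {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }

    if (escaping) {
        current.push_back('\\');    // a trailing lone backslash is preserved
    }

    parts.push_back(current);       // last item (may be empty)
    return parts;
}
```

Two behaviors are worth noting: the backslash is dropped in front of any character, not only commas (so `\n` in the input becomes `n`), and an empty input still yields one empty item, which the override parser would then have to reject.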

@ServeurpersoCom
Collaborator Author

> Yes that sounds good. We should remove all help messages mentioning about repeated args too, so new users won't follow the same pattern

OK, I'll do it in this PR.

@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Dec 15, 2025

It looks like this:

DEPRECATED: argument '--override-kv' specified multiple times, use comma-separated values instead (only last value will be used)

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081 --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf --override-kv "tokenizer.ggml.add_bos_token=bool:false" --override-kv "tokenizer.ggml.add_eos_token=bool:false"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
DEPRECATED: argument '--override-kv' specified multiple times, use comma-separated values instead (only last value will be used)
main: setting n_parallel = 4 and kv_unified = true (add -kvu to disable this)
build: 7435 (1e0bca00b) with GNU 12.2.0 for Linux x86_64
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32
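
The check behind that warning can be sketched as follows. This is an assumption about the general shape, not the actual arg.cpp code: the function `find_repeated_args` is hypothetical, and it treats every token starting with `--` as an option name, which is a simplification.

```cpp
#include <set>
#include <string>
#include <vector>

// Return one deprecation warning per option name that appears more
// than once in an argv-style list. Hypothetical helper for illustration.
static std::vector<std::string> find_repeated_args(const std::vector<std::string> & args) {
    std::set<std::string> seen;    // option names encountered so far
    std::set<std::string> warned;  // option names already reported
    std::vector<std::string> warnings;

    for (const auto & arg : args) {
        if (arg.rfind("--", 0) != 0) {
            continue; // not an option name (simplified: values are skipped)
        }
        const bool first_time = seen.insert(arg).second;
        if (!first_time && warned.insert(arg).second) {
            warnings.push_back(
                "DEPRECATED: argument '" + arg + "' specified multiple times, "
                "use comma-separated values instead (only last value will be used)");
        }
    }
    return warnings;
}
```

Each repeated option is reported once no matter how many times it appears, which keeps the startup log short.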

@ServeurpersoCom
Collaborator Author

And escaping for --override-kv:

string_parse_kv_override: invalid type for KV override 'tokenizer.ggml.add_eos_token=ESCAP,ING:bl,ah'

(root|~/llama.cpp.pascal) ./build/bin/llama-server --port 8081   --model /var/www/ia/models/mradermacher/gemma-3-1b-it-i1-GGUF/gemma-3-1b-it.i1-Q6_K.gguf   --override-kv "tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=ESCAP\,ING:bl\,ah"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
string_parse_kv_override: invalid type for KV override 'tokenizer.ggml.add_eos_token=ESCAP,ING:bl,ah'
error while handling argument "--override-kv": error: Invalid type for KV override: tokenizer.ggml.add_eos_token=ESCAP,ING:bl,ah


usage:
--override-kv KEY=TYPE:VALUE,...        advanced option to override model metadata by key. to specify multiple
                                        overrides, either use comma-separated or repeat this argument.
                                        types: int, float, bool, str. example: --override-kv
                                        tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false


to show complete usage, run with -h
(root|~/llama.cpp.pascal)

Co-authored-by: personalmountains <46615898+personalmountains@users.noreply.github.com>
@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Dec 15, 2025

> Yes that sounds good. We should remove all help messages mentioning about repeated args too, so new users won't follow the same pattern

Should we clean up ALL "can be repeated" help text in this PR, or keep it focused on --override-kv?
The warning is already general (applies to all args), but most other args don't support comma-separated syntax yet. We could:

  • Migrate all args to comma-separated syntax inside this PR (more work but complete solution -> would need a common utility function for escaping)
  • Remove all "can be repeated" mentions now (forces migration even without comma support available yet)
  • Keep this PR focused on --override-kv and migrate other args progressively in follow-up PRs

What do you prefer?

Edit: Turns out there weren't that many, so I migrated them all to eliminate the technical debt.

@ggerganov ggerganov merged commit 487674f into ggml-org:master Dec 17, 2025
67 of 76 checks passed
Development

Successfully merging this pull request may close these issues.

Misc. bug: Repeated command line options are discarded in router mode (--override-kv is broken)

4 participants