server: add tool-safe directional steering policy #148
Open
audreyt wants to merge 2 commits into
Owner
That's brilliant! Thank you so much. Going to merge ASAP.
Owner
@audreyt since I guess you already tested it, are we sure we don't want the final policy to be the default?
Owner
Also, what about an additional
ccc093e to 7f966fb
Contributor
Author
Done, thank you for the nudge! I made
I also rebased onto current
Summary
This adds server-side directional steering policies for tool-aware deployments:
    --dir-steering-policy final-answer   # default
    --dir-steering-policy decoding
    --dir-steering-policy always
    --dir-steering-policy off
final-answer is now the default policy, following maintainer feedback on antirez/ds4#148. It keeps prompt prefill, thinking tokens, and DSML/tool-call grammar unsteered, then re-enables steering once generation has clearly entered final natural-language answer text.
decoding is the middle-ground policy requested in review: prompt/prefill is unsteered, but every generated token is steered, including thinking and tool-call syntax.
always restores the previous always-on behavior, and off disables steering at the server policy layer.
Why
Directional steering is useful for behavior/style/topic control, but applying it while the model is emitting tool-call syntax can perturb DSML grammar, tool arguments, or Responses/Anthropic tool protocol structure.
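For tool-using agents, the safer default is to leave prompt prefill, thinking tokens, and tool-call syntax unsteered, and to steer only the final natural-language answer text.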
This lets deployments use steering for final-answer behavior without making tool calls less reliable.
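As a rough illustration of how the four policies differ, the sketch below maps each policy and generation phase to a steer/don't-steer decision. It is written in Python for readability and is not the ds4-server implementation; the `Phase` labels and the `steering_enabled` helper are hypothetical names introduced here, and how the server actually detects each phase is not shown.

```python
from enum import Enum


class Policy(Enum):
    # Values match the --dir-steering-policy options listed in the summary.
    FINAL_ANSWER = "final-answer"
    DECODING = "decoding"
    ALWAYS = "always"
    OFF = "off"


class Phase(Enum):
    # Hypothetical phase labels for illustration, not names from ds4-server.
    PREFILL = "prefill"
    THINKING = "thinking"
    TOOL_CALL = "tool_call"
    FINAL_ANSWER = "final_answer"


def steering_enabled(policy: Policy, phase: Phase) -> bool:
    """Return True if steering would apply in this phase under this policy."""
    if policy is Policy.OFF:
        return False
    if policy is Policy.ALWAYS:
        return True
    if policy is Policy.DECODING:
        # Every generated token is steered; only prompt/prefill is not.
        return phase is not Phase.PREFILL
    # final-answer: steer only final natural-language answer text.
    return phase is Phase.FINAL_ANSWER


if __name__ == "__main__":
    # Print one row per policy showing which phases are steered.
    for policy in Policy:
        steered = [p.value for p in Phase if steering_enabled(policy, p)]
        print(f"{policy.value}: {steered}")
```

Under this reading, only `decoding` and `always` ever steer tool-call syntax, which is the case the `final-answer` default is meant to avoid.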
Changes
- … ds4-server directional steering policy to final-answer.
- … --dir-steering-policy decoding, which disables steering only during prompt/prefill.
- … always available for the original behavior and off for policy-layer disabling.
- … final-answer mode.
- … final-answer steering is active, so draft tokens do not cross steering-state boundaries.
- … README.md and dir-steering/README.md.
- … decoding, and final-answer/tool-safe behavior.
Compatibility
The server default changes from always to final-answer. Existing deployments that want exact previous behavior can pass:
    --dir-steering-policy always
The core steering scales and file format are unchanged.
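To make the default change concrete, here is a minimal Python sketch of a command-line option with the same name, choices, and default as described in this PR. It only mirrors the flag's documented behavior; `build_parser`, `resolve_policy`, and the `ds4-server-sketch` program name are hypothetical, and the real server's argument handling is not shown in this description.

```python
import argparse

# The four policy names listed in the PR summary.
POLICIES = ("final-answer", "decoding", "always", "off")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the steering-policy flag (illustrative)."""
    parser = argparse.ArgumentParser(prog="ds4-server-sketch")
    parser.add_argument(
        "--dir-steering-policy",
        choices=POLICIES,
        default="final-answer",  # new default; previously behaved like "always"
        help="when directional steering is applied during a request",
    )
    return parser


def resolve_policy(argv: list[str]) -> str:
    """Return the effective policy for a given argument list."""
    return build_parser().parse_args(argv).dir_steering_policy


if __name__ == "__main__":
    print(resolve_policy([]))
    print(resolve_policy(["--dir-steering-policy", "always"]))
```

A deployment that passes no flag gets `final-answer`, while one that passes `--dir-steering-policy always` keeps the earlier always-on behavior.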
Testing
The first full-suite attempt used my local abliterated/aligned ds4flash.gguf symlink and failed the official logprob-vector fixture, as expected for a different GGUF. Re-running with the non-abliterated imatrix GGUF used for the official vectors passes.