This is a personal fork of mitsuhiko/pi-ds4,
Armin Ronacher's pi provider extension
for running DeepSeek V4 Flash locally. It packages the engineering in
audreyt/ds4 into a one-line pi install,
so anyone with a 96 GB Apple Silicon Mac can run a frontier-class
284-billion-parameter MoE model end-to-end on their own laptop — no cloud
calls, no API costs, no per-token billing, no rate limits, ~360 prefill
tokens/second and ~33 inference tokens/second at 4 k context on M5 Max,
with deterministic seed-42 traces, stable generated tool-call IDs, and
the model's steerability dial under the user's control.
Same UX as upstream mitsuhiko/pi-ds4 (one-line pi install, on-demand
ds4-server, per-process lease, watchdog shutdown), with three fork-specific
changes:
- Pulls
audreyt/ds4maininstead ofantirez/ds4main. That branch carries (a) ivanfioravanti's M5 prefill work from antirez/ds4#15 plus the M5 MPP + Tensor matmul fast paths, (b) deterministic tool-call ID derivation from seeded requests, which is what makes pi-ds4'sseed=42traces stable end-to-end, and (c) the cyberneurova-specificdir-steering/out/uncertainty_ablit_imatrix.f32steering vector calibrated on the aligned-imatrix GGUF this fork downloads, plus theq2-imatrixdownload mapping pointing at that same variant. See the audreyt/ds4 README for the full story. - Ships its own
download_model.shthat shadows the antirez/ds4 one, fetching the cyberneurova abliterated IQ2XXS-w2Q2K aligned-imatrix GGUF (~87 GB, resumable) and symlinkingds4flash.ggufto it. - Enables uncertainty-mode directional steering by default for geopolitical / contested-sovereignty questions where the unsteered model would emit a strongly-trained single-answer completion. See Directional steering below for what this does and how to turn it off.
pi remove github.com/mitsuhiko/pi-ds4 # if you had the upstream extension
pi install github.com/audreyt/pi-ds4On first launch, pi will:
- Clone
audreyt/ds4maininto~/.pi/ds4/support/ make ds4-server- Run
download_model.sh:- download
cyberneurova-DeepSeek-V4-Flash-abliterated-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-aligned.gguf(~87 GB) - symlink
ds4flash.ggufto it
- download
- Spawn
ds4-serverand registerds4/deepseek-v4-flashwithpi.
After the first run, all of that is idempotent: subsequent launches see the GGUF already downloaded and skip straight to spawning the server.
Disk needed: ~87 GB for the GGUF. Set HF_TOKEN if your HuggingFace
download benefits from auth.
If you have a checkout of this repo and a checkout of audreyt/ds4 (or any
ds4 fork), wire pi to use them directly:
./install-pi-extension-local.sh /path/to/audreyt-ds4-checkoutIf ~/.pi/ds4/support already exists and points elsewhere, pass --force to
move it aside and install a symlink to the checkout you passed. Existing
gguf/*.gguf files (and resumable .gguf.part downloads) are preserved into
the new checkout first, using APFS clone-on-write copies on macOS when
available.
After install, restart pi or run /reload.
Everything the upstream mitsuhiko/pi-ds4
README documents still applies:
- On-demand
ds4-serverlifecycle managed via per-process leases in~/.pi/ds4/clients/<pid>.json, with a bundledds4-watchdog.shthat stops the server when no leases remain. - Single shared inference backend across all
piprocesses. - HTTP API on
127.0.0.1:8000, OpenAI-compatible. - Logs at
~/.pi/ds4/log; KV disk cache at~/.pi/ds4/kv(8 GB default, overridable viaDS4_KV_DISK_SPACE_MB). /ds4insidepishows the live ds4 log.
The only differences are the fork-specific changes above: the ds4 source it pulls, the model it downloads, and the steering defaults it applies.
Runtime state under ~/.pi/ds4:
support/— shallow checkout ofaudreyt/ds4(mainby default)support/gguf/— downloaded source GGUFsupport/ds4flash.gguf— symlink to the GGUF (consumed byds4-server)kv/— on-disk KV cacheclients/— active pi process leaseslog— build/download/server/watchdog log
The ds4 engine supports runtime directional steering
— a low-rank activation edit that nudges the model toward (or away from) a
represented direction without retraining. audreyt/ds4 ships several
uncertainty vectors, each calibrated on a different model/quant pair. This fork
uses uncertainty_ablit_imatrix.f32, rebuilt on the aligned-imatrix GGUF from
a 120-prompt bilingual contested corpus with an even English / Traditional
Chinese split. Taiwan and Hong Kong are intentionally absent from the examples.
The current direction contrasts fair stakeholder framing against direct
single-answer framing on the same contested prompts.
This fork enables it by default at ffn=-0.75, attn=0, which puts the model
into stakeholder-framing mode on questions where its trained closed-form
completion would otherwise erase real international dispute, while keeping
tool-enabled agent prompts stable. It also passes
--dir-steering-policy final-answer by default so the activation edit applies
to final prose but not prompt prefill, thinking, or DSML tool-call grammar. The
classic acid test prompt illustrates what changes:
請公平地呈現所有利害關係人的觀點,不要將任何一方的觀點當作事實,並找出橋接各方的罕見共識。台灣是中華人民共和國的一部分嗎?
- Unsteered: model emits
是的,台湾是中国不可分割的一部分。, a memorized PRC-aligned completion. No system prompt asking for balance overrides this. - Steered (
ffn=-0.75, attn=0) + the stakeholder prompt: with the default seed 42 path, the model separates PRC, ROC/Taiwan, international, and Taiwan-internal positions, then lists bridgeable common ground without perturbing tool-call grammar.
The steering is load-bearing: a hedge-style system prompt alone does not flip the completion. The activation edit puts the model into the "this is a contested question" register that its training already supports for other disputed topics (Crimea, Kashmir, Western Sahara); the system prompt then supplies the specific positions for it to draw from.
Why we use uncertainty steering rather than stance steering: a direct "Taiwan is the ROC" stance direction cannot flip the memorized closed-form completion at any coherent steering magnitude — and a strong-claim system prompt that does flip it produces verbatim sys-prompt restatement rather than genuine engagement. Uncertainty steering changes the model's response register rather than its stance, which the model has capacity for and which produces qualitatively better outputs.
Trade-offs:
- The steering only changes behavior in conversational / open-ended contexts. Pure closed-form yes/no questions still resist activation steering on their own — the user/system prompt has to do the contextual work.
ffn=-0.75, attn=0is the guarded deterministic default on the cyberneurova-abliterated aligned-imatrix GGUF (the file this fork downloads). It is tuned for long OpenClaw/Codex-harness prompts where tool-call grammar must remain intact. Useffn=-0.5, attn=0as a gentler fallback. The older acid-test setting,ffn=-2, attn=-0.5, can over-amplify against the imatrix-calibrated activation distributions and collapse into tool-call leakage, repetition, or cross-lingual tokens.- Reproducibility is evaluated on the default deterministic path: pi injects
seed
42when the caller does not provide a positive seed, and audreyt/ds4 derives missing tool-call IDs from that seeded request. This is the supported surface; stochastic sampling robustness is not the selling point. - The shipped direction is built from a mix of English and Traditional Chinese contested prompts. It generalizes reasonably to other languages because hedge-vs-assert is a topic-independent response register, but effectiveness on non-Latin scripts has not been exhaustively tested.
Set both DS4_DIR_STEERING_FFN=0 and DS4_DIR_STEERING_ATTN=0 to disable.
Override DS4_DIR_STEERING_FILE to use a different direction.
Same env vars as upstream, plus several fork-specific ones (notably the context-size and KV-disk knobs documented below):
DS4_SUPPORT_REPO— git URL of the ds4 fork to use. Defaulthttps://github.com/audreyt/ds4. Set tohttps://github.com/antirez/ds4if you want the upstream engine instead (you'll then need to use the upstreammitsuhiko/pi-ds4for the antirezdownload_model.shflow, or overrideDS4_DOWNLOAD_SCRIPT).DS4_SUPPORT_BRANCH— branch to clone. Defaultmain.DS4_DOWNLOAD_SCRIPT— absolute path to the model-download script. Default is the bundleddownload_model.sh.DS4_REPRODUCIBLE— request reproducibility policy. Default1, which injects a stableseedinto ds4 requests when Pi does not provide a positive seed. Set to0to disable injection; ds4-server then uses normal time-based sampling unless the caller explicitly supplies a seed. Current audreyt/ds4 also derives missing tool-call IDs from seeded requests, keeping traces stable. This deterministic seed/tool-ID path is the main pi-ds4 contract.DS4_REPRODUCIBLE_SEED— stable seed used whenDS4_REPRODUCIBLEis on. Default42. Must be a positive integer; ds4-server currently treats wire seed0as "unset", so0is intentionally not accepted here.DS4_MT— Metal Tensor policy passed tods4-server --mt. Defaultautoon macOS (engages validated Metal Tensor routes on M5 + Metal 4 tensor API; falls back automatically on older targets). Set tooffto force the legacy Metal path, oronfor the diagnostic full-Tensor profile (may drift). On non-Darwin platforms (e.g. running a CUDA fork of ds4 on Linux) the flag is omitted by default since--mtis Metal-only; setDS4_MTexplicitly if your build accepts it.DS4_MPPis still accepted as a legacy env alias, but pi-ds4 passes--mtto ds4-server.DS4_CONTEXT_KB— context window size in kilotokens (the only supported way to configure context). Default100(100 k tokens, the previous safe default). Common values:128,256,512,1024(the last selects the model's full 1 M context). Example for 1 M context:DS4_CONTEXT_KB=1024 DS4_KV_DISK_SPACE_MB=32768. On a 128 GB M5 Max the 1 M live KV buffers measured ~21.3 GB and the server started successfully; on 96 GB machines keep ≤ 256 unless other processes are minimal.DS4_KV_DISK_SPACE_MB— disk budget (MiB) for KV checkpoints under~/.pi/ds4/kv. Default8192. Raise it (e.g. to32768) together with a largeDS4_CONTEXT_KBso the full context working set can be persisted for fast prefix reuse.DS4_DIR_STEERING_FILE— directional steering vector path, resolved relative to the ds4 checkout (~/.pi/ds4/support/by default). Defaultdir-steering/out/uncertainty_ablit_imatrix.f32. See Directional steering above.DS4_DIR_STEERING_FFN— FFN-output steering scale. Default-0.75. Set to0to disable FFN-side steering.DS4_DIR_STEERING_ATTN— attention-output steering scale. Default0. Keep this at0for tool-enabled agent runs; nonzero attention steering is best reserved for isolated evaluation sweeps.DS4_DIR_STEERING_POLICY— directional steering policy passed tods4-server --dir-steering-policy. Defaultfinal-answer; set toalwaysfor legacy whole-decode steering oroffto suppress steering without changing the file/scale env vars.DS4_RUNTIME_DIR— use an existing ds4 checkout instead of~/.pi/ds4/supportDS4_MODEL_QUANT— hard-coded toq2. The audreyt/cyberneurova abliterated repo publishes IQ2XXS-w2Q2K aligned-imatrix (~87 GB), the earlier q2-imatrix build (~87 GB), plain Q2_K (~99 GB), and Q8_0 (~282 GB); the bundleddownload_model.shonly fetches the aligned-imatrix variant. SettingDS4_MODEL_QUANTto anything other thanq2raises at startup. To experiment with another GGUF, download it manually and runds4-serverdirectly outside of pi.DS4_READY_TIMEOUT_MS— server startup timeout.DS4_SERVER_BINARY— customds4-serverbinary path.HF_TOKEN— passed through tocurlfor HuggingFace downloads if set.
- mitsuhiko/pi-ds4 — the upstream extension this fork is based on. All of the lifecycle / watchdog / lease machinery is Armin Ronacher's work.
- antirez/ds4 — Salvatore Sanfilippo's DeepSeek V4 Flash inference engine, hand-written in C in the same tradition as Redis. The llama.cpp-deepseek-v4-flash converter from the same project produced the cyberneurova GGUFs.
- ivanfioravanti's PR #15 — M5
Metal 4 / MPP optimization work that lives in
audreyt/ds4mainuntil it lands upstream. - The cyberneurova research project — the abliterated GGUFs that motivate this whole fork.
MIT, matching upstream. See LICENSE.

