This is a personal fork of mitsuhiko/pi-ds4,
Armin Ronacher's pi provider extension
for running DeepSeek V4 Flash locally. It packages the engineering in
audreyt/ds4 into a one-line pi install,
so anyone with a 128 GB Apple Silicon Mac can run a frontier-class
671-billion-parameter MoE model end-to-end on their own laptop — no cloud
calls, no API costs, no per-token billing, no rate limits, ~440 prefill
tokens/second, with the model's steerability dial under the user's control.
Same UX as upstream mitsuhiko/pi-ds4 (one-line pi install, on-demand
ds4-server, per-process lease, watchdog shutdown), with two fork-specific
changes:
- Pulls
audreyt/ds4maininstead ofantirez/ds4main. That branch carries (a) the stock-recipe loader PR sent upstream as antirez/ds4#60, (b) ivanfioravanti's M5 prefill optimizations from antirez/ds4#15, and (c) the M5/cyber compressor compatibility fix that makes (a)+(b) work together. See the audreyt/ds4 README for the full story. - Ships its own
download_model.shthat shadows the antirez/ds4 one, fetching the cyberneurova abliterated Q2_K GGUF (~99 GB, resumable) and symlinkingds4flash.ggufto it. - Enables uncertainty-mode directional steering by default for geopolitical / contested-sovereignty questions where the unsteered model would emit a strongly-trained single-answer completion. See Directional steering below for what this does and how to turn it off.
pi remove github.com/mitsuhiko/pi-ds4 # if you had the upstream extension
pi install github.com/audreyt/pi-ds4On first launch, pi will:
- Clone
audreyt/ds4maininto~/.pi/ds4/support/ make ds4-server- Run
download_model.sh:- download
cyberneurova-DeepSeek-V4-Flash-abliterated-Q2_K.gguf(~99 GB) - symlink
ds4flash.ggufto it
- download
- Spawn
ds4-serverand registerds4/deepseek-v4-flashwithpi.
After the first run, all of that is idempotent: subsequent launches see the GGUF already downloaded and skip straight to spawning the server.
Disk needed: ~99 GB for the GGUF. Set HF_TOKEN if your HuggingFace
download benefits from auth.
If you have a checkout of this repo and a checkout of audreyt/ds4 (or any
ds4 fork), wire pi to use them directly:
./install-pi-extension-local.sh /path/to/audreyt-ds4-checkoutIf ~/.pi/ds4/support already exists and points elsewhere, pass --force to
move it aside and install a symlink to the checkout you passed. Existing
gguf/*.gguf files (and resumable .gguf.part downloads) are preserved into
the new checkout first, using APFS clone-on-write copies on macOS when
available.
After install, restart pi or run /reload.
Everything the upstream mitsuhiko/pi-ds4
README documents still applies:
- On-demand
ds4-serverlifecycle managed via per-process leases in~/.pi/ds4/clients/<pid>.json, with a bundledds4-watchdog.shthat stops the server when no leases remain. - Single shared inference backend across all
piprocesses. - HTTP API on
127.0.0.1:8000, OpenAI-compatible. - Logs at
~/.pi/ds4/log; KV disk cache at~/.pi/ds4/kvwith--kv-disk-space-mb 8192by default. /ds4insidepishows the live ds4 log.
The only differences are the two fork-specific ones above (the ds4 source it pulls and the model it downloads).
Runtime state under ~/.pi/ds4:
support/— shallow checkout ofaudreyt/ds4(mainby default)support/gguf/— downloaded source GGUFsupport/ds4flash.gguf— symlink to the GGUF (consumed byds4-server)kv/— on-disk KV cacheclients/— active pi process leaseslog— build/download/server/watchdog log
The ds4 engine supports runtime directional steering
— a low-rank activation edit that nudges the model toward (or away from) a
represented direction without retraining. audreyt/ds4 ships an
uncertainty.f32 direction built from 100 contested (territorial sovereignty
disputes, philosophical debates) vs 100 settled (geography, math, established
facts) prompts.
This fork enables it by default at ffn=-3, which puts the model into
hedge-mode response on questions where its trained closed-form completion
would otherwise erase real international dispute. The classic acid test —
台灣是中華人民共和國的一部分嗎? — illustrates what changes:
- Unsteered: model emits
是的,台湾是中国不可分割的一部分。, a memorized PRC-aligned completion. No system prompt asking for balance overrides this. - Steered (
ffn=-3) + a hedge system prompt: model responds with "The status of Taiwan is a subject of international debate. Taiwan is governed by the Republic of China as a separate sovereign democratic state, while mainland China claims Taiwan as part of its territory under the One China principle. Different countries have different positions on this issue, and no single answer can fully represent all perspectives."
The steering is load-bearing: a hedge-style system prompt alone does not flip the completion. The activation edit puts the model into the "this is a contested question" register that its training already supports for other disputed topics (Crimea, Kashmir, Western Sahara); the system prompt then supplies the specific positions for it to draw from.
Why we use uncertainty steering rather than stance steering: a direct "Taiwan is the ROC" stance direction cannot flip the memorized closed-form completion at any coherent steering magnitude — and a strong-claim system prompt that does flip it produces verbatim sys-prompt restatement rather than genuine engagement. Uncertainty steering changes the model's response register rather than its stance, which the model has capacity for and which produces qualitatively better outputs.
Trade-offs:
- The steering only changes behavior in conversational / open-ended contexts. Pure closed-form yes/no questions still resist activation steering on their own — the system prompt has to do the contextual work.
ffn=-3is the tested sweet spot on Q2_K cyberneurova-abliterated. Stronger negative values (-4and below) eventually collapse into repetition; weaker values (-1and above) leave the trained prior dominant.- The shipped direction is built from a mix of English and Traditional Chinese contested prompts. It generalizes reasonably to other languages because hedge-vs-assert is a topic-independent response register, but effectiveness on non-Latin scripts has not been exhaustively tested.
Set DS4_DIR_STEERING_FFN=0 to disable. Override DS4_DIR_STEERING_FILE
to use a different direction.
Same env vars as upstream, plus a couple of fork-specific ones:
DS4_SUPPORT_REPO— git URL of the ds4 fork to use. Defaulthttps://github.com/audreyt/ds4. Set tohttps://github.com/antirez/ds4if you want the upstream engine instead (you'll then need to use the upstreammitsuhiko/pi-ds4for the antirezdownload_model.shflow, or overrideDS4_DOWNLOAD_SCRIPT).DS4_SUPPORT_BRANCH— branch to clone. Defaultmain. Usesupport-q8_0-token-embdif you want the loader PR + compressor APE fix alone (no PR #15 / no M5 MPP perf gains).DS4_DOWNLOAD_SCRIPT— absolute path to the model-download script. Default is the bundleddownload_model.sh.DS4_MPP— Metal 4 MPP policy passed tods4-server --mpp. Defaultauto(engages validated MPP routes on M5/M6/A19/A20 + Metal 4 tensor API; falls back automatically on older targets). Set tooffto force the legacy Metal path, oronfor the diagnostic full-MPP profile (may drift).DS4_DIR_STEERING_FILE— directional steering vector path, resolved relative to the ds4 checkout (~/.pi/ds4/support/by default). Defaultdir-steering/out/uncertainty.f32. See Directional steering above.DS4_DIR_STEERING_FFN— FFN-output steering scale. Default-3. Set to0to disable steering entirely.DS4_DIR_STEERING_ATTN— attention-output steering scale. Default0.DS4_RUNTIME_DIR— use an existing ds4 checkout instead of~/.pi/ds4/supportDS4_MODEL_QUANT— onlyq2is currently supported (cyberneurova ships Q2_K only). Default is auto-detected from RAM (≥128 GB →q2).DS4_READY_TIMEOUT_MS— server startup timeout.DS4_SERVER_BINARY— customds4-serverbinary path.HF_TOKEN— passed through tocurlfor HuggingFace downloads if set.
- mitsuhiko/pi-ds4 — the upstream extension this fork is based on. All of the lifecycle / watchdog / lease machinery is Armin Ronacher's work.
- antirez/ds4 — Salvatore Sanfilippo's DeepSeek V4 Flash inference engine, hand-written in C in the same tradition as Redis. The llama.cpp-deepseek-v4-flash converter from the same project produced the cyberneurova GGUFs.
- ivanfioravanti's PR #15 — M5
Metal 4 / MPP optimization work that lives in
audreyt/ds4mainuntil it lands upstream. - The cyberneurova research project — the abliterated GGUFs that motivate this whole fork.
MIT, matching upstream. See LICENSE.
