Discovery runbook: GitHub topics + arXiv + vLLM issue + HF back-links + Papers with Code + 2 × DEV.to #56
Draft — FluffyAIcode wants to merge 1 commit into main from …
Consolidates the GitHub-topics + arXiv + vLLM-issue + HF-back-links + Papers-with-Code + DEV.to launch plan into a single runnable runbook, with a separate ready-to-paste artifact per task. No source code changes — this is pure distribution content that PR #54 (GEO + credit) was missing.

- `docs/announce/discovery_runbook.md` — top-level runbook: 6 tasks, per-task owner / difficulty / expected payoff, run order (1 GitHub topics -> 2 arXiv -> 3 vLLM -> 4 HF -> 5 PwC -> 6 DEV.to), and a tracking table for marking each step done. Task 1 includes the exact `gh repo edit --add-topic ...` one-liner with the 8 requested topics (kv-cache, kv-cache-compression, quantization, vllm, lattice-quantization, llm-inference, long-context, e8-lattice) plus an optional second tier (d4-lattice, transformers, huggingface, deepseek-v4, qwen3, flashattention, pytorch, arxiv).
- `docs/announce/arxiv/SUBMISSION.md` — arXiv submission checklist: pre-flight LaTeX checks, bundle description (which files to upload and which to exclude), target categories (primary cs.LG, cross-list cs.CL + cs.IT — the cs.IT cross-list "meaningfully widens the retrieval surface"), abstract pulled from the .tex, a comments field that carries the GitHub + PyPI + HF Space URLs, MSC/ACM classification (94A29 + 68T07 / I.2.7 + E.4), license recommendation (CC BY 4.0 so Perplexity / ChatGPT can ingest), and a post-submission "one-commit PR" spec for wiring the minted arXiv ID into README + CITATION.cff + ACKNOWLEDGMENTS + reports/paper/README.
- `docs/announce/vllm_integration_issue.md` — pre-written RFC body for vllm-project/vllm Discussions, in the NexusQuant #16047 format: proposal, what KakeyaLattice does, why another KV quantiser (iso-PPL table across 4 models vs TurboQuant), what already exists (`kakeyalattice.hf.KakeyaLatticeCache` + `vllm_backend` capture plugin), three integration paths (Path A `KVCacheQuantConfig` backend = default proposal; Path B fused decode Triton kernel; Path C compressed-cold-tier only), what I want from maintainers (path choice + exact interface + naming scheme), and a compliance note. Includes timing guidance ("reply within 4 hours Pacific business hours" for first maintainer engagement) and pre-prepared responses to the two most likely reviewer asks ("real HBM savings?" and "vs KIVI?").
- `docs/announce/hf_space_backlinks.md` — two sub-sections: (1) Space outbound links (current status, badge suggestions, arXiv-badge-once-minted recipe, HF Collection pin recipe); (2) model-card inbound link-backs for six model cards (Qwen3-0.6B, Qwen3-4B, Llama-3.2-1B, DeepSeek-R1-Distill-Qwen-1.5B, GLM-4-9B-Chat, Gemma-4-E4B), ranked by expected PR acceptance rate, with a per-model PR body template and a diff-ready "Related projects" entry.
- `docs/announce/papers_with_code/SUBMISSION.md` — paper submission form fill-in (title, abstract, tasks, methods with New-Method descriptions for "Nested-Lattice Quantization" and "Sylvester-Hadamard Rotation"), four benchmark rows on the kv-cache-compression task (one per model: Qwen3-4B 2.77x, GLM 2.44x, Gemma 3.04x, DeepSeek 2.43x at ≤2%), and a DeepSeek-V4-Flash row on the model-compression task (-22% bits vs FP8 at layer-weighted rel-MSE 0.959 ± 0.024).
- `docs/announce/dev_to/post_1_theory.md` — DEV.to post 1, theory-first (~1200 words), with DEV.to front-matter whose canonical_url points back to the repo so the SEO juice credits the GitHub source. Structure: TL;DR -> why scalar quantizers leave bits on the table (heavy-tailed, non-isotropic KV) -> Step 1 Sylvester-Hadamard rotation (1867 recursion, O(D log D), norm-preserving) -> Step 2 nested-lattice closest-point (D4 1.5 dB gain, E8 3.2 dB gain over scalar at the same rate; Conway-Sloane decoders) -> iso-PPL table -> streaming latency -> 10-line integration -> what KakeyaLattice does NOT do -> try it. Targets queries like "nested lattice vs scalar quantisation", "E8 lattice KV", "Hadamard rotation for LLM activations". (A minimal sketch of the two steps follows this list.)
- `docs/announce/dev_to/post_2_practice.md` — DEV.to post 2, practice-first (~1000 words), aimed at a different audience: engineers who want to ship faster inference this week. Structure: TL;DR -> the setup -> pip install -> 10-line integration -> three operating points (q_range 10 aggressive / 38 balanced / 152 near-lossless, bits/vec + typical |Δppl|) -> per-model numbers -> streaming-safe by construction -> operational checklist -> when NOT to ship -> HF Space live demo -> links + cite. Targets queries like "transformers DynamicCache compression", "compress Qwen3 KV cache", "KakeyaLattice tutorial". (An illustrative integration snippet and a front-matter stub also follow this list.)

No numerical claim in any of these artifacts goes beyond what is already in reports/v1_4_release/kv_128k_isoppl_n8/, reports/v1_5_release/dsv4_stage075/FINDINGS_N8.md, or reports/v1_4_release/streaming/. Cross-file numeric consistency was checked against the benchmarks/extract_iso_ppl_table.py output.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
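To make post 1's two steps concrete, here is a minimal NumPy sketch of the standard building blocks it describes — a norm-preserving Sylvester-Hadamard (fast Walsh-Hadamard) rotation and the textbook Conway-Sloane closest-point rule for D4. This illustrates the general technique only; it is not the package's implementation, and the function names are invented for the sketch.

```python
import numpy as np

def hadamard_rotate(x: np.ndarray) -> np.ndarray:
    """Norm-preserving Sylvester-Hadamard rotation in O(D log D).

    Iterative butterfly form of the 1867 Sylvester recursion
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], with a 1/sqrt(2) scale per level
    so the Euclidean norm of x is preserved. len(x) must be a power of two.
    """
    y = np.asarray(x, dtype=np.float64).copy()
    d = y.shape[-1]
    assert d & (d - 1) == 0, "dimension must be a power of two"
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = y[..., i:i + h].copy()
            b = y[..., i + h:i + 2 * h].copy()
            y[..., i:i + h] = a + b
            y[..., i + h:i + 2 * h] = a - b
        y /= np.sqrt(2.0)
        h *= 2
    return y

def closest_point_d4(x: np.ndarray) -> np.ndarray:
    """Conway-Sloane closest-point search for the D4 lattice
    (integer 4-vectors whose coordinates sum to an even number).

    Round every coordinate to the nearest integer; if the rounded sum is odd,
    re-round the single worst-rounded coordinate to its second-nearest integer.
    """
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        k = int(np.argmax(np.abs(x - f)))       # coordinate farthest from an integer
        f[k] += 1.0 if x[k] > f[k] else -1.0    # flip it to the other nearest integer
    return f
```

Presumably the rotated head vector is then split into 4-dimensional (D4) or 8-dimensional (E8) sub-blocks and scaled into the lattice's quantization range (the q_range operating points) before the decode; those details live in the package, not this sketch.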
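The "10-line integration" both posts point at presumably looks roughly like the snippet below. The `kakeyalattice.hf.KakeyaLatticeCache` import path is the one named in the vLLM RFC summary above, but the `q_range=38` constructor argument and the generate-time wiring are assumptions for illustration, not the package's documented API.

```python
# Hypothetical sketch: swap transformers' default KV cache for the compressed one.
# q_range=38 is the "balanced" operating point named in post 2; the real
# constructor signature may differ — check the package docs before shipping.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from kakeyalattice.hf import KakeyaLatticeCache  # class named in the vLLM RFC

model_id = "Qwen/Qwen3-4B"  # any of the benchmarked models
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

cache = KakeyaLatticeCache(q_range=38)  # assumed kwarg; 10 / 38 / 152 are the stated operating points
inputs = tok("Explain nested-lattice KV-cache quantization in two sentences.",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```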
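For completeness, DEV.to front-matter with a canonical_url is just a short YAML block at the top of the post. The title, tags, and URL below are placeholders; the only detail this PR actually cares about is that canonical_url points at the copy of the post kept in the repo:

```yaml
---
title: "Nested-lattice KV-cache quantization: E8 vs scalar at the same bit rate"
published: false
description: "Sylvester-Hadamard rotation + D4/E8 nested-lattice quantization for long-context KV caches."
tags: llm, quantization, machinelearning, python
canonical_url: https://github.com/OWNER/kakeyalattice/blob/main/docs/announce/dev_to/post_1_theory.md
---
```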
Summary
Six-task discovery runbook + one ready-to-execute artifact per task.
Each artifact is self-contained, targets a specific search-engine /
AI-answer-engine surface, and mirrors the NexusQuant launch pattern
that currently dominates "E8 KV compression" query results.
PR content is entirely documentation + copy-paste kits; nothing in it runs as code. No benchmark numbers are introduced that are not already backed by JSON under reports/.

Files added
Task → owner → status tracking lives in the runbook itself; the table below maps each artifact to where it runs:

| Task | Artifact | Where it runs |
|---|---|---|
| 1 | `docs/announce/discovery_runbook.md` §1 | one `gh repo edit ... --add-topic ...` line away |
| 2 | `docs/announce/arxiv/SUBMISSION.md` | arxiv.org/submit |
| 3 | `docs/announce/vllm_integration_issue.md` | vllm-project/vllm Discussions |
| 4 | `docs/announce/hf_space_backlinks.md` | HF Space settings + six model-card PRs |
| 5 | `docs/announce/papers_with_code/SUBMISSION.md` | Papers with Code paper-submission form |
| 6 | `docs/announce/dev_to/post_1_theory.md` + `post_2_practice.md` | dev.to/new — front-matter is correct |
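For reference, the Task 1 one-liner could look like the following — the OWNER/REPO slug is a placeholder, and `gh repo edit` accepts repeated `--add-topic` flags:

```bash
gh repo edit OWNER/kakeyalattice \
  --add-topic kv-cache --add-topic kv-cache-compression --add-topic quantization \
  --add-topic vllm --add-topic lattice-quantization --add-topic llm-inference \
  --add-topic long-context --add-topic e8-lattice
```

The optional second-tier topics (d4-lattice, transformers, huggingface, deepseek-v4, qwen3, flashattention, pytorch, arxiv) can be appended the same way; GitHub caps a repository at 20 topics, so both tiers fit.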
Why the order 1 → 6 matters

- Task 1 (topics) is a cheap win: it makes the repo show up on github.com/topics/e8-lattice etc.
- Task 2 (arXiv) lists the paper under ML (primary cs.LG) and unlocks Task 5. Expected processing: 1 business day.
- Task 3 (vLLM) then has an arXiv ID to reference in the opening post for maximum credibility.
- Task 4 (HF back-links) reduces to a badge refresh once the ID exists.
- Task 6 (DEV.to): the posts link to the arXiv ID and the vLLM discussion; posting them before those exist means editing later.
All artifacts include clear timing guidance, pre-prepared fallback
content for the most likely reviewer asks, and explicit "done when"
criteria.
GEO rationale (why this PR complements PR #54)
PR #54 set up the local GEO surfaces (README, FAQ, blog,
launch kit, CITATION.cff, ACKNOWLEDGMENTS.md, DEPLOYMENTS.md).
This PR sets up the external GEO surfaces (GitHub topic pages,
arXiv, vLLM, HF, PwC, DEV.to) that push traffic toward the local
surfaces. Credit-wise, cross-source consistency of naming and
citation form is preserved: all external artifacts name the same
peer methods (TurboQuant, KIVI, SmoothQuant, HQQ, Quanto, SnapKV,
H2O, Scissorhands) with the same arXiv / DOI anchors as the README
and
ACKNOWLEDGMENTS.md.

What's not in this PR
- Executing the tasks themselves (the accounts and surfaces involved are owner-operated by design).
- Templates that are PR #54's scope ("GEO + Credit: README hero + FAQ + landscape-survey blog + CITATION.cff + ACKNOWLEDGMENTS.md + DEPLOYMENTS.md + launch kit").
- vLLM integration code — the RFC exists to align on integration path before coding.
Follow-up (after this PR merges)
Once the arXiv ID lands, a small follow-up PR will:
- Replace the "DOI — pending" badge in README with an arXiv badge.
- Add the arXiv ID to `CITATION.cff` (the `identifiers:` list) and `ACKNOWLEDGMENTS.md`.
- Push `SPACE_README.md` to the HF Space via `huggingface_hub.HfApi.create_commit` so the Space landing page carries the arXiv badge (a minimal sketch follows this list).
- Tag `v1.5.0-arxiv` to mint a DOI for the exact commit the arXiv abstract references.
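A minimal sketch of the Space push, assuming a placeholder Space id and that `SPACE_README.md` should land as the Space's `README.md`:

```python
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()  # picks up the locally cached HF token
api.create_commit(
    repo_id="OWNER/kakeyalattice-demo",      # placeholder Space id
    repo_type="space",
    operations=[
        CommitOperationAdd(
            path_in_repo="README.md",         # the Space landing page
            path_or_fileobj="docs/announce/SPACE_README.md",
        )
    ],
    commit_message="Space README: add arXiv badge",
)
```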
That follow-up is ~5 minutes of automated work. Leave a message on
this PR (or start a new cloud-agent session) with the arXiv ID once
it mints and I'll wire everything up.