Discovery runbook: GitHub topics + arXiv + vLLM issue + HF back-links + Papers with Code + 2 × DEV.to #56
Draft — FluffyAIcode wants to merge 1 commit into main from …
Consolidates the GitHub-topics + arXiv + vLLM-issue + HF-back-links + Papers-with-Code + DEV.to launch plan into a single runnable runbook, with a separate ready-to-paste artifact per task. No source code changes — this is pure distribution content that PR #54 (GEO + credit) was missing.

- `docs/announce/discovery_runbook.md` — top-level runbook: 6 tasks, per-task owner / difficulty / expected payoff, run order (1 GitHub topics -> 2 arXiv -> 3 vLLM -> 4 HF -> 5 PwC -> 6 DEV.to), and a tracking table for marking each step done. Task 1 includes the exact `gh repo edit --add-topic ...` one-liner with the 8 requested topics (kv-cache, kv-cache-compression, quantization, vllm, lattice-quantization, llm-inference, long-context, e8-lattice) plus an optional second tier (d4-lattice, transformers, huggingface, deepseek-v4, qwen3, flashattention, pytorch, arxiv).
- `docs/announce/arxiv/SUBMISSION.md` — arXiv submission checklist: pre-flight LaTeX checks, bundle description (which files to upload and which to exclude), target categories (primary cs.LG, cross-list cs.CL + cs.IT — the cs.IT cross-list "meaningfully widens the retrieval surface"), abstract pulled from the .tex, a comments field that carries the GitHub + PyPI + HF Space URLs, MSC/ACM classification (94A29 + 68T07 / I.2.7 + E.4), license recommendation (CC BY 4.0 so Perplexity / ChatGPT can ingest), and a post-submission "one-commit PR" spec for wiring the minted arXiv ID into README + CITATION.cff + ACKNOWLEDGMENTS + reports/paper/README.
- `docs/announce/vllm_integration_issue.md` — pre-written RFC body for vllm-project/vllm Discussions, in the NexusQuant #16047 format: proposal, what KakeyaLattice does, why another KV quantiser (iso-PPL table across 4 models vs TurboQuant), what already exists (`kakeyalattice.hf.KakeyaLatticeCache` + `vllm_backend` capture plugin), three integration paths (Path A `KVCacheQuantConfig` backend = default proposal; Path B fused decode Triton kernel; Path C compressed-cold-tier only), what I want from maintainers (path choice + exact interface + naming scheme), and a compliance note. Includes timing guidance ("reply within 4 hours Pacific business hours" for first maintainer engagement) and pre-prepared responses to the two most likely reviewer asks ("real HBM savings?" and "vs KIVI?").
- `docs/announce/hf_space_backlinks.md` — two sub-sections: (1) Space outbound links (current status, badge suggestions, arXiv-badge-once-minted recipe, HF Collection pin recipe); (2) model-card inbound link-backs for six model cards (Qwen3-0.6B, Qwen3-4B, Llama-3.2-1B, DeepSeek-R1-Distill-Qwen-1.5B, GLM-4-9B-Chat, Gemma-4-E4B), ranked by expected PR acceptance rate, with a per-model PR body template and a diff-ready "Related projects" entry.
- `docs/announce/papers_with_code/SUBMISSION.md` — paper submission form fill-in (title, abstract, tasks, methods with New-Method descriptions for "Nested-Lattice Quantization" and "Sylvester-Hadamard Rotation"), four benchmark rows on the kv-cache-compression task (one per model: Qwen3-4B 2.77x, GLM 2.44x, Gemma 3.04x, DeepSeek 2.43x at ≤2%), and a DeepSeek-V4-Flash row on the model-compression task (-22% bits vs FP8 at layer-weighted rel-MSE 0.959 ± 0.024).
- `docs/announce/dev_to/post_1_theory.md` — DEV.to post 1, theory-first (~1200 words), with DEV.to front-matter whose canonical_url points back to the repo so the SEO juice credits the GitHub source. Structure: TL;DR -> why scalar quantizers leave bits on the table (heavy-tailed, non-isotropic KV) -> Step 1 Sylvester-Hadamard rotation (1867 recursion, O(D log D), norm-preserving) -> Step 2 nested-lattice closest-point (D4 1.5 dB gain, E8 3.2 dB gain over scalar at the same rate; Conway-Sloane decoders) -> iso-PPL table -> streaming latency -> 10-line integration -> what KakeyaLattice does NOT do -> try it. Targets queries like "nested lattice vs scalar quantisation", "E8 lattice KV", "Hadamard rotation for LLM activations". (A minimal sketch of the two steps follows this list.)
- `docs/announce/dev_to/post_2_practice.md` — DEV.to post 2, practice-first (~1000 words), aimed at a different audience: engineers who want to ship faster inference this week. Structure: TL;DR -> the setup -> pip install -> 10-line integration -> three operating points (q_range 10 aggressive / 38 balanced / 152 near-lossless, bits/vec + typical |Δppl|) -> per-model numbers -> streaming-safe by construction -> operational checklist -> when NOT to ship -> HF Space live demo -> links + cite. Targets queries like "transformers DynamicCache compression", "compress Qwen3 KV cache", "KakeyaLattice tutorial". (An illustrative integration snippet and a front-matter stub also follow this list.)

No numerical claim in any of these artifacts goes beyond what is already in reports/v1_4_release/kv_128k_isoppl_n8/, reports/v1_5_release/dsv4_stage075/FINDINGS_N8.md, or reports/v1_4_release/streaming/. Cross-file numeric consistency was checked against the benchmarks/extract_iso_ppl_table.py output.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
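To make post 1's two steps concrete, here is a minimal NumPy sketch of the standard building blocks it describes — a norm-preserving Sylvester-Hadamard (fast Walsh-Hadamard) rotation and the textbook Conway-Sloane closest-point rule for D4. This illustrates the general technique only; it is not the package's implementation, and the function names are invented for the sketch.

```python
import numpy as np

def hadamard_rotate(x: np.ndarray) -> np.ndarray:
    """Norm-preserving Sylvester-Hadamard rotation in O(D log D).

    Iterative butterfly form of the 1867 Sylvester recursion
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], with a 1/sqrt(2) scale per level
    so the Euclidean norm of x is preserved. len(x) must be a power of two.
    """
    y = np.asarray(x, dtype=np.float64).copy()
    d = y.shape[-1]
    assert d & (d - 1) == 0, "dimension must be a power of two"
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = y[..., i:i + h].copy()
            b = y[..., i + h:i + 2 * h].copy()
            y[..., i:i + h] = a + b
            y[..., i + h:i + 2 * h] = a - b
        y /= np.sqrt(2.0)
        h *= 2
    return y

def closest_point_d4(x: np.ndarray) -> np.ndarray:
    """Conway-Sloane closest-point search for the D4 lattice
    (integer 4-vectors whose coordinates sum to an even number).

    Round every coordinate to the nearest integer; if the rounded sum is odd,
    re-round the single worst-rounded coordinate to its second-nearest integer.
    """
    f = np.rint(x)
    if int(f.sum()) % 2 != 0:
        k = int(np.argmax(np.abs(x - f)))       # coordinate farthest from an integer
        f[k] += 1.0 if x[k] > f[k] else -1.0    # flip it to the other nearest integer
    return f
```

Presumably the rotated head vector is then split into 4-dimensional (D4) or 8-dimensional (E8) sub-blocks and scaled into the lattice's quantization range (the q_range operating points) before the decode; those details live in the package, not this sketch.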
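The "10-line integration" both posts point at presumably looks roughly like the snippet below. The `kakeyalattice.hf.KakeyaLatticeCache` import path is the one named in the vLLM RFC summary above, but the `q_range=38` constructor argument and the generate-time wiring are assumptions for illustration, not the package's documented API.

```python
# Hypothetical sketch: swap transformers' default KV cache for the compressed one.
# q_range=38 is the "balanced" operating point named in post 2; the real
# constructor signature may differ — check the package docs before shipping.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from kakeyalattice.hf import KakeyaLatticeCache  # class named in the vLLM RFC

model_id = "Qwen/Qwen3-4B"  # any of the benchmarked models
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

cache = KakeyaLatticeCache(q_range=38)  # assumed kwarg; 10 / 38 / 152 are the stated operating points
inputs = tok("Explain nested-lattice KV-cache quantization in two sentences.",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```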
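For completeness, DEV.to front-matter with a canonical_url is just a short YAML block at the top of the post. The title, tags, and URL below are placeholders; the only detail this PR actually cares about is that canonical_url points at the copy of the post kept in the repo:

```yaml
---
title: "Nested-lattice KV-cache quantization: E8 vs scalar at the same bit rate"
published: false
description: "Sylvester-Hadamard rotation + D4/E8 nested-lattice quantization for long-context KV caches."
tags: llm, quantization, machinelearning, python
canonical_url: https://github.com/OWNER/kakeyalattice/blob/main/docs/announce/dev_to/post_1_theory.md
---
```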
Summary
Six-task discovery runbook + one ready-to-execute artifact per task.
Each artifact is self-contained, targets a specific search-engine /
AI-answer-engine surface, and mirrors the NexusQuant launch pattern
that currently dominates "E8 KV compression" query results.
PR content is entirely documentation + copy-paste kits; nothing in it runs as code. No benchmark numbers are introduced that are not already backed by JSON under reports/.

Files added
Task → owner → status tracking lives in the runbook itself; the table below maps each artifact to where it runs:

| Task | Artifact | Where it runs |
|---|---|---|
| 1 | `docs/announce/discovery_runbook.md` §1 | one `gh repo edit ... --add-topic ...` line away |
| 2 | `docs/announce/arxiv/SUBMISSION.md` | arxiv.org/submit |
| 3 | `docs/announce/vllm_integration_issue.md` | vllm-project/vllm Discussions |
| 4 | `docs/announce/hf_space_backlinks.md` | HF Space settings + six model-card PRs |
| 5 | `docs/announce/papers_with_code/SUBMISSION.md` | Papers with Code paper-submission form |
| 6 | `docs/announce/dev_to/post_1_theory.md` + `post_2_practice.md` | dev.to/new — front-matter is correct |
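For reference, the Task 1 one-liner could look like the following — the OWNER/REPO slug is a placeholder, and `gh repo edit` accepts repeated `--add-topic` flags:

```bash
gh repo edit OWNER/kakeyalattice \
  --add-topic kv-cache --add-topic kv-cache-compression --add-topic quantization \
  --add-topic vllm --add-topic lattice-quantization --add-topic llm-inference \
  --add-topic long-context --add-topic e8-lattice
```

The optional second-tier topics (d4-lattice, transformers, huggingface, deepseek-v4, qwen3, flashattention, pytorch, arxiv) can be appended the same way; GitHub caps a repository at 20 topics, so both tiers fit.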
Why the order 1 → 6 matters

- Task 1 (topics) is a cheap win: it makes the repo show up on github.com/topics/e8-lattice etc.
- Task 2 (arXiv) lists the paper under ML (primary cs.LG) and unlocks Task 5. Expected processing: 1 business day.
- Task 3 (vLLM) then has an arXiv ID to reference in the opening post for maximum credibility.
- Task 4 (HF back-links) reduces to a badge refresh once the ID exists.
- Task 6 (DEV.to): the posts link to the arXiv ID and the vLLM discussion; posting them before those exist means editing later.
All artifacts include clear timing guidance, pre-prepared fallback
content for the most likely reviewer asks, and explicit "done when"
criteria.
GEO rationale (why this PR complements PR #54)
PR #54 set up the local GEO surfaces (README, FAQ, blog,
launch kit, CITATION.cff, ACKNOWLEDGMENTS.md, DEPLOYMENTS.md).
This PR sets up the external GEO surfaces (GitHub topic pages,
arXiv, vLLM, HF, PwC, DEV.to) that push traffic toward the local
surfaces. Credit-wise, cross-source consistency of naming and
citation form is preserved: all external artifacts name the same
peer methods (TurboQuant, KIVI, SmoothQuant, HQQ, Quanto, SnapKV,
H2O, Scissorhands) with the same arXiv / DOI anchors as the README
and
ACKNOWLEDGMENTS.md.

What's not in this PR
- Executing the tasks themselves (the accounts and surfaces involved are owner-operated by design).
- Templates that are PR #54's scope ("GEO + Credit: README hero + FAQ + landscape-survey blog + CITATION.cff + ACKNOWLEDGMENTS.md + DEPLOYMENTS.md + launch kit").
- vLLM integration code — the RFC exists to align on integration path before coding.
Follow-up (after this PR merges)
Once the arXiv ID lands, a small follow-up PR will:
- Replace the "DOI — pending" badge in README with an arXiv badge.
- Add the arXiv ID to `CITATION.cff` (the `identifiers:` list) and `ACKNOWLEDGMENTS.md`.
- Push `SPACE_README.md` to the HF Space via `huggingface_hub.HfApi.create_commit` so the Space landing page carries the arXiv badge (a minimal sketch follows this list).
- Tag `v1.5.0-arxiv` to mint a DOI for the exact commit the arXiv abstract references.
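A minimal sketch of the Space push, assuming a placeholder Space id and that `SPACE_README.md` should land as the Space's `README.md`:

```python
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()  # picks up the locally cached HF token
api.create_commit(
    repo_id="OWNER/kakeyalattice-demo",      # placeholder Space id
    repo_type="space",
    operations=[
        CommitOperationAdd(
            path_in_repo="README.md",         # the Space landing page
            path_or_fileobj="docs/announce/SPACE_README.md",
        )
    ],
    commit_message="Space README: add arXiv badge",
)
```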
That follow-up is ~5 minutes of automated work. Leave a message on
this PR (or start a new cloud-agent session) with the arXiv ID once
it mints and I'll wire everything up.