Discovery runbook: GitHub topics + arXiv + vLLM issue + HF back-links + Papers with Code + 2 × DEV.to#56

Draft
FluffyAIcode wants to merge 1 commit into main from AgentMemory/discovery-runbook-c478

Conversation

@FluffyAIcode
Owner

Summary

Six-task discovery runbook + one ready-to-execute artifact per task.
Each artifact is self-contained, targets a specific search-engine /
AI-answer-engine surface, and mirrors the NexusQuant launch pattern
that currently dominates "E8 KV compression" query results.

PR content is entirely documentation plus copy-paste kits; nothing in it
executes code, and no benchmark numbers are introduced that are not
already backed by JSON under reports/.

Files added

docs/announce/
  discovery_runbook.md                    — top-level 6-task runbook
  arxiv/
    SUBMISSION.md                         — full arXiv submission checklist
  dev_to/
    post_1_theory.md                      — DEV.to post 1 (~1200 words, theory)
    post_2_practice.md                    — DEV.to post 2 (~1000 words, practice)
  hf_space_backlinks.md                   — Space + model-card back-link PR templates
  papers_with_code/
    SUBMISSION.md                         — PwC paper + 5 benchmark rows
  vllm_integration_issue.md               — pre-written vLLM RFC body (NexusQuant-style)

Task → owner → status

| # | Task | Owner | Artifact | Status |
|---|------|-------|----------|--------|
| 1 | GitHub topics (8 terms) | you — repo admin | discovery_runbook.md §1 | ready — one `gh repo edit ... --add-topic ...` line away |
| 2 | arXiv submission | you — paper author | arxiv/SUBMISSION.md | ready — paste metadata into arxiv.org/submit |
| 3 | vLLM RFC | you — GitHub account | vllm_integration_issue.md | ready — paste body + title into vllm-project/vllm Discussions |
| 4 | HF Space + model-card back-links | you — HF token holder | hf_space_backlinks.md | Space side done (2026-04-25); 6 model-card PR drafts ready |
| 5 | Papers with Code | you — paper author | papers_with_code/SUBMISSION.md | blocked on arXiv ID; form fill-in ready |
| 6 | DEV.to × 2 | me (wrote) + you (publish) | dev_to/post_1_theory.md + post_2_practice.md | ready — paste into dev.to/new, front-matter is correct |

Why the order 1 → 6 matters

  1. GitHub topics first — GitHub topic pages index within hours.
    Cheap win, makes the repo show up on
    github.com/topics/e8-lattice etc.
  2. arXiv second — arXiv ID is the highest-authority anchor in
    ML and unlocks Task 5. Expected processing: 1 business day.
  3. vLLM RFC third — discussion-thread format, needs the arXiv
    ID to reference in the opening post for maximum credibility.
  4. HF back-links fourth — depends on arXiv ID for the Space
    badge refresh.
  5. Papers with Code fifth — requires the arXiv ID.
  6. DEV.to sixth — posts should link forward to arXiv + PwC + vLLM
    discussion; posting them before those exist means editing later.

All artifacts include clear timing guidance, pre-prepared fallback
content for the most likely reviewer asks, and explicit "done when"
criteria.

GEO rationale (why this PR complements PR #54)

PR #54 set up the local GEO surfaces (README, FAQ, blog,
launch kit, CITATION.cff, ACKNOWLEDGMENTS.md, DEPLOYMENTS.md).
This PR sets up the external GEO surfaces (GitHub topic pages,
arXiv, vLLM, HF, PwC, DEV.to) that push traffic toward the local
surfaces. Credit-wise, cross-source consistency of naming and
citation form is preserved: all external artifacts name the same
peer methods (TurboQuant, KIVI, SmoothQuant, HQQ, Quanto, SnapKV,
H2O, Scissorhands) with the same arXiv / DOI anchors as the README
and ACKNOWLEDGMENTS.md.

What's not in this PR

Follow-up (after this PR merges)

Once the arXiv ID lands, a small follow-up PR will:

  • Replace the DOI — pending badge in README with an arXiv badge.
  • Wire the arXiv ID into CITATION.cff (identifiers: list) and
    ACKNOWLEDGMENTS.md.
  • Push an updated SPACE_README.md to the HF Space via
    huggingface_hub.HfApi.create_commit so the Space landing page
    carries the arXiv badge.
  • Enable Zenodo's GitHub integration and tag v1.5.0-arxiv to mint
    a DOI for the exact commit the arXiv abstract references.
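The CITATION.cff wiring in the second follow-up bullet amounts to a short `identifiers:` entry. A sketch, with `XXXX.NNNNN` standing in for the not-yet-minted arXiv ID (do not commit the placeholder):

```yaml
# Hypothetical CITATION.cff fragment; replace XXXX.NNNNN with the real
# arXiv ID once it mints.
identifiers:
  - type: url
    value: "https://arxiv.org/abs/XXXX.NNNNN"
    description: "arXiv preprint"
```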

That follow-up is ~5 minutes of automated work. Leave a message on
this PR (or start a new cloud-agent session) with the arXiv ID once
it mints and I'll wire everything up.


…on kit

Consolidates the GitHub-topics + arXiv + vLLM-issue + HF-back-links +
Papers-with-Code + DEV.to launch plan into a single runnable runbook,
with a separate ready-to-paste artifact per task. No source code
changes — this is pure distribution content that PR #54 (GEO + credit)
was missing.

docs/announce/discovery_runbook.md
  Top-level runbook: 6 tasks, per-task owner / difficulty / expected
  payoff, run order (1 GitHub topics -> 2 arXiv -> 3 vLLM -> 4 HF ->
  5 PwC -> 6 DEV.to), and a tracking table for marking each step done.
  Task 1 includes the exact 'gh repo edit --add-topic ...' one-liner
  with the 8 requested topics (kv-cache, kv-cache-compression,
  quantization, vllm, lattice-quantization, llm-inference, long-context,
  e8-lattice) plus an optional second tier (d4-lattice, transformers,
  huggingface, deepseek-v4, qwen3, flashattention, pytorch, arxiv).
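For reference, the Task 1 one-liner can be assembled programmatically from the topic list. A minimal sketch (it only builds the command string; the exact `gh repo edit` line in discovery_runbook.md §1 is authoritative):

```python
import shlex

# The eight first-tier topics from discovery_runbook.md §1.
topics = [
    "kv-cache", "kv-cache-compression", "quantization", "vllm",
    "lattice-quantization", "llm-inference", "long-context", "e8-lattice",
]

# `gh repo edit` accepts one --add-topic flag per topic.
cmd = "gh repo edit " + " ".join(
    f"--add-topic {shlex.quote(t)}" for t in topics
)
print(cmd)
```

Run the printed command from a clone of the repo with `gh` authenticated as a repo admin.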

docs/announce/arxiv/SUBMISSION.md
  arXiv submission checklist: pre-flight LaTeX checks, bundle
  description (which files to upload and which to exclude), target
  categories (primary cs.LG, cross-list cs.CL + cs.IT — the cs.IT
  cross-list 'meaningfully widens the retrieval surface'), abstract
  pulled from the .tex, comments field that carries the GitHub +
  PyPI + HF Space URLs, MSC/ACM classification (94A29 + 68T07 / I.2.7
  + E.4), license recommendation (CC BY 4.0 so Perplexity / ChatGPT
  can ingest), and a post-submission 'one-commit PR' spec for wiring
  the minted arXiv ID into README + CITATION.cff + ACKNOWLEDGMENTS +
  reports/paper/README.

docs/announce/vllm_integration_issue.md
  Pre-written RFC body for vllm-project/vllm Discussions, in the
  NexusQuant #16047 format: proposal, what KakeyaLattice does, why
  another KV quantiser (iso-PPL table across 4 models vs TurboQuant),
  what already exists (kakeyalattice.hf.KakeyaLatticeCache +
  vllm_backend capture plugin), three integration paths (Path A
  KVCacheQuantConfig backend = default proposal; Path B fused decode
  Triton kernel; Path C compressed-cold-tier only), what I want from
  maintainers (Path choice + exact interface + naming scheme),
  compliance note. Includes timing guidance ('reply within 4 hours
  Pacific business hours' for first maintainer engagement) and
  pre-prepared responses to the two most-likely reviewer asks
  ('real HBM savings?' and 'vs KIVI?').

docs/announce/hf_space_backlinks.md
  Two sub-sections: (1) Space outbound links (current status, badge
  suggestions, arXiv-badge-once-minted recipe, HF Collection pin
  recipe), (2) model-card inbound link-backs for six model cards
  (Qwen3-0.6B, Qwen3-4B, Llama-3.2-1B, DeepSeek-R1-Distill-Qwen-1.5B,
  GLM-4-9B-Chat, Gemma-4-E4B) ranked by expected PR acceptance rate,
  with a per-model PR body template + a diff-ready 'Related projects'
  entry.

docs/announce/papers_with_code/SUBMISSION.md
  Paper submission form fill-in (title, abstract, tasks, methods with
  New-Method description for 'Nested-Lattice Quantization' and
  'Sylvester-Hadamard Rotation') + four benchmark rows on the
  kv-cache-compression task (one per model: Qwen3-4B 2.77x, GLM
  2.44x, Gemma 3.04x, DeepSeek 2.43x at ≤2%) + a DeepSeek-V4-Flash
  row on the model-compression task (-22% bits vs FP8 at layer-
  weighted rel-MSE 0.959 ± 0.024).

docs/announce/dev_to/post_1_theory.md
  DEV.to post 1, theory-first (~1200 words). DEV.to front-matter
  with canonical_url pointing back to the repo so SEO juice credits
  the GitHub source. Structure: TL;DR -> why scalar quantizers
  leave bits on the table (heavy-tail non-isotropic KV) -> Step 1
  Sylvester-Hadamard rotation (1867 recursion, O(D log D), norm-
  preserving) -> Step 2 nested-lattice closest-point (D4 1.5 dB gain,
  E8 3.2 dB gain over scalar at same rate; Conway-Sloane decoders)
  -> iso-PPL table -> streaming latency -> 10-line integration ->
  what KakeyaLattice does NOT do -> Try it. Targets queries like
  'nested lattice vs scalar quantisation', 'E8 lattice KV', 'Hadamard
  rotation for LLM activations'.
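Both quantisation steps summarised in post 1 are textbook algorithms. A pure-Python sketch of each, for illustration only (not KakeyaLattice's actual implementation, which presumably vectorises and batches these):

```python
def fwht(x):
    """In-place fast Walsh-Hadamard transform in Sylvester order,
    O(D log D) for D = len(x), D a power of two. Scaling the result
    by 1/sqrt(D) makes the rotation norm-preserving."""
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def closest_point_d4(x):
    """Conway-Sloane closest-point decoder for D4: the points of Z^4
    whose coordinates sum to an even number."""
    f = [round(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    # Odd sum: flip the coordinate with the largest rounding error
    # to its second-nearest integer, making the sum even again.
    k = max(range(4), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f
```

A quick self-check: applying `fwht` twice returns D times the input, since the Sylvester-Hadamard matrix squared is D times the identity.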

docs/announce/dev_to/post_2_practice.md
  DEV.to post 2, practice-first (~1000 words). Targets a different
  audience: engineers who want to ship faster inference this week.
  Structure: TL;DR -> the setup -> pip install -> 10-line integration
  -> three operating points (q_range 10 aggressive / 38 balanced /
  152 near-lossless, bits/vec + typical |Δppl|) -> per-model
  numbers -> streaming-safe by construction -> operational checklist
  -> when NOT to ship -> HF Space live demo -> links + cite. Targets
  queries like 'transformers DynamicCache compression', 'compress
  Qwen3 KV cache', 'KakeyaLattice tutorial'.

No numerical claim in any of these artifacts goes beyond what is
already in reports/v1_4_release/kv_128k_isoppl_n8/,
reports/v1_5_release/dsv4_stage075/FINDINGS_N8.md, or
reports/v1_4_release/streaming/. Cross-file numeric consistency
checked against benchmarks/extract_iso_ppl_table.py output.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
cursor[bot] deleted the AgentMemory/discovery-runbook-c478 branch April 27, 2026 07:28