skills

Claude Code plugin marketplace — 20+ installable reference skills for vLLM, Kubernetes, release engineering, and skill authoring.

Install

/plugin marketplace add air-gapped/skills
/plugin install <plugin>@air-gapped-marketplace

Plugins are either single-skill (e.g. jinja-expert, helm, keda) or grouped suites (e.g. vllm — bundles all 14 vLLM reference skills into one plugin). See .claude-plugin/marketplace.json for the full list.
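For orientation, a marketplace manifest generally looks like the fragment below. This is an illustrative sketch from memory of the Claude Code plugin-marketplace format, not a copy of this repo's file; field names and the ./plugins/helm source path are assumptions, and the actual .claude-plugin/marketplace.json is authoritative.

```json
{
  "name": "air-gapped-marketplace",
  "owner": { "name": "air-gapped" },
  "plugins": [
    {
      "name": "helm",
      "source": "./plugins/helm",
      "description": "Author and maintain Helm charts"
    }
  ]
}
```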

Each plugin is versioned as 0.YYYYMMDD.N, where YYYYMMDD is the UTC date of the most recent content change across the plugin's member skills and N is the number of unique commits touching any member skill directory. Run /plugin update to pick up new bumps.
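The two version components can be read straight out of git history. A minimal sketch, demonstrated in a throwaway repo because the layout is an assumption — the skills/helm path and SKILL.md file are illustrative, not this repo's guaranteed structure:

```shell
# Set up a throwaway repo with one skill directory (illustrative layout).
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p skills/helm
echo "demo" > skills/helm/SKILL.md
git add skills
git -c user.name=demo -c user.email=demo@example.com commit -qm "add helm skill"

skill_dir="skills/helm"
# YYYYMMDD: UTC date of the most recent commit touching the skill directory
stamp=$(TZ=UTC git log -1 --format=%cd --date=format-local:%Y%m%d -- "$skill_dir")
# N: number of commits touching the skill directory
count=$(git rev-list --count HEAD -- "$skill_dir")
echo "0.${stamp}.${count}"
```

For grouped suites such as vllm, the real scheme takes the latest date and the commit count across all member skill directories, not just one.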

| Skill | Description |
| --- | --- |
| aiperf | NVIDIA AIPerf — vendor-neutral generative-AI inference benchmarking (genai-perf successor). Covers aiperf profile with concurrency / request-rate / fixed-schedule trace replay / user-centric / multi-run confidence, 15 endpoint types (chat,… |
| autoresearch | Karpathy-pattern autoresearch — autonomous hill-climbing over a measurable metric, deep multi-agent research, or research-then-optimize. Three modes: Optimize (keep/discard ratchet), Research (STORM multi-perspective), Improve. |
| baml-expert | BAML (Boundary ML) expert for projects defining LLM calls as typed functions in .baml files with a generated Python client. Use whenever the repo contains baml_src/, baml_client/, baml-cli commands, or imports from baml_py / baml_client. Covers… |
| helm | Author and maintain Helm charts: create chart, write templates, values.yaml patterns, _helpers.tpl, Chart.yaml, values.schema.json, helm-docs, library charts. Helm 4 (SSA, WASM, OCI digest). Chart CI/CD (lint, helm-unittest, chart-testing,… |
| jinja-expert | Author, read, and debug Jinja2 templates across the three places Jinja lives in 2026 — HuggingFace chat_template.jinja (rendered by apply_chat_template for vLLM / sglang), Ansible playbooks + .j2 files, and Jinja-adjacent Kubernetes workflows… |
| keda | Configure, operate, and master KEDA (Kubernetes Event-driven Autoscaling) — ScaledObject, ScaledJob, TriggerAuthentication CRDs, 70+ scalers, HPA behavior tuning, scale-to-zero, the KEDA HTTP Add-on, production hardening, multi-trigger semantics,… |
| makefile-best-practices | Makefile best practices, patterns, and templates for GNU Make 4.x — dependency graphs, task-runner workflows, parallel-safe recipes, self-documenting help targets, and language-specific patterns (Go, Python, Node, Docker, Helm, POSIX). |
| nvidia-nixl | NVIDIA Inference Xfer Library (NIXL) operator + developer reference. Point-to-point KV-cache and tensor transport for distributed inference (Dynamo, vLLM, SGLang). Covers the C++/Python/Rust agent API, all 13 backend plugins (UCX, GDS, GDS_MT,… |
| openshift-app | Package applications for OpenShift deployment: container images (UBI, arbitrary UID, multi-stage builds), packaging formats (Helm, Kustomize, Operators, OLM v1), CI/CD (Tekton, ArgoCD, Shipwright, Conforma), security (SCC, PSA, supply chain, image… |
| prometheus-mimir-grafana | Query Prometheus and Grafana Mimir, write and debug PromQL, and build or fix Grafana dashboards — for agents solving problems from metrics. Covers the Prometheus HTTP API (/api/v1/query, query_range, series, labels, metadata), Mimir… |
| skill-improver | Autoresearch loop for Claude Code skills — greedy keep/discard hill climbing on a 10-dimension quality rubric, with blind subagent validation for self-scoring bias, plus a freshen mode that probes external references (release notes, docs,… |
| transformers-config-tokenizers-expert | Preflight reference for HuggingFace snapshots — what vLLM, sglang, and transformers.generate see at runtime. Covers config-file precedence (tokenizer.json, tokenizer_config.json, generation_config.json, chat_template.jinja), transformers v5… |
| vllm-benchmarking | Run production vLLM benchmarks — vllm bench (serve, throughput, latency, sweep, startup, mm-processor), request-rate vs max-concurrency semantics, TTFT/TPOT/ITL/E2EL percentiles, goodput SLO measurement, prefix-cache workloads, air-gapped… |
| vllm-caching | vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA), MultiConnector composition. Version gates, sizing math (flag total… |
| vllm-chat-templates | vLLM chat-template (prompt-side Jinja) operator reference. Template resolution precedence (--chat-template → AutoProcessor → tokenizer default → bundled fallback), chat_template_kwargs allowlist silently dropping… |
| vllm-configuration | Configure vLLM completely — YAML config file format, CLI arg precedence, full VLLM_/HF_/TRANSFORMERS_* env-var catalog, end-to-end recipe for air-gapped environments (internal HF mirrors, hf-mirror.com, ModelScope, HF_HUB_OFFLINE with pre-seeded… |
| vllm-deployment | Deploy production vLLM on Kubernetes, OpenShift, Docker/Podman. Pod shape (load-bearing /dev/shm, cold-load liveness 600s), multi-node LWS + Ray, control plane (llm-d, production-stack, AIBrix, NVIDIA Dynamo, KServe), Gateway API Inference… |
| vllm-input-modalities | vLLM non-chat inference surfaces — text embeddings (/v1/embeddings, /v2/embed), reranking/scoring (/rerank, /score), speech-to-text (/v1/audio/transcriptions, /v1/audio/translations), document OCR via VLMs. Covers 2026 --runner pooling |
| vllm-nvidia-hardware | NVIDIA AI-hardware + vLLM-platform reference covering Hopper (H100/H200), Blackwell (B100/B200/B300) and Blackwell Ultra, Grace-Blackwell superchips and NVL72 racks (GB200, GB300), Vera Rubin (R100/R300) with VR200 NVL144 and Kyber NVL576, Dell… |
| vllm-observability | Observe production vLLM — /metrics Prometheus surface (V1 engine), SLO-driven alerting on TTFT/ITL/queue/KV/preemption/aborts/corrupted-logits, shipping Grafana dashboards in examples/observability/, OTLP tracing with --otlp-traces-endpoint |
| vllm-omni | vLLM-Omni output-side multimodal generation — image (FLUX.1/2, Qwen-Image, GLM-Image, BAGEL, SD3.5, HunyuanImage-3.0), video (Wan2.1/2.2, LTX-2, HunyuanVideo-1.5), TTS (Qwen3-TTS, CosyVoice3, Voxtral-TTS), any-to-any omni (Qwen3-Omni, Qwen2.5-Omni,… |
| vllm-performance-tuning | vLLM performance-tuning operator reference — tuning workflow (baseline → bottleneck → knob → re-bench), fused-MoE kernel autotune (benchmark_moe.py generates E=N,N=M,device_name=X.json configs), DeepEP all-to-all + expert parallelism + EPLB,… |
| vllm-quantization | vLLM datacenter-GPU quantization — picking, configuring, troubleshooting NVFP4, FP8, MXFP4, MXFP8, AWQ, GPTQ, INT8, compressed-tensors, modelopt, quark on H100/H200/B200/B300/GB200/GB300. 29 --quantization flag values, KV-cache dtypes (fp8_e4m3,… |
| vllm-reasoning-parsers | vLLM reasoning-parser operator + developer reference. --reasoning-parser CLI wiring, ReasoningParser contract (non-streaming extract_reasoning + per-delta extract_reasoning_streaming), is_reasoning_end xgrammar gating,… |
| vllm-speculative-decoding | Pick, configure, tune, monitor vLLM speculative decoding in production. Eleven SpeculativeMethod options (ngram, ngram_gpu, medusa, mlp_speculator, draft_model, suffix, eagle, eagle3, dflash, mtp, extract_hidden_states), --speculative-config JSON… |
| vllm-tool-parsers | vLLM tool-calling operator reference — picking --tool-call-parser per model family, writing custom parsers via --tool-parser-plugin, navigating vLLM source + GitHub tracker to debug any specific tool-call question. Pointer map, not source… |

MIT licensed.
