
History

Revisions

  • wiki: add lint script + example CI workflow

    scripts/lint.sh runs two checks against the wiki:

    1. Every `qsv <command>` reference in a code context (fenced block or inline backtick) is a real command per `qsv --list`. The script merges `qsv --list` with a known feature-gated set (clipboard, prompt, py, lens, color, etc.) so a lite-build qsv still validates the full wiki.
    2. When QSV_REPO_PATH points at a checkout of dathere/qsv, every `/docs/help/<X>.md` reference resolves to a real file in that checkout's docs/help/ directory.

    Exit codes: 0=clean, 1=stale references, 2=setup error. Allowlist files under scripts/lint.allowlist (per-token) and a built-in HELP_ALLOWLIST for template placeholders (cmd1, cmd2, X, etc.).

    scripts/wiki-lint-workflow.yml is a ready-to-copy GitHub Actions workflow for the MAIN qsv repo. GitHub Actions doesn't run on *.wiki repos, so the workflow file must live there. It runs nightly + on PRs that touch src/cmd/, src/main.rs, or docs/help/, and opens a sticky `wiki-lint` issue on scheduled failure.

    Also fixes the only real bug the lint caught: Command-Reference linked to a non-existent /docs/help/help.md. Reworded the row to describe `qsv help` without a broken link.

    Contributing-to-the-Wiki gets a new "Linting" section documenting both the local run and the CI setup.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    8b4b62c
  • wiki: remove the removed `generate` command + correct command count

    The `generate` command was removed from qsv but still appeared in Command-Reference under "Generators & Utility". Drop the row and rename the section to "Utility" (now containing just `help`).

    Also corrects the stale "81 commands" claim used across several pages. README's command catalog currently has 71 entries; the standard `qsv` binary reports 68 from `qsv --list`. Replaced fixed counts with "70+" or "every command" / "every qsv command" where a specific number was not load-bearing, plus a Command-Reference note that the exact set depends on the binary variant and feature flags.

    Pages touched: Command-Reference, Home, Getting-Started, Why-qsv, Selection-and-Inspection, Comparison.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    14666e3
  • wiki: Phase E complete - all 6 polish pages

    Troubleshooting: organized by category - install/launch (SIGILL, macOS Gatekeeper, Windows PATH, musl Luau), encoding/format (UTF-8, wrong delimiter, ragged rows, BOM), index/cache (stale, missing JSONL, Polars schema drift), memory (OOM, ext-* alternatives, approx algorithms), specific commands (stats date-whitelist, search case sensitivity, fetch rate limits, excel edge cases). Plus diagnostics with --version, --envlist, QSV_LOG_LEVEL, and where to report bugs.

    FAQ: short answers to the most-asked questions. qsv vs pandas/DuckDB/csvkit/miller/xsv, Windows/ARM/IBM platform support, streaming vs in-memory, what an index/stats-cache/automagical command is, how to update, what features my qsv has, AI/LLM privacy (no data leaves your machine unless you explicitly send it), bug-reporting flow.

    Comparison: head-to-head with xsv (qsvlite is the drop-in), csvkit (10x faster), Miller (overlap, both excellent), DuckDB (complement, not replace - to-parquet hand-off), pandas (qsv as the preprocessing layer), Visidata (qsv lens is the closest equivalent), awk/sed (qsv understands CSV quoting). Ends with a "which tool when" cheat sheet.

    Glossary: 30+ terms - antimode, asof join, automagical, BLAKE3, cardinality, CKAN, DataSketches, dyncols, lazy frame, lookup table, MAD, MCP, MiniJinja, neuro-symbolic, non-equi join, Polars, Polars schema, Pragmastat, pseudonymization, reservoir sampling, RFC 4180, sortiness, stats cache, streaming, TOON, t-digest, WKT, XSD.

    External-Resources: try-it-online, 100.dathere.com lessons, 5 conference talks, "Have we achieved ACI?" 3-part blog series, canonical /docs/ links, live benchmark dashboard, Discussions categories, datHere ecosystem (qsv pro, qsv-recipes, qsv-lookup-tables, qsv-cookiecutter, DataPusher+), 4 contributed notebooks, packaging across 10 distros, underlying library credits, Zenodo DOI + DeepWiki + FOSSA, sponsor link.

    Contributing-to-the-Wiki: clone+commit+push workflow, page conventions (H1, tier badge, See also), templates for command-reference and recipe pages, when-to-update matrix, 10-item verification checklist, commit-message convention, don't-duplicate-/docs/ guidance, anchor dataset conventions, slug rules, asset handling, asciinema casts.

    All 49 wiki pages now have substantive content. Plan complete.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    1535db1
  • wiki: Phase D complete - qsv-pro-Spotlight + Integrations

    qsv-pro-Spotlight: feature comparison vs CLI (drag-and-drop, charts, AI assistant, Workflow recording), download badges per OS, `qsv pro lens` / `qsv pro workflow` bridge commands, when-to-pick-qsv-pro vs CLI guidance.

    Integrations: maps the full ecosystem surface. CKAN/DataPusher+ (safenames + applydp + to postgres + dynamicEnum with ckan://), DuckDB (to parquet hand-off, scoresql --duckdb), Python notebooks (4 example notebooks in contrib/notebooks/), CI/CD GitHub Actions (validate + diff gates), qsv-recipes + qsv-lookup-tables + qsv-cookiecutter community repos, Claude/LLM integrations (MCP server, Cowork plugin, describegpt), spreadsheet pipelines (excel/to xlsx/to ods), geospatial tooling (QGIS, GeoPandas, PostGIS, kepler.gl), shell composition patterns.

    All 8 Phase D pages now live.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    f5e17f5
  • wiki: Phase D batch 3 - Claude-Cowork-Plugin + MCP-Server

    Claude-Cowork-Plugin: complete walkthrough of the 15-skill / 3-agent plugin. User-invocable skills (/csv-query, /data-clean, /data-convert, /data-describe, /data-join, /data-profile, /data-validate, /data-viz, /infer-ontology), model-invoked skills (bls-query, csv-wrangling, data-quality, genai-disclaimer, qsv-performance, reproducible-analysis), and three subagents (data-analyst, data-wrangler, policy-analyst). Installation via dathere/qsv marketplace, workflow examples (profile, CKAN-ready, asof join, hand off to subagent), troubleshooting, QSV_NO_COWORK_SETUP for disabling auto-deployment.

    MCP-Server: four install modes (Claude Desktop Extension, legacy MCP, Claude Code, Gemini CLI). Tool surface (~60 tools), security model (working dir + allowed dirs), concurrency limits, audit logging (qsvmcp.log vs qsv_rCURRENT.log), QSV_MCP_* env var ecosystem, end-to-end example showing local-only data access with stats summaries sent as LLM context.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    1e856a5
  • wiki: Phase D batch 2 - Stats-Cache-and-Caching + Lookup-Tables

    Stats-Cache-and-Caching: complete cache architecture map. Sidecar file layout (.idx, .stats.csv, .stats.csv.data.jsonl, .freq.csv.data.jsonl, .schema.json, .pschema.json), what each command produces vs consumes, QSV_STATSCACHE_MODE auto/force/none, four HTTP cache backends (memory/disk/Redis/none), TTLs and refresh-on-hit, cache-busting cheatsheet, CI gotchas (commit .stats.jsonl + .pschema.json, don't commit .idx).

    Lookup-Tables: file://, http(s)://, dathere://, ckan:// URL schemes. qsv_register_lookup in luau (NTA enrichment example), dynamicEnum in validate (against local/HTTP/CKAN/dathere CSVs), register_lookup in template (MiniJinja form letters). Lookup CSV format requirements, qsv-lookup-tables curated repo pointer, caching and performance notes.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    bde89c9
  • wiki: Phase D batch 1 - Performance-Tuning + Environment-Variables

    Performance-Tuning: the five-minute rule (index, stats cache, schema --polars, ext-*), what gets faster with an index, stats-cache smart commands, multithreading map, memory management with QSV_MEMORY_CHECK, approximate algorithms (DataSketches t-digest + HyperLogLog), build-time optimizations (target-cpu=native, nightly, allocator choice), and a tuning checklist. All numbers link to BENCHMARKS.md.

    Environment-Variables: grouped overview of 50+ env vars by purpose (I/O, performance, cache, AI/LLM, web, geocoding, stats, dates, regex, color, logging, Polars, MCP). Each links back to the canonical ENVIRONMENT_VARIABLES.md.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    8569206
  • wiki: Phase C complete - final 5 cookbook recipes

    Recipe-Stats-to-Insights: stats -> moarstats -> pragmastat -> describegpt flow on NYC 311 + Allegheny. Heavy-tailed-aware Pragmastat for sale prices, SQL-RAG chat sub-mode for natural-language queries, multilingual outputs, controlled tag vocabularies.

    Recipe-Diff-and-Audit: 6-step pipeline for weekly regulatory CSVs. BLAKE3 fingerprint gate -> extdedup uniqueness check -> sortcheck/extsort -> diff (<600ms / 1M rows) -> split delta by Add/Remove/Modify -> short-hash lineage. Variations: composite keys, schema validation, larger-than-RAM prerequisites, email summary.

    Recipe-Build-a-Data-Pipeline: end-to-end on Allegheny property sales - clean -> profile -> validate -> analyze -> report -> publish -> gate. Stage-by-stage walkthrough with Polars schema, sqlp aggregations, pivotp, template-generated Markdown reports, Parquet/Data Package outputs, GitHub Actions CI integration, Make-based orchestration.

    Recipe-Fetch-and-Cache: NOAA GHCN-Daily weather + GitHub stargazer harvesting with --url-template, --disk-cache, --redis-cache, jaq filters, rate-limit handling. fetchpost for OCR/ML endpoints with MiniJinja templated JSON bodies. Cache management matrix.

    Recipe-Larger-than-RAM: 27M-row / 16 GB NYC 311 end-to-end without OOM. Index, approx stats with DataSketches, extsort/extdedup, Polars lazy sqlp/joinp, schema --polars, multithreaded snappy, split for parallel chunking, env var tuning matrix (QSV_AUTOINDEX_SIZE, QSV_TMPDIR, QSV_STATS_CHUNK_MEMORY_MB, QSV_MEMORY_CHECK).

    All 13 Phase C cookbook pages now live.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    9439ef4
  • wiki: Phase C batch 2 - 4 intermediate cookbook recipes

    Recipe-Geographic-Enrichment: reverse-geocode NYC 311 lat/lon to city + borough + county + FIPS codes with %dyncols; add high-resolution NTA via Luau lookup tables with fallbacks; ~360k records/sec. Variations: forward geocode, IP lookup, WKT geometry export to GeoJSON, custom Geonames city sets.

    Recipe-CKAN-Integration: expanded from legacy snippets. Pull metadata from CKAN via ckanapi+jq+qsv jsonl, push CSVs via safenames+applydp+to postgres for DataPusher+, generate Data Packages with embedded stats, describegpt for CKAN data dictionaries with controlled tag vocabularies.

    Recipe-JSON-Schema-Validate: 6-step workflow on NYC 311 (1M sample): pre-populate stats cache, generate schema, hand-edit to tighten rules, validate at ~780k rec/sec. Deep-dive on three custom keywords: currency, dynamicEnum (file/http/dathere/ckan schemes), uniqueCombinedWith. Variations: --fancy-regex, Polars schema, CI gate, CKAN-hosted enum.

    Recipe-Multi-Table-Joins: three scenarios in one recipe - lookup join (wcp + country_continent with inner/left/anti/semi), asof time-series join (NYC 311 + NOAA GHCN daily weather with --strategy backward), repeated-column wide merge (original cookbook scenario, updated for current command syntax). Plus non-equi salary bands, cross join, pre-join filtering.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    752aa42
  • wiki: Phase C batch 1 - Cookbook index + 3 beginner recipes

    Cookbook index now categorizes 12 recipes by goal (inspect / clean / enrich / validate / aggregate / combine / big files / integration) with anchor dataset and command columns. Legacy snippets (CKAN, Date Enrichment, multi-table join, cat varying columns, geocode) preserved at the bottom of the index with pointers to the expanded Recipe pages.

    Recipe-Inspect-Unknown-CSV: sniff -> headers -> count -> stats -> frequency -> sample -> table walkthrough on wcp.csv, ~0.7s for full stats on 2.7M rows. Variations: remote sniff, describegpt natural-language summary, colorized output.

    Recipe-Clean-and-Normalize: 6-step pipeline on Boston 311 covering input --auto-skip, safenames, regex replace of sentinel nulls, group-by fill, sort+dedup with audit trail. Variations: applydp for CKAN, pseudonymization, censoring, schema validation.

    Recipe-Date-Enrichment: expands the legacy date-enrichment snippets on NYC 311. Adds Year/YearMonth/Weekday/Quarter/TAT columns via datefmt + getquarter.lua + turnaroundtime.lua; partitions output by quarter. Variations: Brooklyn-only TAT, regex date columns, SQL-style aggregation, business-hours filter.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    268ecf6
  • wiki: complete Phase B - Scripting + Indexing/Compression/Diff + AI/Doc

    Scripting-Luau-Python covers luau (qsv's flagship DSL with BEGIN/MAIN/END, random-access mode, ~50 qsv helpers), py (Python 3.10+ expressions), foreach (shell-out per row with --unify pattern), and template (MiniJinja rendering with register_lookup). Examples: cumulative running total, NYC 311 classification with dynamicEnum lookup, getquarter.lua from docs/cookbook/lua, helper.py module pattern, per-borough markdown reports.

    Indexing-Compression-Diff covers index (with QSV_AUTOINDEX_SIZE auto-mode), diff (1M x 9 in <600ms primary-key diff), blake3 (pipeline gating, cache keys, integrity checks), plus cross-references to extsort/extdedup/snappy.

    AI-and-Documentation covers describegpt (neuro-symbolic data dictionary, Ollama/Jan/LM Studio local LLMs, SQL-RAG sub-mode with DuckDB/Polars, multilingual output, controlled tag vocabulary), color (theme-aware colorized table), and pro (qsv pro API bridge).

    All 13 Phase B pages now live. All 81 commands covered with non-trivial real-world examples anchored on the wiki's six anchor datasets.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    591a99c
  • wiki: flesh out Conversion-and-IO + Geospatial + HTTP-and-Web

    Conversion-and-IO covers excel, to, json, jsonl, tojsonl, snappy, geoconvert (cross-ref), prompt, clipboard (cross-ref). Examples show Excel sheet selection by name/index, CSV -> Parquet with zstd, multi-CSV SQLite/PostgreSQL loads, Frictionless Data Package generation, smart JSONL type inference via the stats cache, multithreaded Snappy.

    Geospatial covers geocode + geoconvert in depth: forward/reverse geocoding, suggestnow/reversenow one-shot variants, IP lookup, country info, %dyncols output formatting for FIPS code enrichment, GeoJSON/SHP/KML/GPX conversions. Index setup walkthrough.

    HTTP-and-Web covers fetch + fetchpost with HTTP/2 flow control, RateLimit header awareness, four cache modes (memory/disk/Redis/none), MiniJinja templated POST bodies, GitHub stargazer fetch with --url-template, NOAA GHCN-Daily station data with --disk-cache, and a cache-strategy comparison table.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    ec74831
  • wiki: flesh out SQL-and-Polars + Validation-and-Schema

    SQL-and-Polars covers sqlp, scoresql, pivotp, with cross-reference to joinp. Examples: aggregate NYC 311 by Borough x Month -> Parquet, join wcp.csv with country_continent.csv, UNION ALL across yearly exports + window function, dollar-quoting, read_parquet inline, score-before-run warnings and DuckDB plan mode, pivot Allegheny property sales.

    Validation-and-Schema covers validate, schema, sniff. Highlights the 780k rec/sec validate benchmark, the three custom keywords (currency, dynamicEnum, uniqueCombinedWith), and the schema -> validate workflow loop. Polars schema (.pschema.json) explained as the speed-up feeder for sqlp/joinp/pivotp.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    0bbbd70
  • wiki: flesh out Aggregation-and-Statistics + Joins-and-Set-Ops

    Aggregation-and-Statistics covers stats, moarstats, frequency, pragmastat, dedup, extdedup, extsort. Examples emphasize: stats cache for downstream speed (NYC 311), Apache DataSketches approx mode for huge cardinalities, Pragmastat robust statistics for skewed data (Allegheny property sales), extsort/extdedup for files > RAM.

    Joins-and-Set-Ops covers join, joinp, exclude, partition, split. Examples: wcp + country_continent lookup, NYC 311 + NOAA weather asof join, salary band non-equi join, partition NYC 311 by Borough, chunk 27M-row exports for parallel processing.

    Both pages: quick decision table, per-command sections with real-world anchored examples, deep-links to /docs/help/, "See also" cross-links.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    fbd79d8
  • wiki: flesh out Transform-and-Reshape command reference

    Covers 21 transformation commands with non-trivial real-world examples anchored on wcp.csv, NYC 311, Boston 311, and CKAN dataset metadata.

    Sections: input, safenames, rename, apply, applydp, datefmt, replace, pseudo, fill, enum, explode, implode, transpose, cat, fixlengths, fmt, behead, edit, sort, sortcheck, reverse.

    Each command section: 1-paragraph intro, 1-4 example commands, deep-link to /docs/help/<cmd>.md, and "See also" cross-links to related commands and recipes. Quick decision table at top maps tasks to commands.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    98ac8d0
  • wiki: flesh out Command-Reference index + Selection-and-Inspection

    Command-Reference is now the canonical index of all 81 commands, grouped by category, with the README legend symbols inline. Every row deep-links to both the wiki category page (workflow context) and /docs/help/<cmd>.md (flag reference).

    Selection-and-Inspection covers the 12 daily-driver commands with one or two non-trivial real-world examples each, anchored on wcp.csv and NYC 311. Examples verified against /docs/help/ flag tables - no invented flags.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    85a311d
  • wiki: add stubs for Phase B/C/D/E pages so sidebar links resolve

    Adds 39 placeholder pages so every sidebar entry resolves to real content rather than a 404. Each stub declares its tier, the phase it will be filled in, and a one-paragraph preview of what's coming. They link back to Home / Getting-Started / Command-Reference / Cookbook for navigation.

    Pages added:

    - Phase B (Command Reference, 13): Command-Reference, Selection-and-Inspection, Transform-and-Reshape, Aggregation-and-Statistics, Joins-and-Set-Ops, SQL-and-Polars, Validation-and-Schema, Conversion-and-IO, Geospatial, HTTP-and-Web, Scripting-Luau-Python, Indexing-Compression-Diff, AI-and-Documentation
    - Phase C (Cookbook recipes, 12): Recipe-Inspect-Unknown-CSV, Recipe-Clean-and-Normalize, Recipe-Geographic-Enrichment, Recipe-Date-Enrichment, Recipe-CKAN-Integration, Recipe-JSON-Schema-Validate, Recipe-Build-a-Data-Pipeline, Recipe-Stats-to-Insights, Recipe-Fetch-and-Cache, Recipe-Larger-than-RAM, Recipe-Diff-and-Audit, Recipe-Multi-Table-Joins
    - Phase D (Tuning + ecosystem, 8): Performance-Tuning, Environment-Variables, Stats-Cache-and-Caching, Lookup-Tables, Claude-Cowork-Plugin, MCP-Server, qsv-pro-Spotlight, Integrations
    - Phase E (Polish, 6): Troubleshooting, FAQ, Comparison, Glossary, External-Resources, Contributing-to-the-Wiki

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    2e394a7
  • wiki: scaffold tiered IA + Phase A foundation pages

    Adds the new wiki information architecture: Home as a discoverability hub with a "Where things live" table, a tiered sidebar (Beginner/Intermediate/Advanced), and the 9 foundation pages of Phase A.

    - Home rewritten as a hub with three CTAs, tier-grouped page lists, and pointers to anchor datasets used across the wiki.
    - _Sidebar groups pages by Get Started / Command Reference / Cookbook / Tuning / Ecosystem / Reference / Legacy.
    - _Footer adds universal links (Discussions, qsv pro, benchmarks, license).
    - Tier-Legend explains the Beginner/Intermediate/Advanced badges.
    - Installation covers prebuilt binaries, package managers, cargo install, source builds, SIGILL/portable variants, MSI installer, self-update, and signature verification.
    - Getting-Started walks through count/headers/stats/sample/search/slice/flatten/select/sort/table/lens/clipboard/index against wcp.csv (verified end-to-end against the actual dataset; outputs match qsv 20.0.0).
    - Why-qsv is the elevator pitch: speed numbers (with deep-links to BENCHMARKS.md), composability, batteries-included list, ecosystem.
    - Binary-Variants explains qsv vs qsvlite vs qsvdp vs qsvmcp vs qsvpy with a decision tree.

    Plan: /Users/joelnatividad/.claude/plans/the-wiki-is-a-typed-dragon.md

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

    @jqnatividad jqnatividad committed May 13, 2026
    0f4b50d
  • Updated Home (markdown)

    @jqnatividad jqnatividad committed May 10, 2026
    8a49bda
  • Destroyed Supplemental (markdown)

    @jqnatividad jqnatividad committed May 10, 2026
    989ce83
  • Updated Home (markdown)

    @jqnatividad jqnatividad committed May 10, 2026
    d94dcd0
  • Updated Supplemental (markdown)

    @jqnatividad jqnatividad committed Dec 27, 2025
    da6a4e8
  • Updated Supplemental (markdown)

    @jqnatividad jqnatividad committed Dec 5, 2025
    5690c9e
  • Updated Supplemental (markdown)

    @jqnatividad jqnatividad committed Dec 5, 2025
    9ec448e
  • Updated Supplemental (markdown)

    @jqnatividad jqnatividad committed Nov 2, 2025
    0cb1cc5
  • Created describegpt model notes (markdown)

    @jqnatividad jqnatividad committed Sep 8, 2025
    a08d744
  • Destroyed describegpt model note (markdown)

    @jqnatividad jqnatividad committed Sep 8, 2025
    18f8e34
  • Updated Home (markdown)

    @jqnatividad jqnatividad committed Sep 8, 2025
    ecd0996
  • Created describegpt model note (markdown)

    @jqnatividad jqnatividad committed Sep 8, 2025
    ef0adbe
  • Updated Home (markdown)

    @jqnatividad jqnatividad committed Sep 8, 2025
    5f3c2b1