Joel Natividad edited this page May 13, 2026 · 2 revisions

Integrations

Tier: Intermediate

qsv lives inside larger pipelines. This page maps the major integration surfaces — pick what's relevant to your stack.

CKAN / DataPusher+

CKAN is the de facto open-source data portal. qsv is a first-class CKAN citizen via:

  • safenames — produces CKAN Datastore-safe column names
  • applydp — slim transform operations for the DataPusher+ Postgres COPY pipeline
  • qsvdp binary variant — the slim build DataPusher+ ships
  • to postgres — bulk load into the CKAN Datastore's underlying PostgreSQL
  • dynamicEnum with ckan:// URLs — validate column values against any CKAN-hosted reference CSV
  • sniff — health-check remote CKAN resources without downloading

See Recipe: CKAN Integration for the full pipeline.
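A minimal sketch of that path, assuming a local CKAN instance (the resource URL, connection string, and file names below are hypothetical):

```shell
# Sanitize headers so the CKAN Datastore accepts them
qsv safenames raw.csv > clean.csv

# Health-check a remote CKAN resource without downloading it
qsv sniff https://demo.ckan.org/datastore/dump/some-resource-id

# Bulk-load into the Datastore's underlying PostgreSQL
qsv to postgres 'postgres://ckan:pass@localhost:5432/ckan_datastore' clean.csv
```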

DuckDB

qsv complements DuckDB:

  • qsv to parquet + DuckDB = an excellent CSV → analytics pipeline. qsv cleans and converts; DuckDB queries.
  • qsv sqlp uses Polars SQL; qsv scoresql --duckdb uses DuckDB's planner for query scoring.
  • qsv describegpt with QSV_DUCKDB_PATH set uses DuckDB for SQL-RAG (highly recommended).

Typical pipeline:

qsv stats --stats-jsonl raw.csv             # build stats cache
qsv schema --polars raw.csv                 # build Polars schema
qsv to parquet outdir/ raw.csv              # convert to Parquet

duckdb -c "SELECT borough, COUNT(*) FROM read_parquet('outdir/raw.parquet') GROUP BY borough"

For more on DuckDB integration, see docs/help/sqlp.md and docs/help/scoresql.md.

Python notebooks

qsv ships sample Jupyter / Colab notebooks in contrib/notebooks/.

The pattern: call qsv from a notebook cell with a shell escape (!qsv …) or via subprocess.run. qsv handles the heavy data work; pandas / Polars / matplotlib handle modeling and plotting.

import subprocess, pandas as pd

subprocess.run(
    ['qsv', 'stats', '--everything', '--stats-jsonl', 'data.csv'],
    capture_output=True, text=True, check=True
)   # --stats-jsonl also caches the stats next to the input file
stats_df = pd.read_csv('data.stats.csv')   # read the stats cache qsv wrote

CI/CD — GitHub Actions

qsv works great as a data-quality gate. Drop a step into your workflow:

# .github/workflows/data-quality.yml
name: Data quality
on:
  pull_request:
    paths: ['data/**.csv']
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install qsv
        run: |
          # Pin a release tag: the asset name embeds the version, so a
          # 'releases/latest' URL breaks whenever a new version ships
          curl -L -o qsv.zip \
            https://github.com/dathere/qsv/releases/download/1.0.0/qsv-1.0.0-x86_64-unknown-linux-gnu.zip
          unzip qsv.zip
          sudo install qsv /usr/local/bin/
      - name: Validate
        run: |
          qsv validate data/customers.csv data/customers.schema.json
          if [ -f data/customers.csv.validation-errors.tsv ]; then
            echo "::error::Validation failed; see data/customers.csv.validation-errors.tsv"
            exit 1
          fi
      - name: Diff against last release
        run: |
          qsv diff --select id last_release/data.csv data/data.csv > delta.csv
          qsv count delta.csv

See Recipe: JSON Schema Validation and Recipe: Diff & Audit for patterns you can drop into CI.

qsv-recipes — community Luau scripts

qsv-recipes is a curated repo of Luau scripts for qsv luau:

  • ISBN validation
  • Unemployment rate enrichment
  • Stemming / text normalization
  • Geographic enrichment
  • Time-series transforms
  • … and many more

Use it with qsv luau map -x -f .... See Scripting (Luau / Python) and Recipe: Date Enrichment.

qsv-lookup-tables — community reference CSVs

qsv-lookup-tables is the curated reference-data repo. Access from Luau / template / validate with the dathere:// URL scheme:

qsv_register_lookup("us_states", "dathere://us-states-example.csv")

See Lookup Tables.
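The same dathere:// scheme works inside a validation schema via the dynamicEnum extension mentioned above; a hedged fragment, assuming a state column (property name is illustrative):

```json
{
  "properties": {
    "state": { "dynamicEnum": "dathere://us-states-example.csv" }
  }
}
```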

qsv-cookiecutter

qsv-cookiecutter is a Cookiecutter project scaffold for qsv-based data pipelines:

pipx install cookiecutter
cookiecutter gh:dathere/qsv-cookiecutter

You get a templated project with directory layout, Makefile, baseline shell scripts, and an analytics/ folder pre-wired for qsv.

Claude / LLMs

Three integration shapes:

  1. qsv MCP Server — qsv as an MCP server for any MCP-aware client. See MCP Server.
  2. Claude Cowork Plugin — 15 skills + 3 agents on top of the MCP server, for Claude Desktop. See Claude Cowork Plugin.
  3. qsv describegpt — qsv calls any OpenAI-compatible LLM directly (including Ollama / LM Studio / Jan local LLMs). See AI & Documentation.

Spreadsheets (Excel / LibreOffice)

  • qsv excel — Excel/ODS sheet → CSV.
  • qsv to xlsx — CSV(s) → Excel workbook (one sheet per CSV).
  • qsv to ods — CSV(s) → LibreOffice Calc.
  • MCP server auto-converts Excel inputs to CSV before running qsv commands (transparent to the LLM).

See Conversion & I/O.
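A round-trip sketch (file names are placeholders):

```shell
# Extract a worksheet to CSV (defaults to the first sheet)
qsv excel report.xlsx > report.csv

# Bundle several CSVs into one workbook, one sheet per file
qsv to xlsx combined.xlsx q1.csv q2.csv q3.csv
```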

Geospatial tooling

  • qsv geoconvert — CSV ↔ GeoJSON / SHP / KML / GPX.
  • qsv geocode — local Geonames + MaxMind GeoLite2 lookups (360k records/sec).
  • Pair with QGIS, GeoPandas, PostGIS, or kepler.gl for visualization.

See Geospatial and Recipe: Geographic Enrichment.
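A hedged sketch of the geocoding step (column and file names are made up; check qsv geocode --help for the exact subcommands and flags):

```shell
# Enrich a city-name column using the local Geonames index
qsv geocode suggest city stations.csv -o stations-geocoded.csv
```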

Shell pipelines

qsv reads/writes stdin/stdout. Pair freely with:

  • jq / jaq — JSON manipulation (qsv ships jaq as a built-in via --jaq)
  • xargs -P — parallelize over qsv-produced row lists
  • curl / wget — feed remote files (or use qsv fetch / sniff directly)
  • awk / sed — when qsv's regex isn't enough (rare)
  • psql / sqlite3 — for the database hop
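For instance, a sketch chaining qsv with jq-style JSON tooling (column names are invented):

```shell
# Filter rows with qsv, emit JSON Lines, reshape with jq
qsv search --select status 'active' users.csv \
  | qsv tojsonl \
  | jq -c '{id, email}'
```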

See also
