Integrations
Tier: Intermediate
qsv lives inside larger pipelines. This page maps the major integration surfaces — pick what's relevant to your stack.
CKAN is the de-facto open-source data portal. qsv is a first-class CKAN citizen via:
- `safenames` — produces CKAN Datastore-safe column names
- `applydp` — slim transform operations for the DataPusher+ Postgres COPY pipeline
- `qsvdp` binary variant — the slim build DataPusher+ ships
- `to postgres` — bulk load into the CKAN Datastore's underlying PostgreSQL
- `dynamicEnum` with `ckan://` URLs — validate column values against any CKAN-hosted reference CSV
- `sniff` — health-check remote CKAN resources without downloading
See Recipe: CKAN Integration for the full pipeline.
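In miniature, the Datastore prep steps above chain like this (a sketch, not the DataPusher+ internals; the connection string and file names are placeholders):

```bash
# Make headers Datastore-safe, sanity-check, then bulk COPY into CKAN's PostgreSQL
qsv safenames resource.csv -o safe.csv
qsv sniff safe.csv
qsv to postgres postgres://ckan_user:pass@localhost/ckan_datastore safe.csv
```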
qsv complements DuckDB:
- `qsv to parquet` + DuckDB = an excellent CSV → analytics pipeline. qsv cleans and converts; DuckDB queries.
- `qsv sqlp` uses Polars SQL; `qsv scoresql --duckdb` uses DuckDB's planner for query scoring.
- `qsv describegpt` with `QSV_DUCKDB_PATH` set uses DuckDB for SQL-RAG (highly recommended).
Typical pipeline:

```bash
qsv stats --stats-jsonl raw.csv   # build stats cache
qsv schema --polars raw.csv       # build Polars schema
qsv to parquet outdir/ raw.csv    # convert to Parquet
duckdb -c "SELECT borough, COUNT(*) FROM read_parquet('outdir/raw.parquet') GROUP BY borough"
```

For more on DuckDB integration, see docs/help/sqlp.md and docs/help/scoresql.md.
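If you'd rather stay inside qsv, the same rollup can run through `qsv sqlp` (a sketch; `_t_1` is sqlp's generated name for the first input table):

```bash
qsv sqlp raw.csv "SELECT borough, COUNT(*) AS n FROM _t_1 GROUP BY borough ORDER BY n DESC"
```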
qsv ships sample Jupyter / Colab notebooks in contrib/notebooks/:
- qsv-colab-quickstart.ipynb — Google Colab walkthrough
- Whirlwindtour.ipynb — the canonical whirlwind tour as a notebook
- intro-to-count.ipynb — beginner-focused
- qsv_benchmark.ipynb — reproducible benchmark setup
The pattern: call qsv from a notebook cell with a shell escape (`!qsv ...`) or via `subprocess.run`. qsv handles the heavy data work; pandas / Polars / matplotlib handle plotting and modeling.
```python
import subprocess, pandas as pd

result = subprocess.run(
    ['qsv', 'stats', '--everything', '--stats-jsonl', 'data.csv'],
    capture_output=True, text=True, check=True
)
stats_df = pd.read_csv('data.stats.csv')  # qsv wrote the sidecar
```

qsv works great as a data-quality gate. Drop a step into your workflow:
```yaml
# .github/workflows/data-quality.yml
name: Data quality
on:
  pull_request:
    paths: ['data/**.csv']
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install qsv
        run: |
          curl -L -o qsv.zip \
            https://github.com/dathere/qsv/releases/latest/download/qsv-1.0.0-x86_64-unknown-linux-gnu.zip
          unzip qsv.zip
          sudo install qsv /usr/local/bin/
      - name: Validate
        run: |
          qsv validate data/customers.csv data/customers.schema.json
          if [ -f data/customers.csv.invalid.csv ]; then
            echo "::error::Validation failed; see data/customers.csv.validation-errors.tsv"
            exit 1
          fi
      - name: Diff against last release
        run: |
          qsv diff --select id last_release/data.csv data/data.csv > delta.csv
          qsv count delta.csv
```

See Recipe: JSON Schema Validation and Recipe: Diff & Audit for patterns you can drop into CI.
qsv-recipes is a curated repo of Luau scripts for qsv luau:
- ISBN validation
- Unemployment rate enrichment
- Stemming / text normalization
- Geographic enrichment
- Time-series transforms
- … and many more
Use it with `qsv luau map -x -f ...`. See Scripting (Luau / Python) and Recipe: Date Enrichment.
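A minimal sketch of what such a recipe script looks like (the column name is hypothetical; `col` is the row table qsv exposes to Luau scripts):

```lua
-- upper_name.luau: the return value becomes the new column's value for each row
return string.upper(col["name"] or "")
```

Invoked per the pattern above, e.g. `qsv luau map upper_name -x -f upper_name.luau data.csv`.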
qsv-lookup-tables is the curated reference-data repo. Access from Luau / template / validate with the dathere:// URL scheme:
```lua
qsv_register_lookup("us_states", "dathere://us-states-example.csv")
```

See Lookup Tables.
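Once registered, the lookup is available to the script as a Luau table keyed on the reference CSV's key column. A sketch (the `state` column and `Abbreviation` field are illustrative assumptions, not taken from the actual reference CSV):

```lua
-- look up this row's state and return its abbreviation, or a sentinel if missing
local entry = us_states[col["state"]]
return entry and entry["Abbreviation"] or "N/A"
```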
qsv-cookiecutter is a Cookiecutter project scaffold for qsv-based data pipelines:
```bash
pipx install cookiecutter
cookiecutter gh:dathere/qsv-cookiecutter
```

You get a templated project with directory layout, Makefile, baseline shell scripts, and an analytics/ folder pre-wired for qsv.
Three integration shapes:
- qsv MCP Server — qsv as an MCP server for any MCP-aware client. See MCP Server.
- Claude Cowork Plugin — 15 skills + 3 agents on top of the MCP server, for Claude Desktop. See Claude Cowork Plugin.
- `qsv describegpt` — qsv calls any OpenAI-compatible LLM directly (including Ollama / LM Studio / Jan local LLMs). See AI & Documentation.
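A sketch of the `describegpt` shape with SQL-RAG enabled (the DuckDB path and file name are assumptions):

```bash
export QSV_DUCKDB_PATH=/usr/local/bin/duckdb   # enables DuckDB-backed SQL-RAG
qsv describegpt data.csv --all                 # generate description, dictionary & tags
```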
- `qsv excel` — Excel/ODS sheet → CSV.
- `qsv to xlsx` — CSV(s) → Excel workbook (one sheet per CSV).
- `qsv to ods` — CSV(s) → LibreOffice Calc.
- MCP server auto-converts Excel inputs to CSV before running qsv commands (transparent to the LLM).
See Conversion & I/O.
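A round-trip sketch under assumed file names:

```bash
qsv excel --sheet 1 report.xlsx -o report.csv   # extract one sheet to CSV
qsv to xlsx clean.xlsx report.csv summary.csv   # bundle CSVs back into a workbook
```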
- `qsv geoconvert` — CSV ↔ GeoJSON / SHP / KML / GPX.
- `qsv geocode` — local Geonames + MaxMind GeoLite2 lookups (360k records/sec).
- Pair with QGIS, GeoPandas, PostGIS, or kepler.gl for visualization.
See Geospatial and Recipe: Geographic Enrichment.
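A sketch of the local enrichment hop (column and file names assumed):

```bash
# reverse-geocode a "lat, long" column locally, then hand the result to GeoPandas/QGIS
qsv geocode reverse location data.csv -o enriched.csv
```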
qsv reads/writes stdin/stdout. Pair freely with:
- `jq` / `jaq` — JSON manipulation (qsv ships `jaq` as a built-in via `--jaq`)
- `xargs -P` — parallelize over qsv-produced row lists
- `curl` / `wget` — feed remote files (or use `qsv fetch` / `sniff` directly)
- `awk` / `sed` — when qsv's regex isn't enough (rare)
- `psql` / `sqlite3` — for the database hop
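A representative pipe chain (the URL, column names, and table name are placeholders):

```bash
# fetch, filter to active rows, trim columns, and COPY into Postgres
curl -sL https://example.com/data.csv \
  | qsv search -s status 'active' \
  | qsv select 'id,name,amount' \
  | psql mydb -c "\copy staging FROM STDIN WITH (FORMAT csv, HEADER true)"
```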
- Recipe: CKAN Integration — DataPusher+ pipeline
- Recipe: Build a Data Pipeline — Make-driven orchestration
- Recipe: Fetch & Cache — HTTP API patterns
- Claude Cowork Plugin
- MCP Server
- qsv pro Spotlight
- Lookup Tables
- Scripting (Luau / Python)
- Conversion & I/O
- Geospatial
- External Resources