Joel Natividad edited this page May 13, 2026 · 14 revisions

Cookbook

Tier: Beginner

Task-oriented recipes. Each recipe follows the same shape: Problem → Data → Solution → Variations → Performance notes → See also. Pick by goal:

Recipe index

Inspect & profile

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Inspect an Unknown CSV | Beginner | wcp.csv | sniff, headers, count, stats, frequency, sample, table |
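For orientation, the first-look questions these commands answer can be sketched with plain POSIX tools on a toy file. This is a stand-in sketch only, not the recipe itself: the real recipe runs qsv's sniff, stats, frequency, and friends against wcp.csv.

```shell
# Toy CSV standing in for an unknown file
printf 'city,country,population\nParis,FR,2148000\nLyon,FR,513000\nTokyo,JP,13960000\n' > toy.csv

head -n 1 toy.csv                                  # column names (cf. qsv headers)
tail -n +2 toy.csv | wc -l                         # row count (cf. qsv count)
cut -d, -f2 toy.csv | tail -n +2 | sort | uniq -c  # value frequencies (cf. qsv frequency)
```

Unlike these stand-ins, the qsv commands handle quoting, embedded delimiters, and multi-gigabyte inputs; the sketch is only to show what each profiling step is asking of the data.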

Clean & normalize

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Clean & Normalize | Beginner | Boston 311 | input, safenames, replace, fill, dedup |

Enrich

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Geographic Enrichment | Intermediate | NYC 311 | geocode, luau, lookup tables |
| Date Enrichment | Intermediate | NYC 311 | datefmt, luau, partition |
| Fetch & Cache | Advanced | NOAA GHCN, GitHub stargazers | fetch, --disk-cache, --jaq |

Validate

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| JSON Schema Validation | Intermediate | NYC 311 (1M sample) | schema, validate (custom keywords) |

Aggregate & analyze

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Stats → Insights | Intermediate | NYC 311 | stats, moarstats, pragmastat, describegpt |
| Build a Data Pipeline | Advanced | Allegheny property sales | safenames, schema, validate, sqlp, pivotp, template |

Combine & compare

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Multi-Table Joins | Intermediate | NYC 311 + NOAA | join, joinp (asof) |
| Diff & Audit | Intermediate | Weekly regulatory CSVs | sortcheck, extsort, diff, blake3 |

Big files

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| Larger-than-RAM CSV | Advanced | NYC 311 (16 GB / 27M rows) | index, extsort, sqlp, extdedup, jemalloc tuning |

Integration

| Recipe | Tier | Anchor dataset | Commands used |
| --- | --- | --- | --- |
| CKAN Integration | Intermediate | any CKAN portal | safenames, applydp, to postgres, DataPusher+ |

For a per-command reference (vs. per-task), see the Command Reference.

Anchor datasets

Examples in the recipes use these consistently:

  • wcp.csv — World Cities Population, 2.7M rows / 124 MB. files/wcp.zip.
  • NYC 311 Service Requests — 1M sample at files/nyc311samp.csv; full 16 GB / 27M from NYC Open Data; bundled in resources/test/NYC_311_SR_2010-2020-sample-1M.csv.
  • Boston 311 — small fixtures in qsv's resources/test/ (also .gz, .zst, .parquet).
  • Allegheny County property sales — resources/test/allegheny_property_sales*.csv with pre-built .idx and .stats.csv.
  • NOAA GHCN-Daily — fetched via qsv fetch from https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/.
  • GitHub stargazers — https://api.github.com/repos/dathere/qsv/stargazers.
  • country_continent.csv — small lookup, used in joins.

Contributing

Have a recipe to share? See Contributing to the Wiki for how to add one. Recipes are also welcome in the main qsv CONTRIBUTING.md.


Legacy recipes (preserved)

These are the original wiki Cookbook recipes from before the Phase C expansion. Each is being expanded into a dedicated Recipe page with the template above — links at the top of this index. Originals are kept here as a backup until each replacement is verified.

CKAN (legacy) — see Recipe: CKAN Integration

Using qsv, ckanapi, jq, and xargs:

  • get a CSV of datasets/users/groups/orgs in a CKAN instance

```shell
ckanapi -r https://demo.ckan.org dump datasets --all | qsv jsonl > datasets.csv
ckanapi -r https://demo.ckan.org dump users --all | qsv jsonl > users.csv
ckanapi -r https://demo.ckan.org dump groups --all | qsv jsonl > groups.csv
ckanapi -r https://demo.ckan.org dump organizations --all | qsv jsonl > organizations.csv
```

  • get a CSV of resources for a given dataset

```shell
ckanapi -r https://catalog.data.gov action package_show \
  id=low-altitude-aerial-imagery-obtained-with-unmanned-aerial-systems-uas-flights-over-black-beach \
  | jq -c '.resources[]' \
  | qsv jsonl \
  > resources.csv
```

  • download a dataset's CSV resource and profile it

```shell
ckanapi -r https://data.cnra.ca.gov action package_show id="wellstar-oil-and-gas-wells1" \
  > wellstar-oil-and-gas-wells.json
cat wellstar-oil-and-gas-wells.json \
  | jq -c '.resources[] | select(.name=="CSV") | .url' \
  | xargs -L 1 wget -O wellstar.csv
qsv stats --everything wellstar.csv > wellstar-stats.csv
```
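The shape of that last pipeline (JSON → jq filter → one URL per line → xargs download) is worth isolating. Here is just the fan-out step, with wget stubbed out by echo so it runs offline; the URLs are placeholders, not real resources:

```shell
# xargs -L 1 runs the command once per input line,
# appending that line as the final argument
printf '%s\n' 'https://example.org/a.csv' 'https://example.org/b.csv' \
  | xargs -L 1 echo wget -O wellstar.csv
```

Drop the echo to actually download. Note that with a fixed -O filename each download overwrites the last, which is fine in the recipe above because the jq select yields a single URL.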

Date Enrichment (legacy) — see Recipe: Date Enrichment

(Snippet kept for back-compat with old external links. See the new recipe page for an expanded version using current command syntax.)

```shell
qsv luau map Quarter -x -f getquarter.lua nyc311samp.csv > result-qtr.csv
qsv partition Quarter nyc311byqtr result-qtr.csv
qsv datefmt "Created Date" nyc311samp.csv > result-iso8601.csv
qsv datefmt "Created Date" --formatstr "%a %b %e %T %Y %z" nyc311samp.csv > result-datefmt.csv
qsv datefmt "Created Date" --formatstr "%Y" --new-column "Created Year" nyc311samp.csv > result-year.csv
qsv luau map TAT -x -f turnaroundtime.lua nyc311samp.csv > result-tat.csv
```

Files: nyc311samp.csv, getquarter.lua, turnaroundtime.lua.
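The --formatstr values are strftime patterns, so they can be previewed on a fixed timestamp before committing to a full qsv datefmt run. A sketch assuming GNU date (the -d flag is a GNU extension; BSD/macOS date differs):

```shell
# Preview the "%a %b %e %T %Y %z" pattern used above
LC_ALL=C date -u -d '2015-09-17 12:34:56' '+%a %b %e %T %Y %z'
# Preview the year-only pattern
LC_ALL=C date -u -d '2015-09-17 12:34:56' '+%Y'
```

LC_ALL=C pins the %a/%b abbreviations to English regardless of the shell's locale.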

Multi-table join avoiding repeated columns (legacy) — see Recipe: Multi-Table Joins

```shell
cp table_A.csv combined.csv
for NEXT in table_B.csv table_C.csv table_D.csv; do
  qsv join 2 combined.csv 1 <(qsv select 2,11- "$NEXT") > new.csv
  mv new.csv combined.csv
done
```
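The accumulate-into-combined.csv pattern is independent of qsv. A toy version with paste standing in for qsv join, and cut -f2- standing in for the qsv select that drops the repeated key column, shows the shape:

```shell
# Three toy tables sharing an id column
printf 'id,a\n1,x\n2,y\n' > table_A.csv
printf 'id,b\n1,p\n2,q\n' > table_B.csv
printf 'id,c\n1,m\n2,n\n' > table_C.csv

cp table_A.csv combined.csv
for NEXT in table_B.csv table_C.csv; do
  cut -d, -f2- "$NEXT" > next_cols.csv     # drop the repeated key column
  paste -d, combined.csv next_cols.csv > new.csv
  mv new.csv combined.csv
done
head -n 1 combined.csv
```

Caveat: paste matches rows by position, which only works here because the toy files are pre-aligned; the real recipe's qsv join matches on key values, so row order and missing keys are handled correctly.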

qsv cat rows of many CSVs with varying columns (legacy)

CSVs from a third party are unreliable: over time, header formatting may change, and headers may be added or removed. The following script concatenates hundreds of CSV files that share the same basic set of headers, with marginal differences:

```shell
#!/bin/bash
# Usage: bash scripts/fix.sh ./fixed "*.csv"
DIST=$1
TARGETS=$2

mkdir -p "$DIST"

# Step 1: collect the unique header names across all files
# ($TARGETS is deliberately left unquoted so the glob expands here)
unique_headers_list=$(qsv headers -j --union $TARGETS | tr '[:upper:]' '[:lower:]' | sort -u)
unique_headers_comma=$(echo "$unique_headers_list" | paste -sd, -)

# Step 2: for each CSV file, add missing headers with default values
FILES=($TARGETS)
for file in "${FILES[@]}"; do
  FILENAME=${file##*/}
  OUT="$DIST/$FILENAME"
  # normalize the input and lowercase its header row
  qsv input "$file" | awk 'NR==1 {print tolower($0)} NR>1 {print}' > "$OUT.lowercase"
  mv "$OUT.lowercase" "$OUT"
  # headers appearing exactly once across both lists are missing from this file
  MISSING_HEADERS=$(echo -e "$(qsv headers -j "$OUT")\n$unique_headers_list" | sort | uniq -u | tr '\n' ',' | sed 's/,$//' | sed 's/^,//')
  # pad the file with empty columns for the missing headers
  echo "$MISSING_HEADERS" | qsv cat columns -o "$OUT.full" -p "$OUT" -
  mv "$OUT.full" "$OUT"
  # put every file's columns in the same canonical order
  qsv select "$unique_headers_comma" -o "$OUT.ordered" "$OUT"
  mv "$OUT.ordered" "$OUT"
done
```

Note: the original recipe used --intersect. In qsv 19+ the flag was renamed to --union since it always computed a union. Updated above.
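The core trick in Step 2, finding headers that are in the union but absent from a given file, is the `sort | uniq -u` idiom: concatenate the two lists and keep only the lines that appear exactly once. In isolation, with toy header lists and no qsv needed:

```shell
union='city
country
population'
file_headers='city
population'
# Headers in both lists appear twice and are dropped by uniq -u;
# headers missing from the file appear once and survive
missing=$(printf '%s\n%s\n' "$file_headers" "$union" | sort | uniq -u)
echo "$missing"
```

This relies on each file's headers being a subset of the union, which holds by construction since the union was built from the same files.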

Geocoding (legacy) — see Recipe: Geographic Enrichment

Old qsv apply geocode syntax has been superseded by the dedicated geocode command.

```shell
# Old (still works, but prefer the new geocode command):
qsv apply geocode Location --new-column City nyc311samp.csv > result-geocoded.csv

# New:
qsv geocode reverse Location --new-column City --formatstr "%name" --country US nyc311samp.csv > result-geocoded.csv
```
