Getting Started
Tier: Beginner
This page walks you through your first ten minutes with qsv using a real 2.7-million-row dataset. You'll learn the daily-driver commands and how to compose them. No prior qsv knowledge is assumed.
If you haven't installed qsv yet, see Installation.
For a deeper tour, follow up with the canonical Whirlwind Tour — it explores the same dataset in much more detail.
We'll use wcp.csv — World Cities Population, 2.7 million rows, 124 MB. It ships with this wiki:
# Download and unzip
curl -LO https://raw.githubusercontent.com/wiki/dathere/qsv/files/wcp.zip
unzip wcp.zip # produces wcp.csv
ls -lh wcp.csv # 124M
wcp.csv has 7 columns: Country, City, AccentCity, Region, Population, Latitude, Longitude.
What columns does this file have? How many rows?
qsv headers wcp.csv
1 Country
2 City
3 AccentCity
4 Region
5 Population
6 Latitude
7 Longitude
qsv count wcp.csv
2699354
That's 2.7 million rows. wcp.csv is a snapshot in time — your row count may differ slightly depending on which version of the dataset you have.
Compute 47 statistics for every column:
qsv stats wcp.csv | qsv table
On an M2 Pro Mac mini, this runs in about 0.7 seconds. With more features turned on (--everything), you get cardinality, antimodes, sortiness, quartiles, and more:
qsv stats --everything wcp.csv > wcp-stats.csv
qsv table wcp-stats.csv
Stats writes a sidecar cache (wcp.stats.csv and wcp.stats.csv.data.jsonl) that many other commands reuse; see Stats Cache & Caching.
qsv sample --seed 42 10 wcp.csv | qsv table
--seed makes the sample reproducible. sample supports seven sampling methods including stratified, weighted, and cluster; see Selection & Inspection.
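One way to check the reproducibility claim yourself is to draw the same sample twice and compare the results. This is a minimal sketch: seed-demo.csv, run1.csv, and run2.csv are hypothetical names, the 20-row file is generated only to keep the example self-contained, and the qsv calls are skipped if qsv is not on your PATH.

```shell
# Hypothetical 20-row file (plus header), generated so the example is self-contained.
{ echo "id,value"; for i in $(seq 1 20); do echo "$i,v$i"; done; } > seed-demo.csv

if command -v qsv >/dev/null 2>&1; then
  # Draw 5 rows twice with the same seed.
  qsv sample --seed 42 5 seed-demo.csv > run1.csv
  qsv sample --seed 42 5 seed-demo.csv > run2.csv
  # cmp is silent and exits 0 when the two files are byte-identical.
  cmp run1.csv run2.csv && echo "same sample both times"
else
  echo "qsv is not on PATH; see Installation" >&2
fi
```

Dropping --seed from either call should make the two runs free to differ.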
Find cities whose name starts with "San" (AccentCity preserves capitalization; City is lowercased):
qsv search --select AccentCity '^San' wcp.csv | qsv count
3155
search accepts a regex. For multiple patterns in a single pass, use searchset. Use -i / --ignore-case for case-insensitive matching.
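As a small illustration of --ignore-case, the sketch below searches a hypothetical three-row file (names.csv and its rows are made up for this example, not taken from wcp.csv) with a lowercase pattern against the capitalized AccentCity column. The qsv call is skipped if qsv is not on your PATH.

```shell
# Hypothetical stand-in data; not part of wcp.csv.
cat > names.csv <<'EOF'
City,AccentCity
san jose,San Jose
santa fe,Santa Fe
el paso,El Paso
EOF

if command -v qsv >/dev/null 2>&1; then
  # A lowercase pattern can match the capitalized column once case is ignored.
  qsv search --ignore-case --select AccentCity '^san' names.csv | qsv count
else
  echo "qsv is not on PATH; see Installation" >&2
fi
```

Running the same command without --ignore-case is a quick way to see the difference the flag makes.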
flatten shows a single record as one column-per-line, which is much easier to read than horizontal CSV:
qsv slice --start 100000 --len 1 wcp.csv | qsv flatten
Country     at
City        oberloiben
AccentCity  Oberloiben
Region      03
Population
Latitude    48.3833333
Longitude   15.5333333
(The actual row at offset 100,000 will depend on your copy of the dataset — wcp.csv is sorted by country then city.)
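To see slice and flatten on a file small enough to check by eye, here is a sketch using a hypothetical three-row file (tiny.csv and its contents are invented for illustration). It uses only the --start and --len flags shown above and skips the qsv call if qsv is not on your PATH.

```shell
# Hypothetical stand-in data; not part of wcp.csv.
cat > tiny.csv <<'EOF'
City,AccentCity,Population
alpha,Alpha,100
beta,Beta,200
gamma,Gamma,300
EOF

if command -v qsv >/dev/null 2>&1; then
  # Take one record starting at offset 1, then print it one field per line.
  qsv slice --start 1 --len 1 tiny.csv | qsv flatten
else
  echo "qsv is not on PATH; see Installation" >&2
fi
```

Changing --start and --len here is a cheap way to get a feel for the offsets before pointing the same command at the full dataset.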
Find the top 10 most-populous US cities, formatted as a pretty table:
qsv search --select Country '^us$' wcp.csv \
| qsv sort --select Population --numeric --reverse \
| qsv slice --len 10 \
| qsv select 'AccentCity,Region,Population' \
| qsv table
AccentCity    Region  Population
New York      NY      8107916
Los Angeles   CA      3877129
Chicago       IL      2841952
Houston       TX      2027712
Philadelphia  PA      1453268
Phoenix       AZ      1428509
San Diego     CA      1287050
San Antonio   TX      1256810
Dallas        TX      1211704
San Jose      CA      897460
That's the qsv idiom: small, composable commands piped through stdin/stdout, each blazing-fast on its own.
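The same search, sort, slice, select, table chain can be tried on a file small enough to verify by hand. The sketch below assumes a hypothetical mini.csv with made-up rows (not real population figures) and keeps the top 2 instead of the top 10. Every flag is one already used in the pipeline above, and the qsv calls are skipped if qsv is not on your PATH.

```shell
# Hypothetical stand-in data; names and numbers are invented for illustration.
cat > mini.csv <<'EOF'
Country,AccentCity,Region,Population
us,Alpha,AA,300
us,Beta,BB,1200
ca,Gamma,CC,5000
us,Epsilon,EE,750
EOF

if command -v qsv >/dev/null 2>&1; then
  # Filter to one country, sort by population descending, keep 2 rows,
  # keep two columns, then pretty-print.
  qsv search --select Country '^us$' mini.csv \
    | qsv sort --select Population --numeric --reverse \
    | qsv slice --len 2 \
    | qsv select 'AccentCity,Population' \
    | qsv table
else
  echo "qsv is not on PATH; see Installation" >&2
fi
```

Because each stage reads stdin and writes stdout, you can delete stages from the end of the pipeline one at a time to inspect the intermediate CSV.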
Many commands get a 5-10× speedup when an index exists. Create one:
qsv index wcp.csv
ls -lh wcp.csv*
124M wcp.csv
21M wcp.csv.idx
Now count, sample, and slice are instantaneous; stats, frequency, split, and schema become multithreaded. Read more in Indexing, Compression & Diff and Performance Tuning.
qsv stats wcp.csv | qsv table | qsv clipboard --save
Paste anywhere. (clipboard is part of the UI feature group; --save (-s) writes stdin to your OS clipboard, and calling qsv clipboard with no flag reads from the clipboard. See /docs/help/clipboard.md.)
For a TUI-style interactive viewer with search, filter, and column toggling:
qsv lens wcp.csv
lens uses the csvlens engine and also opens Parquet, Arrow, JSONL, and Avro files when the polars feature is enabled.
You used count, headers, stats, sample, search, slice, flatten, select, sort, table, lens, clipboard, and index: 13 of qsv's 70+ commands. Every one composes via stdin/stdout, every one is multithreaded where it makes sense, and every result is cached on disk if useful.
| If you want to… | Go to |
|---|---|
| Build a mental model of every command | Command Reference (index) |
| Solve a specific problem (clean, validate, enrich, …) | Cookbook |
| Understand the speed numbers | Why qsv? and Performance Tuning |
| Walk through wcp.csv in more depth | docs/whirlwind_tour.md |
| Practice with guided exercises | 100.dathere.com |
| Hit an issue | Troubleshooting |
- Whirlwind Tour — the canonical hands-on guide
- Selection & Inspection — every command you used above, in depth
- Cookbook → Inspect an Unknown CSV — when you don't know what's in your file
- Why qsv? — the elevator pitch
- Lessons & Exercises
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense