Joel Natividad edited this page May 13, 2026 · 2 revisions

Getting Started

Tier: Beginner

This page walks you through your first ten minutes with qsv using a real 2.7-million-row dataset. You'll learn the daily-driver commands and how to compose them. No prior qsv knowledge is assumed.

If you haven't installed qsv yet, see Installation.

For a deeper tour, follow up with the canonical Whirlwind Tour — it explores the same dataset in much more detail.

Get the data

We'll use wcp.csv — World Cities Population, 2.7 million rows, 124 MB. It ships with this wiki:

# Download and unzip
curl -LO https://raw.githubusercontent.com/wiki/dathere/qsv/files/wcp.zip
unzip wcp.zip       # produces wcp.csv
ls -lh wcp.csv      # 124M

wcp.csv has 7 columns: Country, City, AccentCity, Region, Population, Latitude, Longitude.

See what's there

What columns does this file have? How many rows?

qsv headers wcp.csv
1   Country
2   City
3   AccentCity
4   Region
5   Population
6   Latitude
7   Longitude
qsv count wcp.csv
2699354

That's 2.7 million rows. wcp.csv is a snapshot in time — your row count may differ slightly depending on which version of the dataset you have.

Profile it in seconds

Compute 47 statistics for every column:

qsv stats wcp.csv | qsv table

On an M2 Pro Mac mini, this runs in about 0.7 seconds. With more features turned on (--everything), you get cardinality, antimodes, sortiness, quartiles, and more:

qsv stats --everything wcp.csv > wcp-stats.csv
qsv table wcp-stats.csv

Stats writes a sidecar cache (wcp.stats.csv and wcp.stats.csv.data.jsonl) that many other commands reuse — see Stats Cache & Caching.

Sample randomly

qsv sample --seed 42 10 wcp.csv | qsv table

--seed makes the sample reproducible. sample supports seven sampling methods including stratified, weighted, and cluster — see Selection & Inspection.
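One way to convince yourself that a seeded sample is reproducible is to draw it twice and compare the results. The sketch below is illustrative only: toy-sample.csv, sample-a.csv, and sample-b.csv are hypothetical file names, and the 100-row input is generated on the spot so you don't need wcp.csv to try it.

```shell
# Toy input invented for this sketch: 100 numbered rows (not from wcp.csv)
{ echo "id,value"; seq 1 100 | awk '{ print $1 "," $1 * 2 }'; } > toy-sample.csv

# Draw the same 10-row sample twice with the same seed
qsv sample --seed 42 10 toy-sample.csv > sample-a.csv
qsv sample --seed 42 10 toy-sample.csv > sample-b.csv

# cmp exits 0 only when the two files are byte-identical
cmp sample-a.csv sample-b.csv && echo "same sample both times"
```

Changing the seed in one of the two commands should normally make cmp report a difference.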

Filter rows

Find cities whose name starts with "San" (AccentCity preserves capitalization; City is lowercased):

qsv search --select AccentCity '^San' wcp.csv | qsv count
3155

search accepts a regex. For multiple patterns in a single pass, use searchset. Use -i / --ignore-case for case-insensitive matching.
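To see what --ignore-case changes, it can help to run both forms on a file small enough to check by eye. This is a minimal sketch on made-up data: toy-cities.csv and its Name column are invented for illustration and are not an excerpt of wcp.csv.

```shell
# Toy data invented for illustration (not taken from wcp.csv)
cat > toy-cities.csv <<'EOF'
Country,Name
us,San Diego
us,Houston
us,san jose
ph,Santa Rosa
cl,Santiago
EOF

# Case-sensitive: San Diego, Santa Rosa, and Santiago match (3 rows)
qsv search --select Name '^San' toy-cities.csv | qsv count

# Case-insensitive: the lowercase "san jose" row matches too (4 rows)
qsv search --ignore-case --select Name '^San' toy-cities.csv | qsv count
```

Note that '^San' is a prefix match, so it also catches Santa Rosa and Santiago; a pattern such as '^San ' (with a trailing space) would narrow it to names where "San" is a separate word.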

Pretty-print one row

flatten prints a record with one field per line, which is much easier to read than a wide horizontal CSV row:

qsv slice --start 100000 --len 1 wcp.csv | qsv flatten
Country     at
City        oberloiben
AccentCity  Oberloiben
Region      03
Population
Latitude    48.3833333
Longitude   15.5333333

(The actual row at offset 100,000 will depend on your copy of the dataset — wcp.csv is sorted by country then city.)

Pipe everything together

Find the top 10 most-populous US cities, formatted as a pretty table:

qsv search --select Country '^us$' wcp.csv \
  | qsv sort --select Population --numeric --reverse \
  | qsv slice --len 10 \
  | qsv select 'AccentCity,Region,Population' \
  | qsv table
AccentCity    Region  Population
New York      NY      8107916
Los Angeles   CA      3877129
Chicago       IL      2841952
Houston       TX      2027712
Philadelphia  PA      1453268
Phoenix       AZ      1428509
San Diego     CA      1287050
San Antonio   TX      1256810
Dallas        TX      1211704
San Jose      CA      897460

That's the qsv idiom: small, composable commands piped through stdin/stdout, each blazing-fast on its own.
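Because each stage is an ordinary command, a pipeline like this can be wrapped in a shell function and reused for other countries. The sketch below assumes a bash-compatible shell; top_cities is a hypothetical helper name (not a qsv command), and its three positional parameters are choices made for this example.

```shell
# top_cities COUNTRY [N] [FILE]
# Hypothetical helper (not part of qsv): prints the N most-populous
# cities for a two-letter country code as CSV.
top_cities() {
  local country="$1"      # e.g. us, at (lowercase, as stored in wcp.csv)
  local n="${2:-10}"      # how many rows to keep; defaults to 10
  local file="${3:-wcp.csv}"

  qsv search --select Country "^${country}\$" "$file" \
    | qsv sort --select Population --numeric --reverse \
    | qsv slice --len "$n" \
    | qsv select 'AccentCity,Region,Population'
}

# Example call (needs wcp.csv in the current directory):
#   top_cities us 10 | qsv table
```

The function deliberately stops at CSV output and leaves qsv table to the caller, so its result can still be piped into further qsv commands.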

Index for instant random access

Many commands get a 5-10× speedup when an index exists. Create one:

qsv index wcp.csv
ls -lh wcp.csv*
124M  wcp.csv
 21M  wcp.csv.idx

Now count, sample, and slice are instantaneous, while stats, frequency, split, and schema become multithreaded. Read more in Indexing, Compression & Diff and Performance Tuning.
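To try indexed access on something small first, here is a minimal sketch on a generated file. toy-index.csv is a hypothetical name, and the data is invented for the example; the point is only to show the .idx sidecar appearing and a single record being fetched by position.

```shell
# Toy input invented for this sketch: 100 numbered rows (not from wcp.csv)
{ echo "id,value"; seq 1 100 | awk '{ print $1 "," $1 * 2 }'; } > toy-index.csv

# Build the index; it is written next to the file as toy-index.csv.idx
qsv index toy-index.csv
ls toy-index.csv.idx

# Record positions are zero-based, so index 99 is the 100th (last) data row
qsv slice --index 99 toy-index.csv
```

On a 100-row file the speed difference is not noticeable; the same two commands on wcp.csv are where the index pays off.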

Pipe to your clipboard

qsv stats wcp.csv | qsv table | qsv clipboard --save

Paste anywhere. The clipboard command is part of the UI feature group: --save (-s) writes stdin to your OS clipboard, and calling qsv clipboard with no flag reads from the clipboard. See /docs/help/clipboard.md.

Interactively browse

For a TUI-style interactive viewer with search, filter, and column toggling:

qsv lens wcp.csv

lens uses the csvlens engine and also opens Parquet, Arrow, JSONL, and Avro files when the polars feature is enabled.

What just happened?

You used count, headers, stats, sample, search, slice, flatten, select, sort, table, lens, clipboard, and index: 13 of qsv's 70+ commands. Every one composes via stdin/stdout, every one is multithreaded where it makes sense, and results worth reusing (such as the stats cache) are saved to disk.

Next steps

If you want to…                                         Go to
Build a mental model of every command                   Command Reference (index)
Solve a specific problem (clean, validate, enrich, …)   Cookbook
Understand the speed numbers                            Why qsv? and Performance Tuning
Walk through wcp.csv in more depth                      docs/whirlwind_tour.md
Practice with guided exercises                          100.dathere.com
Hit an issue                                            Troubleshooting

