Comparison
Tier: Intermediate
A short, honest comparison of qsv against neighboring tools. The deep numbers live in docs/BENCHMARKS.md and the live dashboard at qsv.dathere.com/benchmarks. All performance claims on this page link to those sources rather than embedding numbers that will go stale.
See also the original xsv 0.13.0 `stats` compared with qsv 1.0.0 `stats` wiki page for a side-by-side example of the stats-output expansion.
xsv is the original Rust CSV tool from BurntSushi. qsv is a maintained, multithreaded fork that adds many commands and features. xsv has been on minimal-maintenance status since ~2019.
| | xsv 0.13 | qsv 20+ |
|---|---|---|
| Commands | ~13 | 70+ |
| Multithreaded | No | Many commands (🚀 / 🏎️) |
| Polars-powered SQL | No | `sqlp`, `joinp`, `pivotp`, `scoresql` |
| JSON Schema validation | No | `validate` (with custom keywords) |
| Geocoding | No | `geocode` |
| HTTP fetching | No | `fetch`, `fetchpost` with caching |
| AI / LLM integration | No | `describegpt`, MCP server, Cowork plugin |
| Embedded scripting DSL | No | Luau and Python |
| External-* commands | No | `extsort`, `extdedup` |
| Apache DataSketches | No | `--cardinality-method approx`, `--quantile-method approx`, `frequency --sketch-method frequent_items` |
| `stats` output columns | 12 (default), 14 (`--everything`) | 37 (default), 47 (`--everything`); see the legacy wiki page |
| Active development | Minimal | Active |
Migration path: install qsvlite — it's the xsv-compatible subset, with the same flags and command set. Or install full qsv for everything.
csvkit is a Python CSV toolkit (`csvstack`, `csvgrep`, `csvjoin`, `csvstat`, …) with a long history.
Speed: qsv outperforms csvkit by ~10× on typical workloads (compiled Rust + multithreading vs Python). See docs/BENCHMARKS.md for the methodology.
Surface area:
- csvkit has tighter integration with the Python ecosystem (pip-installable, extensible in Python).
- qsv has more commands (geocoding, fetch, validate with custom keywords, describegpt, Polars SQL, …).
- csvkit is one project; qsv is the engine plus an ecosystem (qsv pro, MCP server, Cowork plugin, qsv-recipes, qsv-lookup-tables, DataPusher+).
When to pick which:
- Inside a Python project where you already use pandas — csvkit might fit better.
- For shell pipelines, CI gates, or large files — qsv wins decisively.
- The two can coexist; many users run csvkit's `csvstat` and then pipe the results into a downstream qsv command.
Miller (written in C through version 5 and rewritten in Go for Miller 6) is fast and shape-agnostic: it handles CSV, TSV, JSON, JSONL, DKVP, PPRINT, and NIDX. qsv is CSV-specialized, with deeper stats and validation.
Speed: comparable for streaming row ops. qsv pulls ahead on aggregations, joins, and stats due to multithreading.
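One reason aggregations and stats parallelize well is that summaries such as count, sum, min, and max can be computed independently on chunks of rows and then merged. The minimal Python sketch that follows illustrates that general chunk-and-merge idea; it is not qsv's actual implementation, and the names `Partial`, `summarize`, and `parallel_summary` are hypothetical. CPython's GIL limits real speedups for pure-Python CPU work, so the point here is the mergeable structure rather than the timing.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass
class Partial:
    """Hypothetical per-chunk summary that can be merged with another."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative, so chunks can be combined in any grouping.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def summarize(chunk: list[float]) -> Partial:
    """Single-threaded pass over one chunk of values."""
    p = Partial()
    for x in chunk:
        p.count += 1
        p.total += x
        p.minimum = min(p.minimum, x)
        p.maximum = max(p.maximum, x)
    return p


def parallel_summary(values: list[float], n_chunks: int = 4) -> Partial:
    """Split values into chunks, summarize each on a worker, then merge."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(summarize, chunks))
    return reduce(Partial.merge, partials, Partial())


print(parallel_summary([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]))
# prints: Partial(count=8, total=31.0, minimum=1.0, maximum=9.0)
```

A row-by-row filter has no merge step to speak of, which is one reason streaming row operations tend to be bound by I/O and parsing rather than by the number of cores.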
Where Miller shines:
- DKVP and nested-JSON inputs.
- Compact DSL that does row transformations and filtering in one expression.
- Long-standing maturity.
Where qsv shines:
- 47-metric stats with guaranteed type inference.
- JSON Schema validation at 780k rows/sec.
- Polars-powered SQL and asof joins.
- Integrated AI workflows.
- MCP server / Cowork plugin / qsv pro ecosystem.
Both are excellent. Many shell power-users keep both installed and reach for whichever is faster for the task at hand.
DuckDB is an embedded analytical SQL database. Its read_csv table function is highly optimized.
Different jobs:
- DuckDB excels at multi-CSV SQL analytics with a full query optimizer and OLAP execution engine.
- qsv excels at the pre-DB cleaning, profiling, validation, and enrichment layer.
The recommended pattern: use qsv for cleaning + Parquet conversion, then DuckDB for analytics:
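```bash
# 1. Profile: compute column statistics
qsv stats --stats-jsonl raw.csv
# 2. Infer a Polars schema for the file
qsv schema --polars raw.csv
# 3. Convert to Parquet in outdir/
qsv to parquet outdir/ raw.csv
# 4. Run the analytics in DuckDB against the Parquet file
duckdb -c "SELECT * FROM read_parquet('outdir/raw.parquet') WHERE ..."
```
qsv also integrates with DuckDB directly: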
- `qsv sqlp` runs Polars SQL; it is the closest spiritual cousin to `duckdb -c "..."` for CSVs.
- `qsv scoresql --duckdb` uses DuckDB's planner to score a query before running it.
- `qsv describegpt` with `QSV_DUCKDB_PATH` uses DuckDB for SQL-RAG.
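To script the qsv half of the recommended qsv-then-DuckDB pattern, here is a minimal Python sketch that runs the same three qsv commands in order via `subprocess`. The function name `prepare_for_duckdb` and its `run` flag are hypothetical, and creating `outdir` up front is a precaution rather than a documented qsv requirement. With `run=False` it only returns the command lines, which is handy for logging or dry runs.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_for_duckdb(csv_path: str, outdir: str, run: bool = True) -> list[list[str]]:
    """Hypothetical helper: profile a CSV with qsv, then convert it to Parquet.

    Returns the qsv command lines in execution order. With run=False nothing
    is executed (a dry run).
    """
    steps = [
        ["qsv", "stats", "--stats-jsonl", csv_path],
        ["qsv", "schema", "--polars", csv_path],
        ["qsv", "to", "parquet", outdir, csv_path],
    ]
    if run:
        if shutil.which("qsv") is None:
            raise FileNotFoundError("qsv is not on PATH")
        Path(outdir).mkdir(parents=True, exist_ok=True)
        for cmd in steps:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)
    return steps


for cmd in prepare_for_duckdb("raw.csv", "outdir", run=False):
    print(" ".join(cmd))
```

Stopping on the first failure keeps a broken profile or schema step from silently producing a Parquet file that DuckDB then queries.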
pandas is the Python data-analysis workhorse. qsv complements pandas; it doesn't replace it.
- pandas is in-memory, Python-native, and excels at ad-hoc analysis with charts and ML.
- qsv is streaming (mostly), shell-native, and excels at fast profiling, validation, transformations on multi-GB files, and pre-DB cleaning.
For a 2.7M-row CSV, qsv stats runs in well under a second; pd.read_csv(...).describe() typically takes 10+ seconds. For inline charts or train_test_split, pandas / scikit-learn are obviously the right tools.
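Use qsv from notebooks via subprocess:
```python
import subprocess
import pandas as pd
# Compute every stat; qsv caches the results as data.stats.csv next to data.csv
subprocess.run(['qsv', 'stats', '--everything', '--stats-jsonl', 'data.csv'], check=True)
stats_df = pd.read_csv('data.stats.csv')  # one row per column of data.csv
```
See Integrations → Python notebooks.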
Visidata is a terminal UI for tabular data exploration — closer to qsv lens than to the rest of qsv. Both are excellent in their niche.
- Visidata is interactive (TUI). Sift, filter, pivot, sort visually.
- qsv lens is interactive too, via csvlens.
- The rest of qsv is non-interactive — fast batch operations from the shell.
Use Visidata for exploratory analysis; use qsv for the rest of the pipeline.
awk and sed are general-purpose text tools. They don't understand CSV quoting — embedded commas, multi-line quoted fields, and escaped quotes will trip them up.
Use qsv for any CSV operation. Use awk / sed for plain-text logs and configuration files.
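To see why line-and-comma splitting fails, here is a minimal Python sketch (standard library only; the sample data is made up for illustration) that compares a naive split on newlines and commas, roughly what `awk -F,` does, with a real CSV parser.

```python
import csv
import io

# Sample CSV with an embedded comma, an escaped quote (""), and a quoted
# field that spans two physical lines.
RAW = 'id,name,note\n1,"Smith, Jane","said ""hi"""\n2,Bob,"line one\nline two"\n'


def naive_split(text: str) -> list[list[str]]:
    """Split records on newlines and fields on commas, ignoring quoting."""
    return [line.split(",") for line in text.splitlines()]


def csv_parse(text: str) -> list[list[str]]:
    """Parse with the standard library csv module, which honors quoting."""
    return list(csv.reader(io.StringIO(text)))


naive = naive_split(RAW)
parsed = csv_parse(RAW)
print(len(naive), "naive rows vs", len(parsed), "real rows")
# prints: 4 naive rows vs 3 real rows
```

The naive version splits `"Smith, Jane"` into two fields and breaks the multi-line note into two records, while the CSV parser recovers `Smith, Jane`, `said "hi"`, and the two-line note intact.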
| You want to… | Reach for |
|---|---|
| Profile / validate / clean a CSV | qsv |
| Multi-file SQL analytics | DuckDB + qsv-prepared Parquet |
| Notebook-driven exploratory ML | pandas + qsv subprocess for heavy lifting |
| Shape-agnostic stream processing (JSON, DKVP, …) | Miller |
| Interactive TUI exploration | qsv lens or Visidata |
| Drag-and-drop GUI exploration | qsv pro |
| AI-assisted analysis | qsv MCP Server + Claude Cowork Plugin |
| xsv-compatible drop-in | qsvlite |
- `docs/BENCHMARKS.md` (canonical reference)
- qsv.dathere.com/benchmarks (live dashboard)
- xsv 0.13.0 stats vs qsv 1.0.0 stats (legacy wiki page)
- Why qsv? (the affirmative case)
- Binary Variants (`qsvlite` for xsv migrants)
- Integrations (pairing qsv with adjacent tools)
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense