Joel Natividad edited this page May 13, 2026 · 3 revisions

Comparison vs other CSV tools

Tier: Intermediate

A short, honest comparison of qsv against neighboring tools. The deep numbers live in docs/BENCHMARKS.md and the live dashboard at qsv.dathere.com/benchmarks. All performance claims on this page link to those sources rather than embedding numbers that will go stale.

See also the original xsv 0.13.0 stats compared with qsv 1.0.0 stats wiki page for a side-by-side example of the stats-output expansion.

qsv vs xsv

xsv is the original Rust CSV toolkit from BurntSushi; it has been on minimal-maintenance status since ~2019. qsv is an actively maintained, multithreaded fork that adds many commands and features.

| Feature | xsv 0.13 | qsv 2.0+ |
| --- | --- | --- |
| Commands | ~13 | 70+ |
| Multithreaded | No | Many commands (🚀 / 🏎️) |
| Polars-powered SQL | No | sqlp, joinp, pivotp, scoresql |
| JSON Schema validation | No | validate (with custom keywords) |
| Geocoding | No | geocode |
| HTTP fetching | No | fetch, fetchpost with caching |
| AI / LLM integration | No | describegpt, MCP server, Cowork plugin |
| Embedded scripting DSL | No | Luau and Python |
| External-* commands | No | extsort, extdedup |
| Apache DataSketches | No | --cardinality-method approx, --quantile-method approx, frequency --sketch-method frequent_items |
| stats output columns | 12 (default), 14 (--everything) | 37 (default), 47 (--everything) — see legacy wiki page |
| Active development | Minimal | Active |

Migration path: install qsvlite — it's the xsv-compatible subset, with the same flags and command set. Or install full qsv for everything.

qsv vs csvkit

csvkit is a long-established Python CSV toolkit (csvstack, csvgrep, csvjoin, csvstat, …).

Speed: qsv outperforms csvkit by ~10× on typical workloads (compiled Rust + multithreading vs Python). See docs/BENCHMARKS.md for the methodology.

Surface area:

  • csvkit has tighter integration with the Python ecosystem (pip-installable, extensible in Python).
  • qsv has more commands (geocoding, fetch, validate with custom keywords, describegpt, Polars SQL, …).
  • csvkit is one project; qsv is the engine plus an ecosystem (qsv pro, MCP server, Cowork plugin, qsv-recipes, qsv-lookup-tables, DataPusher+).

When to pick which:

  • Inside a Python project where you already use pandas — csvkit might fit better.
  • For shell pipelines, CI gates, or large files — qsv wins decisively.
  • The two can coexist; many users run csvkit's csvstat, then pipe its output into a downstream qsv command.

qsv vs Miller (mlr)

Miller is fast (written in Go since v6, C before that) and shape-agnostic — it handles CSV, TSV, JSON, JSONL, DKVP, PPRINT, NIDX. qsv is CSV-specialized with deeper stats and validation.

Speed: comparable for streaming row ops. qsv pulls ahead on aggregations, joins, and stats due to multithreading.

Where Miller shines:

  • DKVP and nested-JSON inputs.
  • Compact DSL that does row transformations and filtering in one expression.
  • Long-standing maturity.

Where qsv shines:

  • 47-metric stats with guaranteed type inference.
  • JSON Schema validation at 780k rows/sec.
  • Polars-powered SQL and asof joins.
  • Integrated AI workflows.
  • MCP server / Cowork plugin / qsv pro ecosystem.

Both are excellent. Many shell power-users keep both installed and reach for whichever is faster for the task at hand.

qsv vs DuckDB CSV reader

DuckDB is an embedded analytical SQL database. Its read_csv table function is highly optimized.

Different jobs:

  • DuckDB excels at multi-CSV SQL analytics with a full query optimizer and OLAP execution engine.
  • qsv excels at the pre-DB cleaning, profiling, validation, and enrichment layer.

The recommended pattern: use qsv for cleaning + Parquet conversion, then DuckDB for analytics:

```shell
# Profile, infer a schema, and convert to Parquet with qsv
qsv stats --stats-jsonl raw.csv
qsv schema --polars raw.csv
qsv to parquet outdir/ raw.csv

# ...then hand the Parquet output to DuckDB for analytics
duckdb -c "SELECT * FROM read_parquet('outdir/raw.parquet') WHERE ..."
```

qsv also integrates with DuckDB directly:

  • qsv sqlp runs Polars SQL; it's the closest spiritual cousin to duckdb -c "..." for CSVs.
  • qsv scoresql --duckdb uses DuckDB's planner to score a query before running it.
  • qsv describegpt with QSV_DUCKDB_PATH uses DuckDB for SQL-RAG.

See Integrations → DuckDB.

qsv vs pandas

pandas is the Python data-analysis workhorse. qsv complements pandas; it doesn't replace it.

  • pandas is in-memory, Python-native, and excels at ad-hoc analysis with charts and ML.
  • qsv is streaming (mostly), shell-native, and excels at fast profiling, validation, transformations on multi-GB files, and pre-DB cleaning.
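
The streaming distinction is what drives the memory gap: a single pass keeps only running aggregates, never the whole file. A minimal Python sketch of the idea (illustrative only; this is not qsv's implementation):

```python
import csv
import io

def stream_stats(lines, column):
    """One-pass min/max/mean over a numeric CSV column.

    Memory use stays constant regardless of file size -- the streaming
    approach qsv takes for most commands (a sketch, not qsv's actual code).
    """
    count, total = 0, 0.0
    lo = hi = None
    for row in csv.DictReader(lines):
        value = float(row[column])
        count += 1
        total += value
        lo = value if lo is None else min(lo, value)
        hi = value if hi is None else max(hi, value)
    return {"count": count, "min": lo, "max": hi, "mean": total / count}

# Any line iterator works -- an open file object streams without loading it all.
data = io.StringIO("amount\n10\n30\n20\n")
print(stream_stats(data, "amount"))  # {'count': 3, 'min': 10.0, 'max': 30.0, 'mean': 20.0}
```

An in-memory DataFrame, by contrast, must materialize every row before any statistic is computed, which is where the multi-GB pain comes from.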

For a 2.7M-row CSV, qsv stats runs in well under a second; pd.read_csv(...).describe() typically takes 10+ seconds. For inline charts or train_test_split, pandas / scikit-learn are obviously the right tools.

Use qsv from notebooks via subprocess:

```python
import subprocess
import pandas as pd

# Run qsv; --stats-jsonl caches the computed stats next to the input file
subprocess.run(['qsv', 'stats', '--everything', '--stats-jsonl', 'data.csv'], check=True)
stats_df = pd.read_csv('data.stats.csv')
```

See Integrations → Python notebooks.

qsv vs VisiData

VisiData is a terminal UI for tabular data exploration — closer to qsv lens than to the rest of qsv. Both are excellent in their niche.

  • VisiData is interactive (TUI): sift, filter, pivot, and sort visually.
  • qsv lens is interactive too, via csvlens.
  • The rest of qsv is non-interactive — fast batch operations from the shell.

Use VisiData for exploratory analysis; use qsv for the rest of the pipeline.

qsv vs awk / sed for CSVs

awk and sed are general-purpose text tools. They don't understand CSV quoting — embedded commas, multi-line quoted fields, and escaped quotes will trip them up.
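
The pitfall is easy to demonstrate: naive comma splitting (what an awk -F',' script effectively does) tears apart a quoted field, while a CSV-aware parser handles it correctly (Python's csv module here, standing in for qsv's parser):

```python
import csv
import io

record = '1,"Hello, world"'

# awk -F',' style: split on every comma, quoted or not
naive = record.split(",")
print(naive)  # ['1', '"Hello', ' world"'] -- the quoted field is torn in two

# A CSV-aware parser respects RFC 4180 quoting rules
parsed = next(csv.reader(io.StringIO(record)))
print(parsed)  # ['1', 'Hello, world']
```

Multi-line quoted fields and escaped quotes fail the same way under text tools, just less visibly.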

Use qsv for any CSV operation. Use awk / sed for plain-text logs and configuration files.

Choosing — the cheat sheet

| You want to… | Reach for |
| --- | --- |
| Profile / validate / clean a CSV | qsv |
| Multi-file SQL analytics | DuckDB + qsv-prepared Parquet |
| Notebook-driven exploratory ML | pandas + qsv subprocess for heavy lifting |
| Shape-agnostic stream processing (JSON, DKVP, …) | Miller |
| Interactive TUI exploration | qsv lens or VisiData |
| Drag-and-drop GUI exploration | qsv pro |
| AI-assisted analysis | qsv MCP Server + Claude Cowork Plugin |
| xsv-compatible drop-in | qsvlite |
