A tool that facilitates matching columns across tabular datasets. It also serves as an experiment suite for state-of-the-art schema matching methods.
Updated May 15, 2026 - Python
Scalable deployment of Valentine for a VLDB demo.
Deterministic key and join discovery for structured datasets
Master's thesis: Holistic Schema Matching at Scale
JCDL 2025 paper "Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts", which matches research questions to cited datasets.
Your dataset discovery and curation buddy.
Code, data, and dataset browser for an LREC 2026 study of language dataset visibility in low-resource multilingual NLP, comparing catalogue counts with citation-traced datasets.
Search-first dataset discovery platform with DuckDB, FastAPI, background workers, and a reproducible local demo path.
Discover underexplored biomedical datasets through transparent, deterministic scoring. A scientific instrument for finding GEO, SRA, Zenodo, ENA, HCA, Expression Atlas, and Open Targets datasets that deserve a second look — local-first, BYOK, fully auditable.