Releases: delftdata/valentine
valentine-1.0.0
What's Changed
- Rename ComaPy to Coma, remove Java dependency by @kPsarakis in #92
- v1.0.0 API redesign: unified match API, ColumnPair, immutable results, metrics overhaul by @kPsarakis in #93
- Add take_top_n_per_source method to MatcherResults by @diogohal in #94
- Add Zensical-based documentation site by @kPsarakis in #95
- Performance: 27× speedup across matchers + Coma accuracy improvements by @kPsarakis in #96
- Harden the NLTK data download function by @kPsarakis in #98
- Add embeddings support, modify Jaccard, enrich one-to-one filtering by @chrisk21 in #97
- Release v1.0.0 by @kPsarakis in #99
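The take_top_n_per_source and one-to-one filtering items above can be illustrated with a small, self-contained sketch. It assumes a hypothetical result format, a plain dict mapping (source_column, target_column) tuples to scores. It does not use Valentine's MatcherResults or ColumnPair types, and the names top_n_per_source and one_to_one are illustrative only.

```python
from collections import defaultdict

# Hypothetical result format (not Valentine's API): {(source_column, target_column): score}
Matches = dict[tuple[str, str], float]


def top_n_per_source(matches: Matches, n: int) -> Matches:
    """Keep the n highest-scoring candidate targets for every source column."""
    by_source: dict[str, list[tuple[tuple[str, str], float]]] = defaultdict(list)
    for pair, score in matches.items():
        by_source[pair[0]].append((pair, score))
    kept: Matches = {}
    for candidates in by_source.values():
        candidates.sort(key=lambda item: item[1], reverse=True)
        kept.update(candidates[:n])  # (pair, score) tuples become dict entries
    return kept


def one_to_one(matches: Matches) -> Matches:
    """Greedy one-to-one filter: accept pairs by descending score, skipping used columns."""
    used_sources: set[str] = set()
    used_targets: set[str] = set()
    kept: Matches = {}
    for (src, tgt), score in sorted(matches.items(), key=lambda item: item[1], reverse=True):
        if src not in used_sources and tgt not in used_targets:
            kept[(src, tgt)] = score
            used_sources.add(src)
            used_targets.add(tgt)
    return kept
```

For example, with {("id", "emp_id"): 0.9, ("id", "dept_id"): 0.7, ("name", "emp_id"): 0.8, ("name", "full_name"): 0.6}, one_to_one keeps ("id", "emp_id") and ("name", "full_name"). The greedy filter is simple but does not guarantee the maximum total score; an assignment solver such as the Hungarian algorithm would be the alternative when optimality matters.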
New Contributors
Full Changelog: https://delftdata.github.io/valentine/changelog/#v100-2026-05-14
valentine-0.5.0
Release notes for v0.5.0:
New: Pure Python COMA algorithm (ComaPy) — #84
- Adds ComaPy, a pure Python implementation of the COMA 3.0 schema matching algorithm, eliminating the Java 6 runtime dependency (40MB JAR + subprocess IPC)
- Faithfully reproduces COMA's matching pipeline: trigram name/path similarity, TF-IDF instance matching with global IDF, and structural matchers, all combined via max-matching Dice set aggregation
- Configurable via the use_schema, use_instances, delta, threshold, and max_n parameters
- The Java Coma class is now deprecated and will be removed in v1.0.0, when ComaPy will be renamed to Coma
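Trigram name similarity of the kind listed above can be sketched as a Dice coefficient over character trigram sets. This is a minimal illustration under stated assumptions (lowercasing, no padding, sets rather than multisets, names shorter than three characters treated as one token); COMA's own preprocessing and ComaPy's implementation may differ.

```python
def trigrams(name: str) -> set[str]:
    """Set of overlapping 3-character substrings of a lowercased name (assumed, no padding)."""
    s = name.lower()
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over trigram sets: 2 * |A & B| / (|A| + |B|)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))
```

For instance, "customer_id" (9 trigrams) and "cust_id" (5 trigrams) share cus, ust, and _id, giving 2 * 3 / 14, roughly 0.43.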
Improved: Cupid algorithm aligned with the original paper — #81
- Fixed structural matching, leaf similarity range, name similarity formula, and wsim recomputation
- Added integration tests based on examples from the Cupid paper (Figures 7 & 8)
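For reference, the Cupid paper combines linguistic and structural similarity into a weighted similarity, wsim = w_struct * ssim + (1 - w_struct) * lsim, and scores the structural similarity of two elements by the fraction of their leaves that have a strong link (wsim at or above a threshold th_accept) to some leaf on the other side. The sketch below reflects our reading of those formulas, not Valentine's code; the function names and the dict-of-pairs input are assumptions, and leaf names are assumed distinct across the two schemas.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float) -> float:
    """Cupid-style wsim = w_struct * ssim + (1 - w_struct) * lsim."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must be in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def structural_similarity(leaves_s, leaves_t, wsim, th_accept):
    """Fraction of leaves (from both sides) with a strong link to a leaf on the other side.

    wsim is a hypothetical dict {(leaf_s, leaf_t): weighted similarity}.
    """
    strong_s = {x for x in leaves_s if any(wsim[(x, y)] >= th_accept for y in leaves_t)}
    strong_t = {y for y in leaves_t if any(wsim[(x, y)] >= th_accept for x in leaves_s)}
    total = len(leaves_s) + len(leaves_t)  # union size, assuming disjoint leaf names
    return (len(strong_s) + len(strong_t)) / total if total else 0.0
```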
Improved: Similarity Flooding aligned with the original paper — #82
- Implemented the paper's prefix/suffix string matcher in place of Levenshtein distance
- Fixed the bootstrapping of the Formula B and C fixpoint iterations
- Fixed hash/equality contracts for Node and NodePair
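The fixpoint iteration can be sketched for Formula C, which as we read the Similarity Flooding paper is sigma_{i+1} = normalize(sigma_0 + sigma_i + phi(sigma_0 + sigma_i)), where phi spreads each node's value to its neighbours through propagation coefficients and normalize divides by the maximum. The graph encoding, the residual-based stop, and all names here are assumptions for illustration, not Valentine's implementation.

```python
import math

# Hypothetical propagation graph: node -> list of (incoming neighbour, coefficient).
# Nodes stand for map pairs such as "a1=b1"; every neighbour must also be a node.
Graph = dict[str, list[tuple[str, float]]]


def phi(sigma: dict[str, float], incoming: Graph) -> dict[str, float]:
    """One propagation step: each node collects sigma(neighbour) * coefficient."""
    return {n: sum(sigma[m] * w for m, w in incoming.get(n, [])) for n in sigma}


def normalize(values: dict[str, float]) -> dict[str, float]:
    """Divide every value by the current maximum so the largest becomes 1.0."""
    top = max(values.values(), default=0.0)
    return {n: v / top for n, v in values.items()} if top > 0 else dict(values)


def flood_formula_c(sigma0, incoming, max_iter=1000, eps=1e-6):
    """Iterate sigma_{i+1} = normalize(sigma0 + sigma_i + phi(sigma0 + sigma_i))."""
    sigma = dict(sigma0)  # bootstrap from the initial string-matcher similarities
    for _ in range(max_iter):
        base = {n: sigma0[n] + sigma[n] for n in sigma}
        spread = phi(base, incoming)
        new = normalize({n: base[n] + spread[n] for n in base})
        residual = math.sqrt(sum((new[n] - sigma[n]) ** 2 for n in sigma))
        sigma = new
        if residual < eps:  # stop once the similarity vector barely changes
            break
    return sigma
```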
Improved: Distribution-Based algorithm aligned with the original paper — #83
- Added missing triangle inequality constraints to correlation clustering
- Fixed intersection EMD computation to use per-source filtering
- Type-aware sorting for global ranks (pure Python, faster, cross-platform)
- Fixed quantile histogram bucket ordering
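The EMD computations above compare one-dimensional distributions, where Earth Mover's Distance with uniform weights equals the area between the two empirical CDFs. A minimal pure-Python sketch of that identity follows; it is not Valentine's implementation, and applying it to raw values rather than to ranks or quantile histograms is a simplification.

```python
from bisect import bisect_right


def emd_1d(a: list[float], b: list[float]) -> float:
    """Earth Mover's Distance between two 1-D samples with uniform weights.

    In one dimension EMD equals the area between the two empirical CDFs,
    so we integrate |F_a(x) - F_b(x)| over the merged support points.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)  # fraction of a that is <= left
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total
```

Shifting a sample by a constant moves all of its mass that distance, so emd_1d([0, 1], [1, 2]) is 1.0.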
Internal: Ruff linter and formatter — #80
- Migrated the entire codebase to Ruff for linting and formatting
valentine-0.4.1
What's Changed
- Support Python 3.14
valentine-0.3.0
- Remove pyemd in favor of POT (Python Optimal Transport)
- Add support for numpy 2.0+
valentine-0.2.1
What's Changed
- Test case improvement by @ThanosTsiamis in #71
- Upgrade to nltk 3.9.1 to address CVE-2024-39705 by @aecio in #75
valentine-0.2.0
New metrics API
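Schema matching metrics are typically set-based comparisons between predicted column pairs and a ground truth. The sketch below shows the standard precision, recall, and F1 definitions on hypothetical (source_column, target_column) tuples; it is not Valentine's metrics classes, whose names and options are not shown here.

```python
Pair = tuple[str, str]  # hypothetical (source_column, target_column)


def precision_recall_f1(predicted, ground_truth) -> tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 over matched column pairs."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```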
valentine-0.1.9
Address the ambiguity with Coma errors as discussed in #58
valentine-0.1.8
- New string similarity functions for the JaccardLeven method
- Batching API support
- Python 3.12 support
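The JaccardLeven idea, as we understand it, is a Jaccard similarity between column value sets in which two values count as equal when their Levenshtein-based similarity is high enough. The sketch below is a hypothetical illustration: the normalized similarity, the greedy one-to-one value pairing (which keeps the score at or below 1.0), and the threshold value are assumptions, not the method's actual formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def fuzzy_jaccard(col_a, col_b, threshold: float = 0.8) -> float:
    """Jaccard over value sets, pairing each value with at most one close partner."""
    set_a, set_b = set(col_a), set(col_b)
    unused_b = set(set_b)
    matched = 0
    for x in sorted(set_a):
        partner = next((y for y in sorted(unused_b)
                        if normalized_similarity(x, y) >= threshold), None)
        if partner is not None:
            unused_b.discard(partner)
            matched += 1
    union = len(set_a) + len(set_b) - matched
    return matched / union if union else 0.0
```

Pairing values one-to-one keeps the "intersection" no larger than either set, so the result stays a valid Jaccard score in [0, 1].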
valentine-0.1.7
Code cleanup and removal of unused dependencies
valentine-0.1.6
Fix issue #53 for the DistributionBased method