Releases · delftdata/valentine · GitHub

14 May 14:51

kPsarakis

valentine-1.0.0 Latest

What's Changed

Rename ComaPy to Coma, remove Java dependency by @kPsarakis in #92
v1.0.0 API redesign: unified match API, ColumnPair, immutable results, metrics overhaul by @kPsarakis in #93
Add take_top_n_per_source method to MatcherResults by @diogohal in #94
Add Zensical-based documentation site by @kPsarakis in #95
Performance: 27× speedup across matchers + Coma accuracy improvements by @kPsarakis in #96
Harden the NLTK data download function by @kPsarakis in #98
Add embeddings support, modify Jaccard, enrich one-to-one filtering by @chrisk21 in #97
Release v1.0.0 by @kPsarakis in #99
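
To illustrate the kind of filtering that the name `take_top_n_per_source` (#94) suggests, here is a minimal sketch over a plain dictionary. The function name `top_n_per_source`, the `(source, target)` tuple keys, and the sample scores are assumptions made for illustration; this is not the `MatcherResults` method or its signature.

```python
from collections import defaultdict


def top_n_per_source(
    scores: dict[tuple[str, str], float], n: int
) -> dict[tuple[str, str], float]:
    """Keep the n best-scoring target columns for every source column.

    Hypothetical helper: `scores` maps (source_column, target_column)
    to a similarity score. This is not valentine's own data structure.
    """
    by_source: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for (source, target), score in scores.items():
        by_source[source].append((score, target))

    kept: dict[tuple[str, str], float] = {}
    for source, candidates in by_source.items():
        # Highest score first; the sort is stable, so ties keep input order.
        candidates.sort(key=lambda item: item[0], reverse=True)
        for score, target in candidates[:n]:
            kept[(source, target)] = score
    return kept


example_scores = {
    ("id", "key"): 0.9,
    ("id", "name"): 0.4,
    ("name", "name"): 0.8,
    ("name", "key"): 0.1,
}
best = top_n_per_source(example_scores, n=1)
```

With `n=1`, each source column keeps only its single strongest candidate, which is a common first step before inspecting matches by hand.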

New Contributors

@diogohal made their first contribution in #94

Full Changelog: https://delftdata.github.io/valentine/changelog/#v100-2026-05-14

Contributors

chrisk21, kPsarakis, and diogohal

05 Apr 09:03

kPsarakis

valentine-0.5.0

Release notes for v0.5.0:

New: Pure Python COMA algorithm (`ComaPy`) — #84

Adds ComaPy, a pure Python implementation of the COMA 3.0 schema matching algorithm, eliminating the Java 6 runtime dependency (40MB JAR + subprocess IPC)
Faithfully reproduces COMA's matching pipeline: trigram name/path similarity, TF-IDF instance matching with global IDF, and structural matchers — all combined via max-matching Dice set aggregation
Configurable via use_schema, use_instances, delta, threshold, and max_n parameters
The Java Coma class is now deprecated and will be removed in v1.0.0, when ComaPy will be renamed to Coma
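
As a rough illustration of the trigram name similarity mentioned above, the sketch below computes a Dice coefficient over character trigram sets. It is a generic textbook formulation, not ComaPy's code; the function names, the lowercasing step, and the handling of strings shorter than three characters are assumptions.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lowercased string."""
    s = text.lower()
    if len(s) < 3:
        # Assumption: treat a short string as a single token.
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_dice(a: str, b: str) -> float:
    """Dice coefficient 2*|A & B| / (|A| + |B|) over trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0  # two empty names are considered identical
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))
```

For example, `"name"` has trigrams `nam` and `ame`, while `"names"` adds `mes`; two of the five trigrams in total are shared on each side, so the score is 2 * 2 / 5 = 0.8.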

Improved: Cupid algorithm aligned with the original paper — #81

Fixed structural matching, leaf similarity range, name similarity formula, and wsim recomputation
Added integration tests based on examples from the Cupid paper (Figures 7 & 8)

Improved: Similarity Flooding aligned with the original paper — #82

Implemented the paper's prefix/suffix string matcher replacing Levenshtein
Fixed Formula B and C fixpoint iteration bootstrapping
Fixed hash/equality contracts for Node and NodePair
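
The hash/equality contract referred to above is the general Python rule that objects comparing equal must also hash equal, otherwise sets and dictionaries misbehave. The sketch below shows one way to satisfy it with frozen dataclasses. The class names `GraphNode` and `GraphNodePair` and their fields are hypothetical and do not reproduce valentine's `Node` and `NodePair`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphNode:
    """Hypothetical graph node; frozen dataclasses derive __eq__ and
    __hash__ from the same fields, so the two always agree."""
    name: str
    kind: str


@dataclass(frozen=True)
class GraphNodePair:
    """Hypothetical ordered pair of nodes, usable as a dict key."""
    left: GraphNode
    right: GraphNode


a1 = GraphNode("price", "column")
a2 = GraphNode("price", "column")
b = GraphNode("cost", "column")

# Equal pairs collapse to one entry, so similarity values are not duplicated.
similarity = {GraphNodePair(a1, b): 0.5}
similarity[GraphNodePair(a2, b)] = 0.7
```

Because `a1` and `a2` are equal and hash equal, the second assignment overwrites the first entry instead of creating a second key.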

Improved: Distribution-Based algorithm aligned with the original paper — #83

Added missing triangle inequality constraints to correlation clustering
Fixed intersection EMD computation to use per-source filtering
Added type-aware sorting for global ranks (pure Python, faster, cross-platform)
Fixed quantile histogram bucket ordering
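
For readers unfamiliar with the Earth Mover's Distance (EMD) mentioned above, here is a dependency-free sketch for the simplest case: two one-dimensional samples of equal size with uniform weights, where the EMD equals the mean absolute difference between the sorted values. The function name `emd_1d` and the equal-size restriction are assumptions for illustration. This is not the library's intersection EMD, which per the v0.3.0 notes relies on POT.

```python
def emd_1d(xs: list[float], ys: list[float]) -> float:
    """EMD between two equal-size 1-D samples with uniform weights.

    Sorting both samples and pairing values by rank gives the optimal
    transport plan in one dimension, so the cost is the mean of the
    absolute rank-wise differences.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)
```

Shifting every value of a sample by 1 yields a distance of 1, and reordering a sample leaves the distance at 0, since only the distribution of values matters.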

Internal: Ruff linter and formatter — #80

Migrated the entire codebase to Ruff for linting and formatting

14 Oct 11:47

kPsarakis

valentine-0.4.1

What's Changed

Support Python 3.14

28 Nov 13:06

kPsarakis

valentine-0.3.0

Remove pyemd in favor of POT
Add support for numpy 2.0+

22 Aug 10:26

kPsarakis

valentine-0.2.1

What's Changed

Test case improvement by @ThanosTsiamis in #71
Upgrade to nltk 3.9.1 to address CVE-2024-39705 by @aecio in #75

Contributors

aecio and ThanosTsiamis

14 Feb 12:33

kPsarakis

valentine-0.2.0

New metrics API

09 Nov 11:18

kPsarakis

valentine-0.1.9

Address the ambiguity with Coma errors as discussed in #58

13 Oct 13:18

kPsarakis

valentine-0.1.8

New string similarity functions for the JaccardLeven method.
Batching API support
Python 3.12 support

18 May 09:10

kPsarakis

valentine-0.1.7

Code cleanup and removal of unused dependencies

11 Apr 10:06

kPsarakis

valentine-0.1.6

Fix issue #53 for the DistributionBased method
