Releases: delftdata/valentine

valentine-1.0.0

14 May 14:51
f0f7389

What's Changed

  • Rename ComaPy to Coma, remove Java dependency by @kPsarakis in #92
  • v1.0.0 API redesign: unified match API, ColumnPair, immutable results, metrics overhaul by @kPsarakis in #93
  • Add take_top_n_per_source method to MatcherResults by @diogohal in #94
  • Add Zensical-based documentation site by @kPsarakis in #95
  • Performance: 27× speedup across matchers + Coma accuracy improvements by @kPsarakis in #96
  • Harden the NLTK data download function by @kPsarakis in #98
  • Add embeddings support, modify Jaccard, enrich one-to-one filtering by @chrisk21 in #97
  • Release v1.0.0 by @kPsarakis in #99
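
The notes above name a take_top_n_per_source method but do not show its signature or exact semantics. As a rough illustration of what per-source top-n filtering usually means, here is a standalone sketch. The function name `top_n_per_source` and the plain-dict input shape are assumptions made for this example; they are not the MatcherResults API.

```python
from collections import defaultdict


def top_n_per_source(matches, n):
    """Keep the n highest-scoring targets for each source column.

    `matches` maps (source, target) pairs to similarity scores.
    Hypothetical helper for illustration; not valentine's API.
    """
    by_source = defaultdict(list)
    for (source, target), score in matches.items():
        by_source[source].append((score, target))

    kept = {}
    for source, scored in by_source.items():
        # Highest score first; ties keep their original relative order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for score, target in scored[:n]:
            kept[(source, target)] = score
    return kept


example = {
    ("a", "x"): 0.9,
    ("a", "y"): 0.5,
    ("a", "z"): 0.7,
    ("b", "x"): 0.2,
}
filtered = top_n_per_source(example, 2)
```

In this sketch each source keeps at most n candidates, so a source with fewer than n matches keeps all of them.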

Full Changelog: https://delftdata.github.io/valentine/changelog/#v100-2026-05-14

valentine-0.5.0

05 Apr 09:03
aa9d9c8

Release notes for v0.5.0:

New: Pure Python COMA algorithm (ComaPy) — #84

  • Adds ComaPy, a pure Python implementation of the COMA 3.0 schema matching algorithm, eliminating the Java 6 runtime dependency (40MB JAR + subprocess IPC)
  • Faithfully reproduces COMA's matching pipeline: trigram name/path similarity, TF-IDF instance matching with global IDF, and structural matchers — all combined via max-matching Dice set aggregation
  • Configurable via use_schema, use_instances, delta, threshold, and max_n parameters
  • The Java Coma class is now deprecated and will be removed in v1.0.0, when ComaPy will be renamed to Coma
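
To make the trigram name similarity mentioned above more concrete, the following is a minimal sketch of one common way to score trigram overlap, using the Dice coefficient over character-trigram sets. It assumes lowercased names and is not ComaPy's code; the helper names `trigrams` and `dice_similarity` are hypothetical, and COMA's max-matching aggregation across matchers is not shown.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lowercased string."""
    s = text.lower()
    if len(s) < 3:
        # Strings shorter than three characters form a single token.
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


score = dice_similarity("name", "names")  # shares "nam" and "ame"
```

The score is symmetric and lies in [0, 1], which makes it easy to compare against a threshold.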

Improved: Cupid algorithm aligned with the original paper — #81

  • Fixed structural matching, leaf similarity range, name similarity formula, and wsim recomputation
  • Added integration tests based on examples from the Cupid paper (Figures 7 & 8)

Improved: Similarity Flooding aligned with the original paper — #82

  • Implemented the paper's prefix/suffix string matcher, replacing Levenshtein distance
  • Fixed Formula B and C fixpoint iteration bootstrapping
  • Fixed hash/equality contracts for Node and NodePair
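
The hash/equality contract mentioned above requires that objects which compare equal also hash equal; otherwise dictionary and set lookups silently miss. A minimal sketch of one way to satisfy it in Python uses frozen dataclasses. The `Node` and `NodePair` classes below are stand-ins with assumed fields (`name`, `kind`, `left`, `right`), not valentine's actual classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Assumed fields, for illustration only.
    name: str
    kind: str


@dataclass(frozen=True)
class NodePair:
    left: Node
    right: Node


# frozen=True (with the default eq=True) generates __eq__ and __hash__
# from the same fields, so equal pairs always land in the same dict slot.
a = NodePair(Node("id", "column"), Node("ident", "column"))
b = NodePair(Node("id", "column"), Node("ident", "column"))

scores = {a: 0.5}
scores[b] = 0.9  # overwrites the entry for `a` instead of adding a second key
```

Deriving both methods from the same immutable fields avoids the usual failure mode, where `__eq__` is overridden but `__hash__` still depends on object identity.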

Improved: Distribution-Based algorithm aligned with the original paper — #83

  • Added missing triangle inequality constraints to correlation clustering
  • Fixed intersection EMD computation to use per-source filtering
  • Added type-aware sorting for global ranks (pure Python, faster, cross-platform)
  • Fixed quantile histogram bucket ordering
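
For readers unfamiliar with the Earth Mover's Distance (EMD) referenced above, here is a minimal one-dimensional sketch. It assumes two samples of equal size with uniform weights, in which case the EMD reduces to the mean absolute difference between the sorted values. The function name `emd_1d` is hypothetical; this is not the library's implementation, which works on quantile histograms.

```python
def emd_1d(xs, ys):
    """EMD between two equal-size 1-D samples with uniform weights."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan pairs the i-th smallest
    # value of one sample with the i-th smallest value of the other.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


shifted = emd_1d([3, 1, 2], [4, 2, 3])  # every value moves by exactly 1
```

Because only sorted order matters, the input order of each sample does not affect the result.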

Internal: Ruff linter and formatter — #80

  • Migrated the entire codebase to Ruff for linting and formatting

valentine-0.4.1

14 Oct 11:47
64a52bc

What's Changed

  • Support Python 3.14

valentine-0.3.0

28 Nov 13:06

  • Remove pyemd in favor of POT
  • Add support for numpy 2.0+

valentine-0.2.1

22 Aug 10:26

valentine-0.2.0

14 Feb 12:33
dd15f95

New metrics API

valentine-0.1.9

09 Nov 11:18

Address the ambiguity with Coma errors as discussed in #58

valentine-0.1.8

13 Oct 13:18

  • New string similarity functions for the JaccardLeven method.
  • Batching API support
  • Python 3.12 support

valentine-0.1.7

18 May 09:10

Code cleanup and removal of unused dependencies

valentine-0.1.6

11 Apr 10:06

Fix issue #53 for the DistributionBased method