
feat: soft token matching + Spider benchmark eval scaffold #1

Merged
jw-open merged 1 commit into main from feature/soft-matching-spider-benchmark on Apr 9, 2026


jw-open commented Apr 9, 2026

- Add graph2sql/matching.py: stemming-based soft match scorer
  Handles plurals (customers→customer), compound words, content + alias matching
  Label matches weighted 2x vs content/attribute matches
  Pure Python stdlib — no new dependencies

- Update ranking.py to use soft_match_score instead of exact token matching
  Fixes the main weakness of exact matching: "customer" now matches "customers", "customer_id", etc.

- Add benchmarks/spider_eval.py: table recall eval script for Spider dataset
  Measures: mean recall@k, perfect recall fraction, zero recall fraction
  Usage: python benchmarks/spider_eval.py --spider-dir ./data/spider --k 3

- Add tests/test_matching.py: 14 tests for matching module
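
For readers who have not opened the diff, here is a minimal sketch of what a stemming-based soft match scorer with 2x label weighting could look like. It is not the code in graph2sql/matching.py: the names `stem`, `stems`, and `sketch_soft_match_score`, the parameter names, and the naive suffix rules are all assumptions made for illustration, and the real `soft_match_score` signature may differ.

```python
import re


def stem(token):
    """Naive plural stripping (hypothetical rules, not graph2sql's stemmer)."""
    t = token.lower()
    if len(t) > 4 and t.endswith("ies"):
        return t[:-3] + "y"          # categories -> category
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]                # customers -> customer
    return t


def stems(text):
    """Split on anything that is not a letter or digit, then stem each piece.

    Underscores act as separators, so "customer_id" yields {"customer", "id"}.
    """
    return {stem(tok) for tok in re.findall(r"[a-z0-9]+", text.lower())}


def sketch_soft_match_score(query, label, content="", aliases=(),
                            label_weight=2.0, content_weight=1.0):
    """Score a node against a query; label hits count double (assumed weights)."""
    query_stems = stems(query)
    label_hits = query_stems & stems(label)

    other = stems(content)
    for alias in aliases:
        other |= stems(alias)
    # A stem already credited to the label is not counted a second time.
    other_hits = (query_stems & other) - label_hits

    return label_weight * len(label_hits) + content_weight * len(other_hits)


if __name__ == "__main__":
    print(sketch_soft_match_score("list all customers", "customer"))
    print(sketch_soft_match_score("list all customers", "customer_id"))
```

Comparing stemmed token sets rather than raw tokens is what lets "customers" in a question reach a node labelled "customer" or "customer_id", which exact token matching would miss.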
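
The three metrics reported by the benchmark script can be sketched as follows. This assumes recall@k is the fraction of gold tables that appear among the top k ranked tables, compared case-insensitively; the function names and the handling of an empty gold set are hypothetical and may not match benchmarks/spider_eval.py.

```python
def table_recall_at_k(ranked_tables, gold_tables, k):
    """Fraction of gold tables found in the top-k ranked tables."""
    gold = {t.lower() for t in gold_tables}
    if not gold:
        return 1.0  # assumption: nothing to retrieve counts as perfect recall
    top_k = {t.lower() for t in ranked_tables[:k]}
    return len(gold & top_k) / len(gold)


def summarize(recalls):
    """Aggregate per-example recalls into the three reported numbers."""
    n = len(recalls)
    if n == 0:
        raise ValueError("no examples to summarize")
    return {
        "mean_recall": sum(recalls) / n,
        "perfect_fraction": sum(1 for r in recalls if r == 1.0) / n,
        "zero_fraction": sum(1 for r in recalls if r == 0.0) / n,
    }


if __name__ == "__main__":
    # Made-up rankings for illustration only, not Spider results.
    recalls = [
        table_recall_at_k(["singer", "concert", "stadium"], ["singer", "concert"], 3),
        table_recall_at_k(["a", "b", "c", "d"], ["d"], 3),
        table_recall_at_k(["a", "b"], ["a", "z"], 3),
    ]
    print(summarize(recalls))
```

Reporting the perfect and zero fractions next to the mean separates a ranker that is mediocre on every question from one that is either fully right or fully wrong.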

ohwise_backend compatibility note:
  graph2sql graph format ({"nodes":[...], "edges":[...]}) is identical to
  ohwise_backend PersonalizedPageRank.execute() input format — no changes needed
  to integrate graph2sql as a drop-in for the internal PPR service.
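
As a rough illustration of the shared shape, the sketch below checks only what the note states: a dict with a "nodes" list and an "edges" list. The per-node and per-edge fields in the example ("id", "label", "source", "target") are hypothetical placeholders, since the note does not spell them out, and `has_graph_shape` is not a function from either project.

```python
def has_graph_shape(graph):
    """True if graph is a dict with list-valued "nodes" and "edges" keys."""
    return (
        isinstance(graph, dict)
        and isinstance(graph.get("nodes"), list)
        and isinstance(graph.get("edges"), list)
    )


# Hypothetical payload: only the top-level keys come from the note above.
example_graph = {
    "nodes": [
        {"id": "customers", "label": "customers"},
        {"id": "orders", "label": "orders"},
    ],
    "edges": [
        {"source": "orders", "target": "customers"},
    ],
}

if __name__ == "__main__":
    print(has_graph_shape(example_graph))
```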
jw-open merged commit e5b3c42 into main Apr 9, 2026
4 checks passed
jw-open deleted the feature/soft-matching-spider-benchmark branch April 9, 2026 01:56