[schemas] Full-text search — keyword search RPC for thoughts#53

Open
alanshurafa wants to merge 2 commits into NateBJones-Projects:main from alanshurafa:contrib/alanshurafa/full-text-search

@alanshurafa
Contributor

Summary

  • Adds search_thoughts_text RPC using PostgreSQL's built-in full-text search
  • Creates GIN index on thoughts content for fast keyword queries
  • Supports natural language queries, phrase search, and boolean operators via websearch_to_tsquery
  • Complements the existing match_thoughts semantic vector search

Why

Semantic search finds related concepts, but sometimes you need exact keyword matches — person names, project codes, dates, technical terms. This gives OB1 users both discovery (semantic) and retrieval (keyword) search modes.

Scale

Tested against 75,000+ thoughts in production. GIN index keeps queries fast.

Test plan

  • Keyword search returns relevant results ranked by relevance
  • Phrase search works with quoted strings
  • Boolean operators work in websearch_to_tsquery syntax (terms are ANDed implicitly, "or" gives OR, a leading "-" gives NOT)
  • GIN index prevents sequential scans on large tables
  • RPC callable from Supabase client and Edge Functions
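For anyone who wants to try the last item without the Supabase client library, here is a minimal sketch of the raw HTTP call. Supabase exposes database functions through its REST layer at /rest/v1/rpc/<function_name>. The parameter names query_text and match_count are assumptions for illustration, not taken from the migration, so check them against the actual function signature. The helper only builds the request and does not send it.

```python
import json
from urllib.request import Request


def build_rpc_request(project_url: str, api_key: str, query: str, limit: int = 10) -> Request:
    """Build (but do not send) a POST request for the search_thoughts_text RPC."""
    # ASSUMPTION: "query_text" and "match_count" are hypothetical parameter
    # names; replace them with the names declared in the SQL function.
    payload = {"query_text": query, "match_count": limit}
    return Request(
        url=f"{project_url.rstrip('/')}/rest/v1/rpc/search_thoughts_text",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "apikey": api_key,
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A quoted phrase plus an excluded term, in websearch_to_tsquery syntax.
request = build_rpc_request("https://example.supabase.co", "YOUR_KEY", '"project atlas" -draft')
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would execute the call against a real project.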

🤖 Generated with Claude Code

@justfinethanku
Collaborator

Saw your Discord post — impressive work across all of these. I'm going to personally review each one. Not today, but they're on my list.

Collaborator

@justfinethanku left a comment

Thank you for this contribution! I've completed a thorough review of PR #53. Here's my assessment:

What's Good

Full-Text Search (schemas/):

  • Excellent documentation with clear use cases
  • Clean SQL implementation using PostgreSQL's native full-text search
  • Proper GIN indexing for performance
  • Good complementary capability to existing semantic search
  • Well-structured metadata.json with all required fields
  • Solid troubleshooting section
  • Production-tested at scale (75K+ thoughts)

Content Fingerprint Dedup (primitives/):

  • Comprehensive README with excellent technical depth
  • Strong "Extensions That Use This" section (required for primitives)
  • Proper cross-reference to email-history-import recipe
  • Clear step-by-step guide with batching strategies
  • Good troubleshooting coverage

Issues to Address

Critical

1. PR Title Mismatch
Your PR title is "[schemas] Full-text search" but you're modifying TWO categories:

  • primitives/content-fingerprint-dedup/
  • schemas/full-text-search/

Resolution needed: Update the PR title to reflect both contributions. Suggested format:

[schemas][primitives] Full-text search RPC + content fingerprint dedup
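To make the mismatch concrete, here is a rough sketch of the comparison: collect the bracketed tags at the start of the title, collect the top-level folders of the changed files, and report any folder with no matching tag. The function names and file paths are illustrative assumptions, and this is not the repository's actual CI check.

```python
import re


def title_categories(title: str) -> set[str]:
    """Return the leading [category] tags of a PR title."""
    prefix = re.match(r"(\[[^\]]+\]\s*)+", title)
    return set(re.findall(r"\[([^\]]+)\]", prefix.group(0))) if prefix else set()


def touched_categories(paths: list[str]) -> set[str]:
    """Return the top-level folder of each changed path."""
    return {path.split("/", 1)[0] for path in paths if "/" in path}


def missing_from_title(title: str, paths: list[str]) -> set[str]:
    """Categories that are modified but not tagged in the title."""
    return touched_categories(paths) - title_categories(title)


# Hypothetical file list matching the two folders named in this review.
changed = [
    "schemas/full-text-search/README.md",
    "primitives/content-fingerprint-dedup/README.md",
]
print(missing_from_title("[schemas] Full-text search", changed))
```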

2. PR Description Incomplete
The PR description only describes the full-text search feature. It doesn't mention that you're also adding the content fingerprint dedup primitive.

Resolution needed: Update the PR description to include:

  • A summary of both contributions
  • Why they're being submitted together (if there's a relationship)
  • OR split into two separate PRs if they're independent

Moderate

3. Missing Temporal Metadata
The content-fingerprint-dedup/metadata.json is missing optional but recommended fields:

  • created (should be "2026-03-16" based on the full-text-search dates)
  • updated (should be "2026-03-16")

Resolution needed: Add these fields to match the pattern in schemas/full-text-search/metadata.json.
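A quick local check along these lines can confirm the fields are present and parse as ISO dates before pushing. This is a sketch only: the field names come from this review, while the function name and the sample JSON are assumptions, and the real metadata schema may enforce more than this.

```python
import json
from datetime import date

RECOMMENDED_DATE_FIELDS = ("created", "updated")


def missing_date_fields(metadata_json: str) -> list[str]:
    """List problems with the recommended date fields in a metadata.json string."""
    metadata = json.loads(metadata_json)
    problems = []
    for field in RECOMMENDED_DATE_FIELDS:
        value = metadata.get(field)
        if value is None:
            problems.append(f"{field}: missing")
            continue
        try:
            date.fromisoformat(value)  # expects YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append(f"{field}: not an ISO date")
    return problems


# Hypothetical metadata with neither date field set.
print(missing_date_fields('{"name": "content-fingerprint-dedup"}'))
```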

Questions for Clarification

4. Primitive Curation Process
Per CONTRIBUTING.md, primitives are CURATED and must be referenced by 2+ extensions. The README shows one reference (email-history-import).

Question: What is the second extension/recipe that uses this primitive? Or is this being submitted with maintainer pre-approval?

SQL Safety Check

✓ No DROP TABLE/DATABASE
✓ No TRUNCATE
✓ No unqualified DELETE FROM
✓ ALTER TABLE only adds columns (allowed per guard rails)
✓ No credentials or secrets
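For contributors who want to pre-check a migration themselves, the first three items can be approximated with a few regular expressions. This is a simplified sketch, not the project's actual guard-rail tooling. It treats "unqualified" as "no WHERE clause in the statement", which is an assumption, and it does not cover the ALTER TABLE or secrets checks.

```python
import re

FORBIDDEN = {
    "DROP TABLE/DATABASE": re.compile(r"\bDROP\s+(TABLE|DATABASE)\b", re.IGNORECASE),
    "TRUNCATE": re.compile(r"\bTRUNCATE\b", re.IGNORECASE),
}
# Capture each DELETE statement up to its terminating semicolon.
DELETE_STMT = re.compile(r"\bDELETE\s+FROM\b[^;]*", re.IGNORECASE)


def sql_violations(sql: str) -> list[str]:
    """Return the names of guard-rail rules that the SQL text appears to break."""
    # Drop "--" line comments so commented-out statements are ignored.
    # (Naive: a "--" inside a string literal is stripped as well.)
    sql = re.sub(r"--[^\n]*", "", sql)
    found = [name for name, pattern in FORBIDDEN.items() if pattern.search(sql)]
    for statement in DELETE_STMT.findall(sql):
        # ASSUMPTION: a DELETE without WHERE counts as "unqualified".
        if not re.search(r"\bWHERE\b", statement, re.IGNORECASE):
            found.append("unqualified DELETE FROM")
            break
    return found


print(sql_violations("DELETE FROM thoughts;"))
```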

Automated Review Criteria

✓ Folder structure correct for categories
✓ Required files present (README.md + metadata.json for both)
✓ metadata.json valid and complete
✓ No binary blobs
✓ README sections complete
✓ Cross-references resolve (email-history-import exists)

Verdict: Minor fixes needed

This is quality work with solid technical implementation and good documentation. The issues are primarily administrative (title/description clarity) rather than technical problems. Once you address the title mismatch and PR description, this should be ready to merge.

Action items:

  1. Update PR title to reflect both contributions
  2. Update PR description to document both contributions
  3. Add created and updated dates to content-fingerprint-dedup metadata.json
  4. Clarify the second extension reference for the primitive (or confirm maintainer pre-approval)

Great contribution overall! The full-text search implementation is particularly clean and will be valuable for users who need exact keyword matching alongside semantic search.

Add search_thoughts_text RPC with GIN index for keyword-based
thought retrieval. Complements existing semantic vector search.
Includes migration.sql for CI compliance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI requires Prerequisites section for README completeness check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>