feat: pgvector embedding, extraction & shared knowledge search system #1137

Closed
reski-rukmantiyo wants to merge 93 commits into nextlevelbuilder:dev from reski-rukmantiyo:feat/pgvector-kg-extraction-branch

Conversation

@reski-rukmantiyo
Contributor

Summary

Full pgvector embedding and extraction pipeline with shared knowledge search, embedding management UI, and worker system for WhatsApp raw message processing.

Changes by commit (chronological)

  • Raw message chunking & embedding workers (23834d6b) — chunk raw messages, embed via workers, store in raw_message_chunks table
  • Extraction retry mechanism (f20fc646) — retry failed extractions with status tracking and error logging in listen_raw_message store
  • JSON truncation recovery (d71cc003) — recover from truncated JSON in extraction, improve WhatsApp worker robustness
  • Embedding management CRUD + UI (fead736a) — full embedding CRUD API, management page in web UI
  • Embedding dimension 768 + pq.StringArray (9ec4b594) — update to 768 dims, improve response parsing, chunk ID array support
  • Schema version bump (f3b04fd4) — 768 dimension support, multiple response formats, schema v57
  • Extraction status filter + message stats (0b083196) — replace processed filter with extraction status, add statistics display
  • Raw message status display (645170af) — support extracted/failed states in UI
  • Dynamic polling & concurrent processing (678b4c97) — dynamic polling intervals, concurrent embedding/extraction, abandoned group recovery
  • Re-Embed for missing embeddings (d9a32eda) — re-embed functionality with UI integration
  • Tokopedia QR + deferred scopeClause (847c51b8) — defer FTS/vector scopeClause evaluation for dynamic query params
  • Shared knowledge search tool (522286ff) — new shared_knowledge_search tool, integrated into gateway agent framework
  • Date range extraction & chunk scoping (86098a6f) — date range extraction for shared knowledge, refactor chunk scoping, embedding eval tools
  • Day-based grouping & FTS OR logic (44c40f5a) — day-based message grouping for embeddings, FTS OR logic, scan all scopes in shared knowledge search
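The JSON truncation recovery item above can be illustrated with a minimal sketch. This is not the PR's implementation, which is not shown here. It assumes one plausible heuristic: when an extraction response is cut off mid-stream, trim back to the last fully closed object or array and append the closers that are still open. The function name `recoverTruncatedJSON` and the `entities` payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// closersFor returns the closing brackets for every still-open container,
// innermost first. stack holds the expected closer for each open container.
func closersFor(stack []byte) string {
	out := make([]byte, len(stack))
	for i, c := range stack {
		out[len(stack)-1-i] = c
	}
	return string(out)
}

// recoverTruncatedJSON (hypothetical helper) tries to salvage a truncated
// JSON document by cutting it after the last complete object or array and
// closing whatever containers remain open at that point.
func recoverTruncatedJSON(s string) (string, bool) {
	if json.Valid([]byte(s)) {
		return s, true
	}
	type cut struct {
		end     int    // index just past a '}' or ']'
		closers string // closers needed if we cut here
	}
	var cuts []cut
	var stack []byte
	inString, escaped := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			stack = append(stack, '}')
		case '[':
			stack = append(stack, ']')
		case '}', ']':
			if len(stack) == 0 || stack[len(stack)-1] != c {
				return "", false // malformed, not merely truncated
			}
			stack = stack[:len(stack)-1]
			cuts = append(cuts, cut{end: i + 1, closers: closersFor(stack)})
		}
	}
	// Prefer the latest cut point so as little data as possible is dropped.
	for i := len(cuts) - 1; i >= 0; i-- {
		candidate := s[:cuts[i].end] + cuts[i].closers
		if json.Valid([]byte(candidate)) {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	fixed, ok := recoverTruncatedJSON(`{"entities":[{"name":"A"},{"name":"B`)
	fmt.Println(fixed, ok)
}
```

The trade-off in this sketch is that the partially written trailing element is discarded rather than guessed at, and input with no completed container at all is reported as unrecoverable so the caller can fall back to the retry path.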

Key additions

| Area | Files |
|---|---|
| Workers | whatsapp/embedding_worker.go, whatsapp/extract_worker.go |
| Store layer | pg/raw_message_chunks.go, listen_raw_messages.go (PG + SQLite) |
| HTTP API | http/embeddings.go, tools/shared_knowledge_search.go, tools/raw_message_search.go, tools/date_extract.go |
| KG extractor | knowledgegraph/extractor.go, tests |
| Embeddings | memory/embeddings.go, tests |
| Web UI | pages/embeddings/ (new page), raw-messages/ updates, i18n (en/vi/zh) |
| Schema | PG migration removed (consolidated), SQLite schema updated, version.go bump |
| Tests | chunk_evaluator_test.go, embeddings_test.go, date_extract_test.go, extractor_test.go |
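For reviewers unfamiliar with pgvector, the store-layer additions can be pictured with the following sketch of a nearest-neighbour lookup over `raw_message_chunks`. It is an illustration under assumptions, not code from this PR. The column names `id`, `content`, and `embedding`, the helper `vectorLiteral`, and the constant names are hypothetical. What pgvector itself provides is the `[x,y,...]` text form for vectors and the `<=>` cosine-distance operator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// embeddingDim mirrors the 768-dimension setting described in the PR.
const embeddingDim = 768

// nearestChunksSQL orders chunks by cosine distance to the query vector.
// Column names are assumptions made for this sketch.
const nearestChunksSQL = `
SELECT id, content
FROM raw_message_chunks
ORDER BY embedding <=> $1::vector
LIMIT $2`

// vectorLiteral (hypothetical helper) renders an embedding in pgvector's
// text form, e.g. "[0.5,0,0]", rejecting vectors of the wrong dimension
// before they reach the database.
func vectorLiteral(v []float32) (string, error) {
	if len(v) != embeddingDim {
		return "", fmt.Errorf("expected %d dimensions, got %d", embeddingDim, len(v))
	}
	var b strings.Builder
	b.WriteByte('[')
	for i, x := range v {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(strconv.FormatFloat(float64(x), 'f', -1, 32))
	}
	b.WriteByte(']')
	return b.String(), nil
}

func main() {
	v := make([]float32, embeddingDim)
	v[0] = 0.5
	lit, err := vectorLiteral(v)
	if err != nil {
		panic(err)
	}
	// The literal would be passed as $1 and the result limit as $2.
	fmt.Println(lit[:12] + "...")
	fmt.Println(nearestChunksSQL)
}
```

Checking the dimension in application code gives a clearer error than the one the database raises when a vector does not match the column's declared size.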

Stats

  • 69 files changed, +5,438 / -345 lines
  • 14 commits

Test plan

  • Verify embedding worker processes raw messages into chunks with 768-dim vectors
  • Verify extraction retry recovers failed extractions
  • Verify Re-Embed button in UI regenerates missing embeddings
  • Verify shared_knowledge_search returns results across all scopes
  • Verify date range extraction parses natural language date queries
  • Verify FTS uses OR logic for multi-term queries
  • Verify SQLite schema matches PG schema for desktop edition
  • Verify i18n strings render in en/vi/zh
  • Run integration tests: go test -v -tags integration ./tests/integration/
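As an aid for the FTS check in the plan above, here is a minimal sketch of what "OR logic for multi-term queries" could look like on the Go side. It assumes the search text is turned into a PostgreSQL `tsquery` string whose terms are joined with `|`. The helper name `buildORTsQuery` is hypothetical, and the PR's actual query construction may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// buildORTsQuery (hypothetical helper) converts free text into a tsquery
// expression that matches documents containing ANY of the terms.
// Punctuation is dropped so operator characters cannot leak into the query.
func buildORTsQuery(q string) string {
	fields := strings.FieldsFunc(q, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := make(map[string]bool, len(fields))
	terms := make([]string, 0, len(fields))
	for _, f := range fields {
		t := strings.ToLower(f)
		if seen[t] {
			continue // skip duplicate terms
		}
		seen[t] = true
		terms = append(terms, t)
	}
	return strings.Join(terms, " | ")
}

func main() {
	// The result would be bound as a parameter, for example to
	// to_tsquery('simple', $1); the 'simple' configuration is an assumption.
	fmt.Println(buildORTsQuery("Invoice payment, invoice overdue!"))
	// prints: invoice | payment | overdue
}
```

With AND semantics a three-term query only matches chunks containing all three terms. Joining with `|` widens recall, so ranking matters more for ordering the results. An empty result from this helper should cause the caller to skip the FTS branch rather than send an empty query.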

🤖 Generated with Claude Code

reski-rukmantiyo and others added 30 commits April 14, 2026 07:55
…ovements

Squash-merge of 6 commits:
- WhatsApp group agent override configuration support
- Store methods for channel instances
- Improved WhatsApp group agent routing and recursive config coercion
- Block reply tracking restricted to intermediate tool calls
- WhatsApp group management and display in channel list
- Human-readable names for WhatsApp group overrides

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ommits squashed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…xtraction pipeline (18 commits squashed)

Squashed from feat/agent-listen-only-whatsapp branch covering:
- WhatsApp listen-only mode for silent knowledge graph extraction
- Per-group require_mention and agent ID overrides
- listen_raw_messages storage and background extraction worker
- Raw message listing API and UI dashboard with detail dialog
- ListenBuffer refactoring for real-time raw message storage
- Group refresh functionality and UI sync status indicator
- Group entity and participation relations in extraction prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…er-message

feat: WhatsApp media caption buffer, reconnect watchdog, KG event_time, Telegram reconnect
…king

fix: KG shared mode scoping, ClearAll, raw message ResetProcessed, and UI workspace sharing
…snt-works

fix: KG event_time not returned in entity queries + ambiguous user_id in recursive traversal
…ssion-error

fix: allow workspace path exemption for skill execution
Merge W3: WhatsApp media, KG event_time, Telegram reconnect, bugfixes
reski-rukmantiyo and others added 27 commits April 27, 2026 20:42
…invalidating stale sessions on tool timeouts
Resolve merge conflicts in version.go and schema.go:
- Keep both heartbeat FK migration (25→26) and MCP health checks (27→28)
- Bump SchemaVersion to 28 for SQLite

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…removewriter, and /writers commands and update file write permissions.
…es, and enable fail-open for file write permissions when no rules exist.
…ent dates in search, and improve knowledge graph tool output formatting.
…ated match logic, and increased result limits
…g, and add pq.StringArray support for chunk IDs
… group recovery for embedding and extraction workers
…nd vector search to support dynamic query parameters
…chunk storage scoping, and implement embedding evaluation tools.
… FTS to OR logic, and update shared knowledge search to scan all scopes
@reski-rukmantiyo reski-rukmantiyo deleted the feat/pgvector-kg-extraction-branch branch May 9, 2026 23:30
@reski-rukmantiyo reski-rukmantiyo restored the feat/pgvector-kg-extraction-branch branch May 9, 2026 23:30