spike: evaluate SQLite vector storage for desktop mode by Monica-CodingWorld · Pull Request #310 · Abhash-Chakraborty/Find

Monica-CodingWorld · 2026-06-14T15:20:50Z

Summary

Adds a SQLite + sqlite-vec proof of concept to evaluate whether SQLite can support Find's metadata storage and vector-search requirements for desktop mode without changing the current PostgreSQL + pgvector runtime.

This implementation creates a small evaluation layer that demonstrates metadata storage, vector insertion, similarity search, and gallery-style queries using sqlite-vec.

Fixes #256

Type of change

What changed

Added sqlite_vec_poc.py to evaluate SQLite + sqlite-vec functionality.
Implemented schema creation for metadata and vector storage.
Added media insertion with 768-dimensional embeddings.
Implemented vector similarity search using sqlite-vec.
Added gallery query support for metadata retrieval.
Added test coverage validating schema creation, vector insertion, similarity search, and gallery query behavior.

How to test

Install sqlite-vec

pip install sqlite-vec

Run the proof-of-concept tests

pytest tests/test_sqlite_vec_poc.py -v

Manual validation performed:

Created SQLite database and schema successfully.
Inserted 768-dimensional vectors.
Executed similarity search queries.
Verified gallery metadata retrieval.
Confirmed nearest-neighbor ordering from sqlite-vec search results.
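
The nearest-neighbor ordering check above can be cross-checked against a brute-force reference. This is a minimal sketch, assuming the search ranks by Euclidean (L2) distance (an assumption; the PR does not state the metric). The function name `brute_force_knn` and the three-dimensional sample vectors are hypothetical and only stand in for real 768-dimensional embeddings.

```python
import math


def brute_force_knn(query, rows, k=5):
    """Return [(media_id, distance)] for the k rows closest to query.

    Assumes L2 distance; swap math.dist for another metric if the
    vector table is configured differently.
    """
    scored = [(media_id, math.dist(query, vec)) for media_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical sample data: (media_id, embedding)
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.9, 0.1, 0.0]),
]
result = brute_force_knn([1.0, 0.0, 0.0], rows, k=2)
```

Comparing the ids returned here with the ids returned by the sqlite-vec query for the same inputs gives a simple ordering check that does not depend on the extension.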

Checklist

I linked the related issue
I ran required checks from CONTRIBUTING.md
I updated docs/env notes if needed
My PR is scoped to a single issue
I followed commit message conventions
I am not committing secrets or local artifacts

GSSoC'26 checklist

I requested issue assignment before starting
I have meaningful commits (no spam commits)
I am ready to explain my implementation in review comments

Summary by CodeRabbit

New Features
- Added a SQLite-based proof-of-concept for storing media embeddings and performing nearest-neighbor similarity search, returning matching media results (including distance) and supporting gallery retrieval.
Tests
- Improved test handling for the optional sqlite-vec dependency, including skip behavior when unavailable and a clear error assertion when it’s missing.
Documentation
- Updated the ADR with a new section describing the SQLite vector spike results and current limitations/follow-ups.

github-actions · 2026-06-14T15:21:01Z

PR Context Summary

Linked issue(s): spike: validate SQLite metadata and vector storage for desktop mode #256
Referenced but not closing: none
PR author trusted by GitHub: no
Dependabot PR: no
PR assignee synced from linked issue: no change
Macroscope review status: Already triggered once for this PR. Use the workflow dispatch to manually rerun.

Walkthrough

Adds a new proof-of-concept module sqlite_vec_poc.py implementing SQLite-backed vector storage using the sqlite-vec extension. The module provides a connection factory with optional dependency handling, schema creation (media table + vec0 virtual table), media/vector insertion helpers, two similarity search functions, and a high-level wrapper class. A companion test file with dependency detection validates all behaviors through four pytest tests. The desktop runtime ADR is updated with spike result documentation.

Changes

SQLite Vector Storage POC

Layer / File(s)	Summary
Connection factory and schema initialization `backend/src/find_api/core/sqlite_vec_poc.py`	`create_connection` enables SQLite extension loading and initializes `sqlite_vec`, raising a descriptive `RuntimeError` if the dependency is missing. `create_schema` creates a `media` table and a `media_vectors` vec0 virtual table parameterized by `embedding_dim`.
Media and vector insertion helpers `backend/src/find_api/core/sqlite_vec_poc.py`	`insert_vector` packs a float list into a binary blob and inserts into `media_vectors`; `insert_media` inserts a row into `media` with status defaulting to "indexed"; `count_vectors` returns a scalar row count from `media_vectors`.
Similarity search operations `backend/src/find_api/core/sqlite_vec_poc.py`	`search_vectors` runs a `WHERE embedding MATCH` query on `media_vectors` returning `(media_id, distance)`; `search_media` joins `media_vectors` to `media` and returns `(id, filename, distance)` ordered by vector distance.
High-level wrapper class API `backend/src/find_api/core/sqlite_vec_poc.py`	`SQLiteVecPOC` class wraps module functions: stores a connection, exposes `create_schema`, combines `insert_media` calling both media and vector insertion, maps search results to dictionaries, and provides `gallery_query` to fetch all media ordered by id.
Optional dependency detection and error handling `backend/tests/test_sqlite_vec_poc.py`	Dependency detection via `importlib.util.find_spec`. Pytest fixture `sqlite_vec_available` skips dependent tests when `sqlite-vec` is not installed. Test validates that `SQLiteVecPOC` raises `RuntimeError` matching "sqlite-vec is required" when the dependency is missing.
Functional tests `backend/tests/test_sqlite_vec_poc.py`	Four tests cover schema creation (DB file exists), 768-dim vector insertion and gallery retrieval, two-row similarity search ordering with top-match id validation, and exact gallery query shape including `status: "indexed"`.
ADR spike result documentation `docs/plans/not-started/desktop-runtime-adr.md`	ADR metadata updated to 2026-06-19. New section documents PoC code locations, local run instructions, test skip behavior, limitations (no Docker replacement, no migration coverage, incomplete validation of hybrid search/filters/clustering/queue), and guidance for implementation after benchmarking and query abstraction design.
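
The optional-dependency detection described in the table can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `has_module` and `require_sqlite_vec` are hypothetical names, and the error text is only chosen to contain the "sqlite-vec is required" phrase the tests are said to match.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_sqlite_vec() -> None:
    """Raise a descriptive error when the optional dependency is missing."""
    if not has_module("sqlite_vec"):
        raise RuntimeError("sqlite-vec is required for this proof of concept")


# A test module could use this flag to decide whether to skip vector tests.
SQLITE_VEC_AVAILABLE = has_module("sqlite_vec")
```

Checking with `find_spec` keeps normal backend installs working, because nothing is imported until the proof of concept is actually used.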

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hopping through bytes of float and blob,
A vec0 table — what a quest!
I pack my embeddings, small and neat,
And search for matches, quite the feat.
When sqlite-vec is absent, loud I cry,
But desktop SQLite lets Postgres say goodbye! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: a proof-of-concept spike to evaluate SQLite vector storage for desktop mode, which directly reflects the PR's core objective.
Description check	✅ Passed	The description follows the template with summary, issue link, change type, detailed changes, test instructions, and completed checklist items. All required sections are present and substantively filled.
Linked Issues check	✅ Passed	The PR fulfills issue `#256` acceptance criteria by providing schema creation in SQLite [`#256`], demonstrating vector insert/search for 768-dimensional embeddings [`#256`], validating gallery/search query patterns [`#256`], documenting limitations in the ADR [`#256`], and preserving PostgreSQL defaults [`#256`].
Out of Scope Changes check	✅ Passed	All changes are scoped to the SQLite + sqlite-vec proof-of-concept: new POC module, comprehensive tests, and documentation updates to the desktop runtime ADR. No unrelated modifications detected.


macroscopeapp · 2026-06-14T15:22:00Z

Approvability

Verdict: Needs human review

Unable to check for correctness in 38d0d26. This PR introduces a new SQLite vector storage proof-of-concept, which constitutes new feature capability. An unresolved review comment flags a potential SQL injection concern in the schema creation function where embedding_dim is interpolated into SQL without validation.

No code changes detected at 66b4b42. Prior analysis still applies.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/find_api/core/sqlite_vec_poc.py`:
- Around line 1-129: The entire file sqlite_vec_poc.py has formatting issues
that violate the project's Ruff code style standards. Run the Ruff formatter on
this file using the command `uv run ruff format
backend/src/find_api/core/sqlite_vec_poc.py` to automatically fix all formatting
violations, then commit the formatted result to resolve the CI blocking issue.
- Around line 18-33: The create_schema function interpolates embedding_dim
directly into SQL using an f-string without validation. Since Python type hints
are not enforced at runtime, invalid input could produce malformed DDL. Validate
that embedding_dim is a positive integer before the SQL statement construction,
and raise an appropriate exception (such as ValueError) if the value is invalid,
negative, or not an integer. This ensures the SQL statement is always
well-formed and prevents potential SQL injection or schema corruption issues.

In `@backend/tests/test_sqlite_vec_poc.py`:
- Around line 1-4: The test file imports SQLiteVecPOC class and EMBEDDING_DIM
constant from the implementation, but these do not exist in
backend/src/find_api/core/sqlite_vec_poc.py which currently only provides
function-based helpers. Either add the SQLiteVecPOC class wrapper and
EMBEDDING_DIM constant to the implementation file to match the test's
class-based API expectations (supporting methods like gallery_query and search),
or rewrite all test code throughout the file (lines 10-75) to use the existing
function-based helpers instead of class methods. Choose one approach and ensure
the imports at the top and all test method calls throughout the file are
consistent with whichever API design you select.
- Around line 1-81: The test file contains formatting drift that is causing Ruff
checks to fail. Run the Ruff formatter to automatically fix all formatting
issues in the test file (which contains the test functions test_schema_creation,
test_insert_768_dimension_vector, test_similarity_search, and
test_gallery_query_shape). After formatting, re-run the full check suite to
verify that the formatting changes pass all Ruff checks and tests.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 858202e and 81ae551.

📒 Files selected for processing (2)

backend/src/find_api/core/sqlite_vec_poc.py
backend/tests/test_sqlite_vec_poc.py

coderabbitai · 2026-06-14T15:25:13Z

+def create_schema(conn, embedding_dim: int):
+    conn.execute("""
+        CREATE TABLE IF NOT EXISTS media (
+            id INTEGER PRIMARY KEY,
+            filename TEXT NOT NULL,
+            status TEXT NOT NULL
+        )
+    """)
+
+    conn.execute(f"""
+        CREATE VIRTUAL TABLE IF NOT EXISTS media_vectors
+        USING vec0(
+            media_id INTEGER PRIMARY KEY,
+            embedding FLOAT[{embedding_dim}]
+        )
+    """)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate embedding_dim before interpolating it into schema SQL.

embedding_dim is inserted directly into SQL text. Because Python hints are not enforced, malformed input can generate invalid DDL (or worse, altered SQL text). Validate/cast to a positive integer before building the statement.

Suggested fix

-def create_schema(conn, embedding_dim: int):
+def create_schema(conn, embedding_dim: int):
+    if not isinstance(embedding_dim, int) or embedding_dim <= 0:
+        raise ValueError("embedding_dim must be a positive integer")
+
     conn.execute("""
         CREATE TABLE IF NOT EXISTS media (
             id INTEGER PRIMARY KEY,
             filename TEXT NOT NULL,
             status TEXT NOT NULL
         )
     """)

As per coding guidelines, "Keep EMBEDDING_DIM aligned with the configured CLIP/SigLIP model and pgvector columns."
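
The validation idea can be isolated from the database call. The sketch below is an assumption-labeled illustration, not the PR's code: `build_vec0_ddl` is a hypothetical helper that only builds the statement text. It also rejects `bool`, since `isinstance(True, int)` is true in Python and a plain `isinstance(..., int)` check would let `True` through.

```python
def build_vec0_ddl(embedding_dim: int) -> str:
    """Build the vec0 DDL only after embedding_dim is proven to be a positive int."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(embedding_dim, bool) or not isinstance(embedding_dim, int):
        raise ValueError("embedding_dim must be an integer")
    if embedding_dim <= 0:
        raise ValueError("embedding_dim must be a positive integer")
    # Interpolation is safe here because the value is a validated int.
    return (
        "CREATE VIRTUAL TABLE IF NOT EXISTS media_vectors "
        f"USING vec0(media_id INTEGER PRIMARY KEY, embedding FLOAT[{embedding_dim}])"
    )


ddl = build_vec0_ddl(768)
```

Validation is used instead of a placeholder because the dimension is part of the column definition, where a bound parameter is not expected to work.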

🧰 Tools

🪛 OpenGrep (1.22.0)

[ERROR] 27-33: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)


Sources: Coding guidelines, Linters/SAST tools

Monica-CodingWorld · 2026-06-14T16:01:59Z

Hi @Abhash-Chakraborty ! I updated the branch to address the CodeRabbit feedback by adding the SQLiteVecPOC wrapper and EMBEDDING_DIM, and I also ran Ruff formatting locally on the files changed in this PR.

My branch is currently clean and only contains:

backend/src/find_api/core/sqlite_vec_poc.py
backend/tests/test_sqlite_vec_poc.py

However, the CI is still failing. I noticed that ruff format --check . reports formatting issues in a few unrelated files already present in the branch history, and I'm not sure whether the workflow is failing because of those or because of something specific to my changes.

Could you please take a quick look and let me know if there's anything else I should update in this PR? Thanks!

github-actions · 2026-06-19T19:04:48Z

@macroscope-app review

Please review this PR against its linked issue, local-first privacy rules, and the current Find repo instructions.
Linked issue(s): #256.
Trigger source: label-gated review (ready-to-merge).

Abhash-Chakraborty

Reviewed against #256. I fixed the remaining acceptance gaps by documenting the sqlite-vec spike result/limitations in the desktop runtime ADR and making the POC dependency optional-safe so normal backend installs do not fail when sqlite-vec is not installed.

Tested locally:
- uv run ruff check src/find_api/core/sqlite_vec_poc.py tests/test_sqlite_vec_poc.py
- uv run pytest tests/test_sqlite_vec_poc.py -q

Local pytest result is expected on this machine: 1 passed, 4 skipped because sqlite-vec is optional and not installed. With sqlite-vec installed, the vector insert/search tests run. This keeps the PR scoped as a desktop/runtime spike and does not switch production away from PostgreSQL + pgvector.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/find_api/core/sqlite_vec_poc.py`:
- Line 67: The insert_media() method can leave orphaned media records if
insert_vector() fails after insert_media() commits. This is a partial-write
(atomicity) problem rather than a race condition. Remove the independent
conn.commit() calls from both helper functions (the one at line 67 in the
vector insertion helper and the one at line 83 in the media insertion helper).
Then wrap both the insert_media() and insert_vector() calls within the main
insert_media() method in a single transaction context that commits once after
both operations complete successfully, ensuring atomicity.
- Around line 52-57: Add explicit embedding dimension validation to enforce the
EMBEDDING_DIM = 768 contract in three locations: before the struct.pack() call
in insert_vector, and before any embedding processing in search_vectors and
search_media. For each function, verify that the length of the embedding
parameter matches EMBEDDING_DIM and raise a clear ValueError if it does not,
preventing silent dimension mismatches from causing cryptic sqlite-vec errors
downstream.
- Around line 11-25: In the create_connection function, close the database
connection before raising the RuntimeError when ModuleNotFoundError is caught to
prevent resource leaks. Additionally, immediately after calling
sqlite_vec.load(conn), disable extension loading by calling
conn.enable_load_extension(False) to minimize the attack surface and follow the
principle of least privilege.

In `@docs/plans/not-started/desktop-runtime-adr.md`:
- Around line 88-94: The spike running instructions use plain pip install
instead of the project's uv package manager, which can cause the package to be
installed outside the uv environment and lead to false skips. Replace the line
pip install sqlite-vec with uv pip install sqlite-vec to ensure the package is
installed in the proper uv environment and matches the project's package
management setup.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 81ae551 and 0d5d37f.

📒 Files selected for processing (3)

backend/src/find_api/core/sqlite_vec_poc.py
backend/tests/test_sqlite_vec_poc.py
docs/plans/not-started/desktop-runtime-adr.md

coderabbitai · 2026-06-20T05:10:19Z

+def create_connection(db_path=":memory:"):
+    conn = sqlite3.connect(str(db_path))
+    conn.enable_load_extension(True)
+
+    try:
+        import sqlite_vec
+    except ModuleNotFoundError as exc:
+        raise RuntimeError(
+            "sqlite-vec is required for this desktop-runtime proof of concept. "
+            "Install it manually with `pip install sqlite-vec` before running "
+            "the sqlite_vec_poc tests."
+        ) from exc
+
+    sqlite_vec.load(conn)
+    return conn


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and view its content
cat -n backend/src/find_api/core/sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 5602

🏁 Script executed:

# Search for tests related to this POC
fd -t f "test.*sqlite" . --type f
fd -t f ".*test.*" backend/ --type f | grep -i sqlite

Repository: Abhash-Chakraborty/Find

Length of output: 143

🏁 Script executed:

# Check usage of create_connection across the codebase
rg "create_connection" --type py -B 2 -A 2

Repository: Abhash-Chakraborty/Find

Length of output: 774

🏁 Script executed:

# Check the test file to see if create_connection is tested
cat -n backend/tests/test_sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 2909

🏁 Script executed:

# Verify if sqlite_vec is actually a dependency by checking requirements/setup
fd -t f "requirements" backend/ --type f
fd -t f "setup.py" . --type f
fd -t f "pyproject.toml" . --type f

Repository: Abhash-Chakraborty/Find

Length of output: 92

🏁 Script executed:

# Check if there are any other places where conn.enable_load_extension is used
rg "enable_load_extension" --type py -B 2 -A 2

Repository: Abhash-Chakraborty/Find

Length of output: 418

🏁 Script executed:

# Check if there's any cleanup logic in SQLiteVecPOC class or if it handles connection closure
rg "close\(\)" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 2
rg "def __del__" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 5

Repository: Abhash-Chakraborty/Find

Length of output: 49

🏁 Script executed:

# Check pyproject.toml to see if sqlite_vec is a dependency
cat -n backend/pyproject.toml | head -80

Repository: Abhash-Chakraborty/Find

Length of output: 2357

🏁 Script executed:

# Verify best practices for sqlite3 connection error handling
web_search python sqlite3 connection cleanup best practices error handling

Repository: Abhash-Chakraborty/Find

Length of output: 206

🌐 Web query:

sqlite3 python connection cleanup best practices exception handling

💡 Result:

When managing SQLite connections in Python, it is critical to distinguish between transaction management and resource cleanup. The built-in context manager for sqlite3 connections handles only transactions, not the closing of the database connection itself [1][2][3][4].

Best practices for cleanup and exception handling:

1. Use contextlib.closing for automatic cleanup: Since the standard sqlite3 connection context manager does not call .close(), use contextlib.closing to ensure the connection is closed automatically when the block exits, regardless of whether an exception occurs [5][6][7][8].

    import sqlite3
    from contextlib import closing

    with closing(sqlite3.connect("database.db")) as conn:
        with conn:  # Handles transaction commit/rollback
            conn.execute("INSERT INTO table_name VALUES (?)", (data,))

2. Manual cleanup with try/finally: If you do not use contextlib.closing, you must explicitly call .close() in a finally block to guarantee the connection is released, even if an error occurs [9].

    conn = sqlite3.connect("database.db")
    try:
        with conn:
            conn.execute("INSERT INTO table_name VALUES (?)", (data,))
    except sqlite3.Error as e:
        print(f"Database error: {e}")
    finally:
        conn.close()

3. Understanding transaction management: The context manager used directly on a connection object (e.g., with conn:) automatically commits if the block finishes successfully and rolls back if an exception is raised [1][2][4]. It is intended for managing transaction boundaries, not for closing the connection [7].

4. Key considerations:
- Do not rely on garbage collection: Explicitly closing connections prevents resource leaks and file locking issues [9][8].
- Catch specific exceptions: Use sqlite3.Error or its subclasses (e.g., sqlite3.OperationalError, sqlite3.IntegrityError) to handle database-related issues specifically [1][2][9].
- Keep connections scoped: Open the connection as late as possible and close it as soon as you are done to minimize the time the database file is locked [9][10].

Citations:

1: https://docs.python.org/release/3.10.19/library/sqlite3.html

2: https://docs.python.org/3.9/library/sqlite3.html

3: https://bugs.python.org/issue34580

4: https://docs.python.org/release/3.11.1/library/sqlite3.html

5: https://stackoverflow.com/questions/9561832/what-if-i-dont-close-the-database-connection

6: https://discuss.python.org/t/implicitly-close-sqlite3-connections-with-context-managers/33320

7: https://stackoverflow.com/questions/8630423/automate-opening-and-closing-the-database-connection

8: https://alexwlchan.net/notes/2024/sqlite3-context-manager-doesnt-close-connections/

9: https://zetcode.com/python/sqlite3-connection-close/

10: https://zetcode.com/python/sqlite3-connect/
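
The distinction drawn in the result above (the connection's own context manager handles transactions, while contextlib.closing closes the connection) can be checked with the standard library alone. This is a minimal sketch; both function names are hypothetical and not part of the PR.

```python
import sqlite3
from contextlib import closing


def still_open_after_with_block() -> bool:
    """`with conn:` commits or rolls back, but leaves the connection open."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
    try:
        conn.execute("SELECT 1")  # succeeds because conn is still open
        return True
    except sqlite3.ProgrammingError:
        return False
    finally:
        conn.close()


def closed_after_closing_block() -> bool:
    """`closing(...)` calls conn.close() when the block exits."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("SELECT 1")
    try:
        conn.execute("SELECT 1")  # fails: the connection is closed
    except sqlite3.ProgrammingError:
        return True
    return False
```

Both functions return True, which is why an early exit from a connection factory needs an explicit close rather than relying on a `with conn:` block.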

🏁 Script executed:

# Check if sqlite_vec.load actually needs the extension to remain enabled
# Look for any documentation or pattern in the codebase about extension loading
rg "sqlite_vec" --type py -B 3 -A 3

Repository: Abhash-Chakraborty/Find

Length of output: 4569

🏁 Script executed:

# Check if there are any other POC files or similar patterns in the codebase
fd -t f "poc\|proof" --type f

Repository: Abhash-Chakraborty/Find

Length of output: 49

🏁 Script executed:

# Check if sqlite_vec.load needs the extension to stay enabled after loading
# Look for any documentation or check how sqlite-vec typically works
web_search sqlite-vec python extension loading best practices enable_load_extension

Repository: Abhash-Chakraborty/Find

Length of output: 167

🏁 Script executed:

# Verify the actual impact of the resource leak by examining test more carefully
# and checking if there are any __del__ or context manager implementations
rg "__enter__|__exit__|__del__" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 5

Repository: Abhash-Chakraborty/Find

Length of output: 49

Close connection on missing sqlite_vec import and disable extension loading after load.

On ModuleNotFoundError, the opened DB connection is leaked (line 12). Additionally, extension loading should be disabled immediately after sqlite_vec.load(conn) to minimize attack surface.

Suggested patch

 def create_connection(db_path=":memory:"):
     conn = sqlite3.connect(str(db_path))
     conn.enable_load_extension(True)

     try:
         import sqlite_vec
     except ModuleNotFoundError as exc:
+        conn.close()
         raise RuntimeError(
             "sqlite-vec is required for this desktop-runtime proof of concept. "
             "Install it manually with `pip install sqlite-vec` before running "
             "the sqlite_vec_poc tests."
         ) from exc

     sqlite_vec.load(conn)
+    conn.enable_load_extension(False)
     return conn


coderabbitai · 2026-06-20T05:10:19Z

+def insert_vector(
+    conn,
+    media_id: int,
+    embedding: list[float],
+):
+    blob = struct.pack(f"{len(embedding)}f", *embedding)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

wc -l backend/src/find_api/core/sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 115

🏁 Script executed:

cat -n backend/src/find_api/core/sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 5602

Enforce embedding dimension contract at write/search boundaries (lines 57, 97-100, 123-126).

Current vector functions (insert_vector, search_vectors, search_media) accept any embedding length and pack it without validating against EMBEDDING_DIM = 768, allowing silent dimension mismatches that only fail later with cryptic sqlite-vec errors. Add explicit dimension validation before struct.pack() for deterministic errors and contract safety.

Suggested patch

 EMBEDDING_DIM = 768

+def _validate_embedding_dim(embedding: list[float], expected_dim: int = EMBEDDING_DIM) -> None:
+    if len(embedding) != expected_dim:
+        raise ValueError(
+            f"embedding must have exactly {expected_dim} dimensions, got {len(embedding)}"
+        )
+
@@
 def insert_vector(
@@
 ):
+    _validate_embedding_dim(embedding)
     blob = struct.pack(f"{len(embedding)}f", *embedding)
@@
 def search_vectors(
@@
 ):
+    _validate_embedding_dim(query_embedding)
     blob = struct.pack(
@@
 def search_media(
@@
 ):
+    _validate_embedding_dim(query_embedding)
     blob = struct.pack(

This aligns with the guideline: "Keep EMBEDDING_DIM aligned with the configured CLIP/SigLIP model and pgvector columns."
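
To make the dimension contract concrete, the sketch below validates and packs in one step and shows the resulting blob size. It is an illustration only: `pack_embedding` is a hypothetical helper, and the explicit little-endian format (`<`) is an assumption that differs from the PR, which packs with the platform's native byte order.

```python
import struct

EMBEDDING_DIM = 768  # value stated in the PR


def pack_embedding(embedding, expected_dim=EMBEDDING_DIM) -> bytes:
    """Validate the length, then pack as consecutive 4-byte floats."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding must have exactly {expected_dim} dimensions, "
            f"got {len(embedding)}"
        )
    # "<" pins little-endian, standard-size floats (assumption, see above).
    return struct.pack(f"<{expected_dim}f", *embedding)


blob = pack_embedding([0.0] * EMBEDDING_DIM)
blob_size = len(blob)  # 768 floats * 4 bytes each
```

A wrong-length input now fails with a ValueError at the call site instead of surfacing later as an error from the vector table.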

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 def insert_vector(
     conn,
     media_id: int,
     embedding: list[float],
 ):
+    _validate_embedding_dim(embedding)
     blob = struct.pack(f"{len(embedding)}f", *embedding)


Source: Coding guidelines

coderabbitai · 2026-06-20T05:10:19Z

+        (media_id, blob),
+    )
+
+    conn.commit()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "sqlite_vec_poc.py" | head -5

Repository: Abhash-Chakraborty/Find

Length of output: 113

🏁 Script executed:

wc -l backend/src/find_api/core/sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 115

🏁 Script executed:

cat -n backend/src/find_api/core/sqlite_vec_poc.py

Repository: Abhash-Chakraborty/Find

Length of output: 5602

Make media+vector insertion atomic to prevent partial writes.

SQLiteVecPOC.insert_media() (lines 156-172) calls insert_media() and insert_vector() sequentially. Both helper functions perform independent commits (lines 83 and 67 respectively). If insert_vector() fails after insert_media() commits, the database is left with an orphaned media record without a corresponding vector.

Wrap both operations in a transaction context to ensure atomicity:

Suggested patch

     def insert_media(
         self,
         media_id,
         filename,
         embedding,
     ):
-        insert_media(
-            self.conn,
-            media_id,
-            filename,
-        )
-
-        insert_vector(
-            self.conn,
-            media_id,
-            embedding,
-        )
+        with self.conn:
+            insert_media(
+                self.conn,
+                media_id,
+                filename,
+            )
+            insert_vector(
+                self.conn,
+                media_id,
+                embedding,
+            )

Also remove conn.commit() calls from the helper functions at lines 67 and 83.
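
The rollback behavior this suggestion relies on can be demonstrated without sqlite-vec. In the sketch below a plain `media_vectors` table stands in for the vec0 virtual table (an assumption made so the example runs on the standard library), and `make_db` and `insert_media_atomic` are hypothetical names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media ("
        "id INTEGER PRIMARY KEY, filename TEXT NOT NULL, status TEXT NOT NULL)"
    )
    # Plain table standing in for the vec0 virtual table (assumption).
    conn.execute(
        "CREATE TABLE media_vectors ("
        "media_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
    )
    return conn


def insert_media_atomic(conn, media_id, filename, blob):
    # One transaction: commit if both inserts succeed, roll back otherwise.
    with conn:
        conn.execute(
            "INSERT INTO media (id, filename, status) VALUES (?, ?, 'indexed')",
            (media_id, filename),
        )
        conn.execute(
            "INSERT INTO media_vectors (media_id, embedding) VALUES (?, ?)",
            (media_id, blob),
        )


conn = make_db()
insert_media_atomic(conn, 1, "ok.jpg", b"\x00" * 4)
try:
    # None violates NOT NULL, so the second insert fails.
    insert_media_atomic(conn, 2, "broken.jpg", None)
except sqlite3.IntegrityError:
    pass

media_rows = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]
vector_rows = conn.execute("SELECT COUNT(*) FROM media_vectors").fetchone()[0]
```

After the failed call both counts stay at 1, so no media row exists without a matching vector row.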


coderabbitai · 2026-06-20T05:10:19Z

+To run the spike manually:
+
+```bash
+cd backend
+pip install sqlite-vec
+uv run pytest tests/test_sqlite_vec_poc.py -q
+```


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if uv is used in this project
fd -e "uv.lock" -o -e "pyproject.toml" | head -20

Repository: Abhash-Chakraborty/Find

Length of output: 184

🏁 Script executed:

# Check for the file in question
fd "desktop-runtime-adr.md"

Repository: Abhash-Chakraborty/Find

Length of output: 113

🏁 Script executed:

# List backend directory structure
ls -la backend/ 2>/dev/null || echo "backend directory not found"

Repository: Abhash-Chakraborty/Find

Length of output: 880

🏁 Script executed:

# Read the actual file content around lines 88-94
sed -n '80,100p' docs/plans/not-started/desktop-runtime-adr.md

Repository: Abhash-Chakraborty/Find

Length of output: 729

🏁 Script executed:

# Check the backend pyproject.toml to understand the project setup
cat backend/pyproject.toml

Repository: Abhash-Chakraborty/Find

Length of output: 1883

🏁 Script executed:

# Check if there's a root pyproject.toml with uv tool configuration
cat pyproject.toml 2>/dev/null || echo "No root pyproject.toml"

Repository: Abhash-Chakraborty/Find

Length of output: 90

Use uv pip install to match the project's environment.

The repository uses uv for package management (evidenced by backend/uv.lock and [tool.uv.*] sections in backend/pyproject.toml). Using plain pip install sqlite-vec can install the package outside the uv environment, causing readers to hit the skip path even after following these steps.

♻️ Suggested edit

 cd backend
-pip install sqlite-vec
+uv pip install sqlite-vec
 uv run pytest tests/test_sqlite_vec_poc.py -q

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/plans/not-started/desktop-runtime-adr.md` around lines 88 - 94, The spike running instructions use plain pip install instead of the project's uv package manager, which can cause the package to be installed outside the uv environment and lead to false skips. Replace the line pip install sqlite-vec with uv pip install sqlite-vec to ensure the package is installed in the proper uv environment and matches the project's package management setup.

Source: Coding guidelines
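One way to confirm which environment actually received the package is to ask the interpreter that runs the tests. The sketch below is an illustrative diagnostic, not part of the PR; `describe_module` is a hypothetical helper, and the module name `sqlite_vec` is assumed to be the import name of the `sqlite-vec` distribution.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable from the running interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "available": spec is not None,
        "interpreter": sys.executable,
    }


report = describe_module("sqlite_vec")
print(report)
```

Saved to a file and run as `uv run python <file>` from `backend/`, this shows the interpreter uv selects. If `available` is false there while a plain `python` run reports true, the package was installed outside the uv environment, which would explain tests being skipped.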

Monica-CodingWorld · 2026-06-24T04:22:07Z

Hi @Abhash-Chakraborty,

Thank you for reviewing and approving the PR. I just wanted to check whether there's anything else needed from my side before it can be merged.
I've addressed the review feedback and the branch is up to date. If there's any additional change or clarification required, I'd be happy to help.

Thanks for your time!

Monica-CodingWorld added 4 commits June 13, 2026 22:39

feat: add sqlite-vec proof of concept

6e9b143

fix: remove misplaced sqlite-vec poc file

921aad8

feat: add sqlite-vec proof of concept

cda6688

test: add sqlite-vec proof of concept tests

81ae551

coderabbitai Bot reviewed Jun 14, 2026

View reviewed changes

Monica-CodingWorld added 3 commits June 14, 2026 20:57

style: format sqlite-vec proof of concept

15e472a

refactor: add SQLiteVecPOC wrapper for sqlite-vec prototype

b8f42a4

style: format sqlite-vec proof of concept

160cd2e

Monica-CodingWorld and others added 2 commits June 14, 2026 21:34

style: format sqlite-vec proof of concept

bda6cb8

docs: record sqlite vector spike results

38d0d26

Abhash-Chakraborty added gssoc26 Related to GirlScript Summer of Code 2026. backend FastAPI, database, storage, and API work architecture High-level design decisions and technical direction labels Jun 19, 2026

Abhash-Chakraborty changed the title ~~Feat/sqlite vec evaluation~~ spike: evaluate SQLite vector storage for desktop mode Jun 19, 2026

Abhash-Chakraborty approved these changes Jun 19, 2026

View reviewed changes

Merge branch 'main' into feat/sqlite-vec-evaluation

0d5d37f

coderabbitai Bot reviewed Jun 20, 2026

View reviewed changes

Merge branch 'main' into feat/sqlite-vec-evaluation

66b4b42
