spike: evaluate SQLite vector storage for desktop mode #310
Monica-CodingWorld wants to merge 11 commits into
📝 Walkthrough
Adds a new proof-of-concept module
Changes: SQLite Vector Storage POC
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Approvability
Verdict: Needs human review
Unable to check for correctness in 38d0d26. This PR introduces a new SQLite vector storage proof-of-concept, which constitutes new feature capability. An unresolved review comment flags a potential SQL injection concern in the schema creation function, where embedding_dim is interpolated into SQL without validation. No code changes detected at
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/find_api/core/sqlite_vec_poc.py`:
- Around line 1-129: The entire file sqlite_vec_poc.py has formatting issues
that violate the project's Ruff code style standards. Run the Ruff formatter on
this file using the command `uv run ruff format
backend/src/find_api/core/sqlite_vec_poc.py` to automatically fix all formatting
violations, then commit the formatted result to resolve the CI blocking issue.
- Around line 18-33: The create_schema function interpolates embedding_dim
directly into SQL using an f-string without validation. Since Python type hints
are not enforced at runtime, invalid input could produce malformed DDL. Validate
that embedding_dim is a positive integer before the SQL statement construction,
and raise an appropriate exception (such as ValueError) if the value is invalid,
negative, or not an integer. This ensures the SQL statement is always
well-formed and prevents potential SQL injection or schema corruption issues.
In `@backend/tests/test_sqlite_vec_poc.py`:
- Around line 1-4: The test file imports SQLiteVecPOC class and EMBEDDING_DIM
constant from the implementation, but these do not exist in
backend/src/find_api/core/sqlite_vec_poc.py which currently only provides
function-based helpers. Either add the SQLiteVecPOC class wrapper and
EMBEDDING_DIM constant to the implementation file to match the test's
class-based API expectations (supporting methods like gallery_query and search),
or rewrite all test code throughout the file (lines 10-75) to use the existing
function-based helpers instead of class methods. Choose one approach and ensure
the imports at the top and all test method calls throughout the file are
consistent with whichever API design you select.
- Around line 1-81: The test file contains formatting drift that is causing Ruff
checks to fail. Run the Ruff formatter to automatically fix all formatting
issues in the test file (which contains the test functions test_schema_creation,
test_insert_768_dimension_vector, test_similarity_search, and
test_gallery_query_shape). After formatting, re-run the full check suite to
verify that the formatting changes pass all Ruff checks and tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a19c0025-e2cc-4aa0-ba33-83e87f73e838
📒 Files selected for processing (2)
- backend/src/find_api/core/sqlite_vec_poc.py
- backend/tests/test_sqlite_vec_poc.py
def create_schema(conn, embedding_dim: int):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS media (
            id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            status TEXT NOT NULL
        )
    """)

    conn.execute(f"""
        CREATE VIRTUAL TABLE IF NOT EXISTS media_vectors
        USING vec0(
            media_id INTEGER PRIMARY KEY,
            embedding FLOAT[{embedding_dim}]
        )
    """)
Validate embedding_dim before interpolating it into schema SQL.
embedding_dim is inserted directly into SQL text. Because Python hints are not enforced, malformed input can generate invalid DDL (or worse, altered SQL text). Validate/cast to a positive integer before building the statement.
Suggested fix
 def create_schema(conn, embedding_dim: int):
+    if not isinstance(embedding_dim, int) or embedding_dim <= 0:
+        raise ValueError("embedding_dim must be a positive integer")
+
     conn.execute("""
         CREATE TABLE IF NOT EXISTS media (
             id INTEGER PRIMARY KEY,
             filename TEXT NOT NULL,
             status TEXT NOT NULL
         )
     """)
As per coding guidelines, "Keep EMBEDDING_DIM aligned with the configured CLIP/SigLIP model and pgvector columns."
🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 27-33: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
Sources: Coding guidelines, Linters/SAST tools
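A minimal sketch of the validation requested in this comment, kept free of sqlite-vec so it runs on the standard library alone. The helper names `validate_embedding_dim` and `vector_table_ddl` are hypothetical and not part of the PR. The explicit `bool` rejection is an assumption added here, because `True` passes `isinstance(..., int)` in Python.

```python
def validate_embedding_dim(embedding_dim) -> int:
    """Return embedding_dim unchanged if it is a positive int, else raise ValueError."""
    # bool is a subclass of int, so reject it explicitly (assumption: True/False
    # should never be accepted as a vector dimension).
    if isinstance(embedding_dim, bool) or not isinstance(embedding_dim, int):
        raise ValueError("embedding_dim must be a positive integer")
    if embedding_dim <= 0:
        raise ValueError("embedding_dim must be a positive integer")
    return embedding_dim


def vector_table_ddl(embedding_dim) -> str:
    """Build the vec0 DDL text only after the dimension has been validated."""
    dim = validate_embedding_dim(embedding_dim)
    return (
        "CREATE VIRTUAL TABLE IF NOT EXISTS media_vectors "
        f"USING vec0(media_id INTEGER PRIMARY KEY, embedding FLOAT[{dim}])"
    )
```

Once the value is known to be a positive `int`, the interpolated text can only be a run of digits, so a string such as `"1]); DROP TABLE media; --"` is rejected before any SQL is built.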
Hi @Abhash-Chakraborty! I updated the branch to address the CodeRabbit feedback by adding the My branch is currently clean and only contains:
However, the CI is still failing. I noticed that Could you please take a quick look and let me know if there's anything else I should update in this PR? Thanks!
@macroscope-app review
Please review this PR against its linked issue, local-first privacy rules, and the current Find repo instructions.
Abhash-Chakraborty left a comment
Reviewed against #256. I fixed the remaining acceptance gaps by documenting the sqlite-vec spike result/limitations in the desktop runtime ADR and making the POC dependency optional-safe so normal backend installs do not fail when sqlite-vec is not installed.

Tested locally:
- uv run ruff check src/find_api/core/sqlite_vec_poc.py tests/test_sqlite_vec_poc.py
- uv run pytest tests/test_sqlite_vec_poc.py -q

Local pytest result is expected on this machine: 1 passed, 4 skipped because sqlite-vec is optional and not installed. With sqlite-vec installed, the vector insert/search tests run. This keeps the PR scoped as a desktop/runtime spike and does not switch production away from PostgreSQL + pgvector.
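One way the "optional-safe" skip behavior could look, shown as a sketch only: the PR's actual skip mechanism is not visible in this thread. The helper `sqlite_vec_available` and the test class name are hypothetical, and the standard-library `unittest` skip decorator is used here so the example has no third-party imports (pytest also collects `unittest.TestCase` classes).

```python
import importlib.util
import unittest


def sqlite_vec_available() -> bool:
    """True when the optional sqlite_vec package can be found on the import path."""
    return importlib.util.find_spec("sqlite_vec") is not None


@unittest.skipUnless(sqlite_vec_available(), "sqlite-vec is optional and not installed")
class VectorSearchTests(unittest.TestCase):
    def test_extension_loads(self):
        import sqlite3
        import sqlite_vec  # imported lazily so test collection never fails

        conn = sqlite3.connect(":memory:")
        try:
            conn.enable_load_extension(True)
            sqlite_vec.load(conn)
            conn.enable_load_extension(False)
            # vec_version() is provided by the loaded extension.
            self.assertTrue(conn.execute("SELECT vec_version()").fetchone()[0])
        finally:
            conn.close()
```

Because the import sits inside the test body, a machine without sqlite-vec reports the class as skipped instead of failing at import time.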
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/find_api/core/sqlite_vec_poc.py`:
- Line 67: The insert_media() method creates a race condition where orphaned
media records can be created if insert_vector() fails after insert_media()
commits. Remove the independent conn.commit() calls from both helper functions
(the one at line 67 in the vector insertion helper and the one at line 83 in the
media insertion helper). Then wrap both the insert_media() and insert_vector()
calls within the main insert_media() method in a single transaction context that
commits once after both operations complete successfully, ensuring atomicity.
- Around line 52-57: Add explicit embedding dimension validation to enforce the
EMBEDDING_DIM = 768 contract in three locations: before the struct.pack() call
in insert_vector, and before any embedding processing in search_vectors and
search_media. For each function, verify that the length of the embedding
parameter matches EMBEDDING_DIM and raise a clear ValueError if it does not,
preventing silent dimension mismatches from causing cryptic sqlite-vec errors
downstream.
- Around line 11-25: In the create_connection function, close the database
connection before raising the RuntimeError when ModuleNotFoundError is caught to
prevent resource leaks. Additionally, immediately after calling
sqlite_vec.load(conn), disable extension loading by calling
conn.enable_load_extension(False) to minimize the attack surface and follow the
principle of least privilege.
In `@docs/plans/not-started/desktop-runtime-adr.md`:
- Around line 88-94: The spike running instructions use plain pip install
instead of the project's uv package manager, which can cause the package to be
installed outside the uv environment and lead to false skips. Replace the line
pip install sqlite-vec with uv pip install sqlite-vec to ensure the package is
installed in the proper uv environment and matches the project's package
management setup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 93d74ea2-7df8-46e8-8bc7-6256e5394d45
📒 Files selected for processing (3)
- backend/src/find_api/core/sqlite_vec_poc.py
- backend/tests/test_sqlite_vec_poc.py
- docs/plans/not-started/desktop-runtime-adr.md
def create_connection(db_path=":memory:"):
    conn = sqlite3.connect(str(db_path))
    conn.enable_load_extension(True)

    try:
        import sqlite_vec
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            "sqlite-vec is required for this desktop-runtime proof of concept. "
            "Install it manually with `pip install sqlite-vec` before running "
            "the sqlite_vec_poc tests."
        ) from exc

    sqlite_vec.load(conn)
    return conn
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and view its content
cat -n backend/src/find_api/core/sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 5602
🏁 Script executed:
# Search for tests related to this POC
fd -t f "test.*sqlite" . --type f
fd -t f ".*test.*" backend/ --type f | grep -i sqlite
Repository: Abhash-Chakraborty/Find
Length of output: 143
🏁 Script executed:
# Check usage of create_connection across the codebase
rg "create_connection" --type py -B 2 -A 2
Repository: Abhash-Chakraborty/Find
Length of output: 774
🏁 Script executed:
# Check the test file to see if create_connection is tested
cat -n backend/tests/test_sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 2909
🏁 Script executed:
# Verify if sqlite_vec is actually a dependency by checking requirements/setup
fd -t f "requirements" backend/ --type f
fd -t f "setup.py" . --type f
fd -t f "pyproject.toml" . --type f
Repository: Abhash-Chakraborty/Find
Length of output: 92
🏁 Script executed:
# Check if there are any other places where conn.enable_load_extension is used
rg "enable_load_extension" --type py -B 2 -A 2
Repository: Abhash-Chakraborty/Find
Length of output: 418
🏁 Script executed:
# Check if there's any cleanup logic in SQLiteVecPOC class or if it handles connection closure
rg "close\(\)" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 2
rg "def __del__" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 5
Repository: Abhash-Chakraborty/Find
Length of output: 49
🏁 Script executed:
# Check pyproject.toml to see if sqlite_vec is a dependency
cat -n backend/pyproject.toml | head -80
Repository: Abhash-Chakraborty/Find
Length of output: 2357
🏁 Script executed:
# Verify best practices for sqlite3 connection error handling
web_search
python sqlite3 connection cleanup best practices error handling
Repository: Abhash-Chakraborty/Find
Length of output: 206
🌐 Web query:
sqlite3 python connection cleanup best practices exception handling
💡 Result:
When managing SQLite connections in Python, it is critical to distinguish between transaction management and resource cleanup. The built-in context manager for sqlite3 connections handles only transactions, not the closing of the database connection itself [1][2][3][4].
Best practices for cleanup and exception handling:
1. Use contextlib.closing for automatic cleanup: Since the standard sqlite3 connection context manager does not call .close(), use contextlib.closing to ensure the connection is closed automatically when the block exits, regardless of whether an exception occurs [5][6][7][8].
    import sqlite3
    from contextlib import closing
    with closing(sqlite3.connect("database.db")) as conn:
        with conn:  # Handles transaction commit/rollback
            conn.execute("INSERT INTO table_name VALUES (?)", (data,))
2. Manual cleanup with try/finally: If you do not use contextlib.closing, you must explicitly call .close() in a finally block to guarantee the connection is released, even if an error occurs [9].
    conn = sqlite3.connect("database.db")
    try:
        with conn:
            conn.execute("INSERT INTO table_name VALUES (?)", (data,))
    except sqlite3.Error as e:
        print(f"Database error: {e}")
    finally:
        conn.close()
3. Understanding transaction management: The context manager used directly on a connection object (e.g., with conn:) automatically commits if the block finishes successfully and rolls back if an exception is raised [1][2][4]. It is intended for managing transaction boundaries, not for closing the connection [7].
4. Key considerations:
- Do not rely on garbage collection: Explicitly closing connections prevents resource leaks and file locking issues [9][8].
- Catch specific exceptions: Use sqlite3.Error or its subclasses (e.g., sqlite3.OperationalError, sqlite3.IntegrityError) to handle database-related issues specifically [1][2][9].
- Keep connections scoped: Open the connection as late as possible and close it as soon as you are done to minimize the time the database file is locked [9][10].
Citations:
- 1: https://docs.python.org/release/3.10.19/library/sqlite3.html
- 2: https://docs.python.org/3.9/library/sqlite3.html
- 3: https://bugs.python.org/issue34580
- 4: https://docs.python.org/release/3.11.1/library/sqlite3.html
- 5: https://stackoverflow.com/questions/9561832/what-if-i-dont-close-the-database-connection
- 6: https://discuss.python.org/t/implicitly-close-sqlite3-connections-with-context-managers/33320
- 7: https://stackoverflow.com/questions/8630423/automate-opening-and-closing-the-database-connection
- 8: https://alexwlchan.net/notes/2024/sqlite3-context-manager-doesnt-close-connections/
- 9: https://zetcode.com/python/sqlite3-connection-close/
- 10: https://zetcode.com/python/sqlite3-connect/
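The distinction drawn in the result above (a connection's own `with` block scopes a transaction, while `contextlib.closing` closes the handle) can be checked with the standard library alone. This is an illustrative sketch; `connection_is_open` is a hypothetical probe helper, not something from the PR.

```python
import sqlite3
from contextlib import closing


def connection_is_open(conn: sqlite3.Connection) -> bool:
    """Probe a connection; operating on a closed one raises ProgrammingError."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:
        return False


# `with conn:` only scopes a transaction; the connection stays open afterwards.
plain = sqlite3.connect(":memory:")
with plain:
    plain.execute("CREATE TABLE t (x INTEGER)")
still_open_after_with = connection_is_open(plain)
plain.close()

# closing() guarantees close() on exit; the inner `with conn:` handles commit/rollback.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (?)", (1,))
open_after_closing = connection_is_open(conn)
```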
🏁 Script executed:
# Check if sqlite_vec.load actually needs the extension to remain enabled
# Look for any documentation or pattern in the codebase about extension loading
rg "sqlite_vec" --type py -B 3 -A 3
Repository: Abhash-Chakraborty/Find
Length of output: 4569
🏁 Script executed:
# Check if there are any other POC files or similar patterns in the codebase
fd -t f "poc\|proof" --type f
Repository: Abhash-Chakraborty/Find
Length of output: 49
🏁 Script executed:
# Check if sqlite_vec.load needs the extension to stay enabled after loading
# Look for any documentation or check how sqlite-vec typically works
web_search
sqlite-vec python extension loading best practices enable_load_extension
Repository: Abhash-Chakraborty/Find
Length of output: 167
🏁 Script executed:
# Verify the actual impact of the resource leak by examining test more carefully
# and checking if there are any __del__ or context manager implementations
rg "__enter__|__exit__|__del__" backend/src/find_api/core/sqlite_vec_poc.py -B 2 -A 5
Repository: Abhash-Chakraborty/Find
Length of output: 49
Close connection on missing sqlite_vec import and disable extension loading after load.
On ModuleNotFoundError, the opened DB connection is leaked (line 12). Additionally, extension loading should be disabled immediately after sqlite_vec.load(conn) to minimize attack surface.
Suggested patch
 def create_connection(db_path=":memory:"):
     conn = sqlite3.connect(str(db_path))
     conn.enable_load_extension(True)
     try:
         import sqlite_vec
     except ModuleNotFoundError as exc:
+        conn.close()
         raise RuntimeError(
             "sqlite-vec is required for this desktop-runtime proof of concept. "
             "Install it manually with `pip install sqlite-vec` before running "
             "the sqlite_vec_poc tests."
         ) from exc
     sqlite_vec.load(conn)
+    conn.enable_load_extension(False)
     return conn
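An alternative arrangement of the same idea, offered as a sketch and not as the change that was merged: importing `sqlite_vec` before opening the connection means a missing package cannot leak a handle at all, and a nested `try`/`finally` switches extension loading back off even if `sqlite_vec.load` raises. The shortened error message is an assumption for brevity.

```python
import sqlite3


def create_connection(db_path=":memory:"):
    # Import first: if the optional package is missing, no connection exists yet,
    # so there is nothing to leak.
    try:
        import sqlite_vec
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            "sqlite-vec is required for this desktop-runtime proof of concept."
        ) from exc

    conn = sqlite3.connect(str(db_path))
    try:
        conn.enable_load_extension(True)
        try:
            sqlite_vec.load(conn)
        finally:
            # Turn extension loading back off whether or not load() succeeded.
            conn.enable_load_extension(False)
    except Exception:
        conn.close()  # do not leak the handle on any setup failure
        raise
    return conn
```

Callers remain responsible for closing the returned connection, for example by wrapping it in `contextlib.closing`.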
def insert_vector(
    conn,
    media_id: int,
    embedding: list[float],
):
    blob = struct.pack(f"{len(embedding)}f", *embedding)
🧩 Analysis chain
🏁 Script executed:
wc -l backend/src/find_api/core/sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 115
🏁 Script executed:
cat -n backend/src/find_api/core/sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 5602
Enforce embedding dimension contract at write/search boundaries (lines 57, 97-100, 123-126).
Current vector functions (insert_vector, search_vectors, search_media) accept any embedding length and pack it without validating against EMBEDDING_DIM = 768, allowing silent dimension mismatches that only fail later with cryptic sqlite-vec errors. Add explicit dimension validation before struct.pack() for deterministic errors and contract safety.
Suggested patch
 EMBEDDING_DIM = 768
+def _validate_embedding_dim(embedding: list[float], expected_dim: int = EMBEDDING_DIM) -> None:
+    if len(embedding) != expected_dim:
+        raise ValueError(
+            f"embedding must have exactly {expected_dim} dimensions, got {len(embedding)}"
+        )
+
@@
 def insert_vector(
@@
 ):
+    _validate_embedding_dim(embedding)
     blob = struct.pack(f"{len(embedding)}f", *embedding)
@@
 def search_vectors(
@@
 ):
+    _validate_embedding_dim(query_embedding)
     blob = struct.pack(
@@
 def search_media(
@@
 ):
+    _validate_embedding_dim(query_embedding)
     blob = struct.pack(
This aligns with the guideline: "Keep EMBEDDING_DIM aligned with the configured CLIP/SigLIP model and pgvector columns."
Source: Coding guidelines
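To make the dimension contract concrete, here is a small standard-library sketch that validates the length and then packs the vector as float32 bytes, the same `struct.pack` format the POC uses. The names `pack_embedding` and `unpack_embedding` are hypothetical; folding validation and packing into one helper is a design assumption, not the PR's layout.

```python
import struct

EMBEDDING_DIM = 768  # value quoted in the review comment


def pack_embedding(embedding, expected_dim=EMBEDDING_DIM) -> bytes:
    """Validate the length, then pack as native-order float32 bytes."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding must have exactly {expected_dim} dimensions, got {len(embedding)}"
        )
    return struct.pack(f"{expected_dim}f", *embedding)


def unpack_embedding(blob: bytes) -> list[float]:
    """Inverse of pack_embedding; each float32 occupies 4 bytes."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))
```

A 768-dimensional vector therefore always produces a 3072-byte blob, and a 512-dimensional input fails immediately with a readable `ValueError` before reaching the database.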
        (media_id, blob),
    )

    conn.commit()
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "sqlite_vec_poc.py" | head -5
Repository: Abhash-Chakraborty/Find
Length of output: 113
🏁 Script executed:
wc -l backend/src/find_api/core/sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 115
🏁 Script executed:
cat -n backend/src/find_api/core/sqlite_vec_poc.py
Repository: Abhash-Chakraborty/Find
Length of output: 5602
Make media+vector insertion atomic to prevent partial writes.
SQLiteVecPOC.insert_media() (lines 156-172) calls insert_media() and insert_vector() sequentially. Both helper functions perform independent commits (lines 83 and 67 respectively). If insert_vector() fails after insert_media() commits, the database is left with an orphaned media record without a corresponding vector.
Wrap both operations in a transaction context to ensure atomicity:
Suggested patch
     def insert_media(
         self,
         media_id,
         filename,
         embedding,
     ):
-        insert_media(
-            self.conn,
-            media_id,
-            filename,
-        )
-
-        insert_vector(
-            self.conn,
-            media_id,
-            embedding,
-        )
+        with self.conn:
+            insert_media(
+                self.conn,
+                media_id,
+                filename,
+            )
+            insert_vector(
+                self.conn,
+                media_id,
+                embedding,
+            )
Also remove conn.commit() calls from the helper functions at lines 67 and 83.
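The atomicity argument can be demonstrated without sqlite-vec. In the sketch below a plain table with a `NOT NULL` column stands in for the `vec0` virtual table so that the second insert can be made to fail on demand; `insert_media_with_vector` and the `"indexed"` status value are hypothetical and chosen only for illustration.

```python
import sqlite3


def insert_media_with_vector(conn, media_id, filename, blob):
    """Insert both rows in one transaction; any failure rolls back both."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO media (id, filename, status) VALUES (?, ?, ?)",
            (media_id, filename, "indexed"),  # status value is an assumption
        )
        conn.execute(
            "INSERT INTO media_vectors (media_id, embedding) VALUES (?, ?)",
            (media_id, blob),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, filename TEXT NOT NULL, status TEXT NOT NULL)"
)
# Stand-in for the vec0 virtual table: NOT NULL lets the second insert fail on demand.
conn.execute(
    "CREATE TABLE media_vectors (media_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
)

insert_media_with_vector(conn, 1, "ok.jpg", b"\x00" * 8)
try:
    insert_media_with_vector(conn, 2, "broken.jpg", None)  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

media_ids = [row[0] for row in conn.execute("SELECT id FROM media ORDER BY id")]
```

After the failed second call, `media_ids` contains only the first row: the media insert for id 2 was rolled back together with the failed vector insert, so no orphaned record remains.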
To run the spike manually:

```bash
cd backend
pip install sqlite-vec
uv run pytest tests/test_sqlite_vec_poc.py -q
```
🧩 Analysis chain
🏁 Script executed:
# Check if uv is used in this project
fd -e "uv.lock" -o -e "pyproject.toml" | head -20
Repository: Abhash-Chakraborty/Find
Length of output: 184
🏁 Script executed:
# Check for the file in question
fd "desktop-runtime-adr.md"
Repository: Abhash-Chakraborty/Find
Length of output: 113
🏁 Script executed:
# List backend directory structure
ls -la backend/ 2>/dev/null || echo "backend directory not found"
Repository: Abhash-Chakraborty/Find
Length of output: 880
🏁 Script executed:
# Read the actual file content around lines 88-94
sed -n '80,100p' docs/plans/not-started/desktop-runtime-adr.md
Repository: Abhash-Chakraborty/Find
Length of output: 729
🏁 Script executed:
# Check the backend pyproject.toml to understand the project setup
cat backend/pyproject.toml
Repository: Abhash-Chakraborty/Find
Length of output: 1883
🏁 Script executed:
# Check if there's a root pyproject.toml with uv tool configuration
cat pyproject.toml 2>/dev/null || echo "No root pyproject.toml"
Repository: Abhash-Chakraborty/Find
Length of output: 90
Use uv pip install to match the project's environment.
The repository uses uv for package management (evidenced by backend/uv.lock and [tool.uv.*] sections in backend/pyproject.toml). Using plain pip install sqlite-vec can install the package outside the uv environment, causing readers to hit the skip path even after following these steps.
♻️ Suggested edit
 cd backend
-pip install sqlite-vec
+uv pip install sqlite-vec
 uv run pytest tests/test_sqlite_vec_poc.py -q
Source: Coding guidelines
Thank you for reviewing and approving the PR. I just wanted to check whether there's anything else needed from my side before it can be merged. Thanks for your time!
Summary
Adds a SQLite + sqlite-vec proof of concept to evaluate whether SQLite can support Find's metadata storage and vector-search requirements for desktop mode without changing the current PostgreSQL + pgvector runtime.
This implementation creates a small evaluation layer that demonstrates metadata storage, vector insertion, similarity search, and gallery-style queries using sqlite-vec.
Fixes #256
Type of change
What changed
How to test
Install sqlite-vec
pip install sqlite-vec
Run the proof-of-concept tests
pytest tests/test_sqlite_vec_poc.py -v
Manual validation performed:
Checklist
GSSoC'26 checklist
Summary by CodeRabbit
New Features
Tests
sqlite-vec dependency, including skip behavior when unavailable and a clear error assertion when it’s missing.
Documentation