Dev #18

Merged
anaslimem merged 6 commits into main from dev
Mar 9, 2026

Conversation

@anaslimem
Owner

Description

Big refactor

Copilot AI review requested due to automatic review settings March 9, 2026 05:17
@vercel

vercel bot commented Mar 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
cortexa-db | Ready | Preview, Comment | Mar 9, 2026 5:18am

@anaslimem anaslimem merged commit 08c94e2 into main Mar 9, 2026
5 checks passed

Copilot AI left a comment

Pull request overview

Refactors CortexaDB’s public API and internal terminology from namespaces to collections, and renames primary operations from remember/ask/delete_memory to add/search/delete across Rust core, Python bindings, tests, examples, and documentation.

Changes:

  • Renamed core APIs (remember → add, ask → search, delete_memory → delete) and updated scoping (namespace → collection) end-to-end.
  • Updated query/index/graph/storage layers to scope by collection and renamed associated types/functions.
  • Migrated docs, examples, benchmarks, and test suites to the new API surface; bumped crate versions in Cargo.lock.
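
For anyone holding replay logs written before this rename, the mapping in the list above could be applied with a small shim. This is only a sketch: the record layout (an "op" key for the operation name and a "namespace" or "collection" key for scoping) and the helper name migrate_op are assumptions made for illustration and are not taken from the CortexaDB source.

```python
# Old-to-new operation names, taken from the rename list in this overview.
OP_RENAMES = {"remember": "add", "ask": "search", "delete_memory": "delete"}


def migrate_op(op: dict) -> dict:
    """Return a copy of a legacy record that uses the new terminology.

    Hypothetical helper: the "op", "namespace" and "collection" keys are
    assumed for illustration only.
    """
    migrated = dict(op)
    # Rename the operation if it uses a legacy name; leave other names alone.
    if "op" in migrated:
        migrated["op"] = OP_RENAMES.get(migrated["op"], migrated["op"])
    # Move the legacy scoping key over, preferring an explicit "collection".
    legacy_scope = migrated.pop("namespace", None)
    if migrated.get("collection") is None and legacy_scope is not None:
        migrated["collection"] = legacy_scope
    return migrated


legacy = {"op": "remember", "namespace": "agent_a", "text": "hello"}
print(migrate_op(legacy))
```

Returning a copy instead of mutating the input keeps the original log record available if the migration has to be re-run or audited.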

Reviewed changes

Copilot reviewed 46 out of 50 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
examples/rust/basic_usage.rs Updates Rust example to add/search and collection-scoped batch records.
docs/resources/examples.md Updates example snippets to add/search and collections terminology.
docs/index.md Updates docs index link from namespaces to collections.
docs/guides/replay.md Updates replay guide terminology and operation names (add/delete).
docs/guides/query-engine.md Updates query engine guide to collection scoping and search.
docs/guides/namespaces.md Removes namespaces guide.
docs/guides/embedders.md Updates embedder guide examples to add/search.
docs/guides/core-concepts.md Updates core concepts terminology to collections + new API names.
docs/guides/collections.md Adds new collections guide replacing namespaces guide.
docs/guides/chunking.md Updates ingestion examples to use collection= parameter.
docs/getting-started/quickstart.md Updates quickstart examples to add/search and collections.
docs/content/docs/resources/examples.mdx Mirrors examples updates for site content.
docs/content/docs/guides/replay.mdx Mirrors replay guide updates for site content.
docs/content/docs/getting-started/quickstart.mdx Mirrors quickstart updates for site content.
docs/content/docs/api/rust.mdx Updates Rust API reference to add/search/delete and collection field.
docs/api/rust.md Updates Rust API reference to add/search/delete and collection field.
docs/api/python.md Updates Python API reference to add/search/delete and collections terminology.
crates/cortexadb-py/test_stress.py Migrates stress tests to db.add.
crates/cortexadb-py/test_smoke.py Migrates smoke tests to collections + add/search/delete and wrapper usage.
crates/cortexadb-py/src/lib.rs Updates PyO3 bindings: rename methods, remove namespace field, map to collection APIs.
crates/cortexadb-py/cortexadb/replay.py Renames replay writer op from remember to add.
crates/cortexadb-py/cortexadb/providers/openai.py Updates provider docstring example to add/search.
crates/cortexadb-py/cortexadb/providers/ollama.py Updates provider docstring example to add/search and cleans imports.
crates/cortexadb-py/cortexadb/providers/gemini.py Updates provider docstring example to add/search.
crates/cortexadb-py/cortexadb/loader.py Cleans unused typing imports.
crates/cortexadb-py/cortexadb/embedder.py Updates embedder docs/examples to add/search.
crates/cortexadb-py/cortexadb/client.py Refactors Python wrapper to collections + add/search/delete, replay op rename, removes legacy aliases.
crates/cortexadb-py/cortexadb/chunker.py Cleans unused typing imports.
crates/cortexadb-py/.gitignore Adds common build artifacts (*.whl, *.egg-info, etc.).
crates/cortexadb-core/tests/integration.rs Migrates integration tests to add/search/delete and collection field.
crates/cortexadb-core/src/store.rs Renames delete op + indexes embedding by collection.
crates/cortexadb-core/src/storage/wal.rs Updates WAL tests for Command::Delete.
crates/cortexadb-core/src/storage/serialization.rs Updates serialization test to expect collection.
crates/cortexadb-core/src/storage/segment.rs Updates segment test to expect collection.
crates/cortexadb-core/src/query/hybrid.rs Renames query option scoping to collection and updates graph traversal helpers.
crates/cortexadb-core/src/query/executor.rs Renames executor scoping to collection and updates graph traversal helpers.
crates/cortexadb-core/src/index/vector.rs Refactors vector partitions/indexing from namespace to collection.
crates/cortexadb-core/src/index/graph.rs Refactors graph BFS helpers from namespace to collection and enforces collection boundaries.
crates/cortexadb-core/src/facade.rs Renames public facade API to add/search/delete and updates collection scoping/filtering.
crates/cortexadb-core/src/engine.rs Updates engine command handling to Command::Delete and collection terminology.
crates/cortexadb-core/src/core/state_machine.rs Refactors state machine terminology/errors to collection scoping and delete rename.
crates/cortexadb-core/src/core/memory_entry.rs Renames memory field namespace → collection.
crates/cortexadb-core/src/core/command.rs Renames command variant to Delete and updates constructors/tests.
crates/cortexadb-core/src/bin/sync_bench.rs Updates bench CLI/config from namespace to collection.
crates/cortexadb-core/src/bin/startup_bench.rs Migrates bench seeding to db.add.
crates/cortexadb-core/src/bin/manual_store.rs Updates manual store example query options to collection scoping.
crates/cortexadb-core/benches/storage_bench.rs Updates benchmark ingestion to db.add.
benchmark/cortexadb_runner.py Updates benchmark runner to add and _inner.search_embedding.
Cargo.lock Bumps cortexadb-core and cortexadb-py versions to 0.1.8.
CHANGELOG.md Removes changelog file.

Comment on lines 253 to 262
  limit = limit or kwargs.get("limit") or kwargs.get("top_k", 5)
  vector = vector or kwargs.get("vector") or kwargs.get("embedding") or kwargs.get("query_vector")
- collections = collections or kwargs.get("collections") or kwargs.get("namespaces")
+ collections = collections or kwargs.get("collections") or kwargs.get("collection")
  vec = self._resolve_embedding(query, vector)

  if collections is None:
-     base_hits = self._inner.ask_embedding(vec, top_k=limit, filter=filter)
+     base_hits = self._inner.search_embedding(vec, top_k=limit, filter=filter)
  elif len(collections) == 1:
-     base_hits = self._inner.ask_in_collection(collections[0], vec, top_k=limit, filter=filter)
+     base_hits = self._inner.search_in_collection(collections[0], vec, top_k=limit, filter=filter)
  else:

Copilot AI Mar 9, 2026

collections = ... or kwargs.get("collection") can set collections to a string when callers pass collection="agent_a". Since strings are iterable, len(collections) and collections[0] then behave incorrectly (length of the name; first character as the collection), breaking scoped search. Normalize a single collection string into a one-element list (and/or keep namespaces alias if you still want backwards compatibility).
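
One way to do the normalization suggested here is a small helper applied before the length check. The name normalize_collections is hypothetical and is not part of the wrapper; this is a sketch of the idea, not the fix that was merged.

```python
from typing import Iterable, List, Optional, Union


def normalize_collections(
    value: Union[str, Iterable[str], None],
) -> Optional[List[str]]:
    """Coerce a collection argument into a list of names, or None if unset."""
    if value is None:
        return None
    if isinstance(value, str):
        # A bare string is one collection name, not a sequence of characters.
        return [value]
    return list(value)


# Simulates a caller passing collection="agent_a" through **kwargs.
kwargs = {"collection": "agent_a"}
collections = normalize_collections(
    kwargs.get("collections") or kwargs.get("collection")
)
print(collections)  # ['agent_a']
```

With this in place, len(collections) == 1 holds for a single name and collections[0] is the whole name rather than its first character.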

Comment on lines 196 to 207
  report["op_counts"][op_type] = report["op_counts"].get(op_type, 0) + 1

  try:
-     if op_type == "remember":
+     if op_type == "add":
          new_id = db.add(
              text=op.get("text"),
              vector=op.get("embedding"),
              metadata=op.get("metadata"),
-             collection=op.get("collection") or op.get("namespace", "default")
+             collection=op.get("collection") or "default"
          )
          id_map[op.get("id")] = new_id
          report["exported"] += 1

Copilot AI Mar 9, 2026

replay() builds report using keys like checked/exported, but the documented last_replay_report schema expects total_ops, applied, skipped, failed, op_counts, and failures (see docs/api/python.md). Consider aligning the report keys/counting (e.g., total_ops == operations read, applied == ops successfully executed) and collecting failure details up to the documented limit. Also, docs describe strict=False as the default, but replay() currently defaults strict to True; aligning defaults would avoid surprising behavior for callers.
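
A report aligned with the keys named in this comment could be accumulated roughly as follows. This is a sketch under assumptions: replay_ops, the handlers mapping, the "op" key, the MAX_FAILURES value, and the choice to count unknown operation types as skipped are all illustrative and are not taken from the CortexaDB code or docs.

```python
from typing import Any, Callable, Dict, Iterable

MAX_FAILURES = 50  # assumed cap; the documented limit is not shown in this thread


def new_report() -> Dict[str, Any]:
    """Empty report using the key names listed in the review comment."""
    return {
        "total_ops": 0,
        "applied": 0,
        "skipped": 0,
        "failed": 0,
        "op_counts": {},
        "failures": [],
    }


def replay_ops(
    ops: Iterable[Dict[str, Any]],
    handlers: Dict[str, Callable[[Dict[str, Any]], None]],
    strict: bool = False,  # non-strict default, as the comment says the docs describe
) -> Dict[str, Any]:
    report = new_report()
    for index, op in enumerate(ops):
        op_type = op.get("op")
        report["total_ops"] += 1  # every operation read is counted
        report["op_counts"][op_type] = report["op_counts"].get(op_type, 0) + 1
        handler = handlers.get(op_type)
        if handler is None:
            report["skipped"] += 1  # assumption: unknown op types are skipped
            continue
        try:
            handler(op)
            report["applied"] += 1  # counted only after the handler succeeds
        except Exception as exc:
            if strict:
                raise  # strict mode stops at the first failing operation
            report["failed"] += 1
            if len(report["failures"]) < MAX_FAILURES:
                report["failures"].append(
                    {"index": index, "op": op_type, "error": str(exc)}
                )
    return report
```

Under these assumptions total_ops always equals applied + skipped + failed in non-strict mode, which gives callers a simple consistency check on the report.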

  match cmd {
-     Command::DeleteMemory(id) => assert_eq!(id, MemoryId(1)),
+     Command::Delete(id) => assert_eq!(id, MemoryId(1)),
      _ => panic!("Expected DeleteMemory"),

Copilot AI Mar 9, 2026

The panic message in this test still says "Expected DeleteMemory" even though the variant is now Command::Delete. Updating the message will make failures less confusing.

Suggested change
- _ => panic!("Expected DeleteMemory"),
+ _ => panic!("Expected Delete"),

Comment on lines 150 to +152
  // Bulk insert with 100x speedup
- let last_id = db.remember_batch(records)?;
- println!(" Batch finished. Last inserted ID: {}", last_id);
+ let last_id = db.add_batch(records)?;
+ println!(" Batch finished. Last inserted ID: {}", last_id.last().unwrap());

Copilot AI Mar 9, 2026

db.add_batch(records) now returns a Vec<u64> of inserted IDs, but the variable is named last_id and the example prints last_id.last().unwrap(). Renaming the variable (e.g., ids) and avoiding the unwrap() (or explicitly asserting the vector is non-empty in the example) would make the sample clearer and less panic-prone if someone adapts it to an empty batch.

Comment on lines 20 to 22
for hit in hits:
    mem = db.get_memory(hit.id)
    print(f"[{hit.score:.3f}] {mem.content.decode()}")

Copilot AI Mar 9, 2026

These Python examples still use db.get_memory(hit.id), but the current Python wrapper API exposes db.get(mid) (and tests use db.get(...)). Update the docs to match the public Python API so the snippets run as-is.
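
The corrected loop would look like the one in format_hits below. To keep the sketch runnable without the library, it uses an in-memory stand-in: StubDB, Hit, Memory, and the toy scoring are all hypothetical, and only the call shapes (db.add, db.search, db.get, hit.id, hit.score, mem.content) follow what this thread shows.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    id: int
    score: float


@dataclass
class Memory:
    content: bytes


class StubDB:
    """In-memory stand-in for illustration; not the real CortexaDB client."""

    def __init__(self) -> None:
        self._memories: Dict[int, Memory] = {}

    def add(self, text: str) -> int:
        mid = len(self._memories) + 1
        self._memories[mid] = Memory(content=text.encode())
        return mid

    def search(self, query: str) -> List[Hit]:
        # Toy ranking: 1.0 if the query occurs in the text, otherwise 0.0.
        hits = [
            Hit(id=mid, score=1.0 if query.lower() in mem.content.decode().lower() else 0.0)
            for mid, mem in self._memories.items()
        ]
        return sorted(hits, key=lambda h: h.score, reverse=True)

    def get(self, mid: int) -> Memory:
        return self._memories[mid]


def format_hits(db: StubDB, query: str) -> List[str]:
    lines = []
    for hit in db.search(query):
        mem = db.get(hit.id)  # db.get(...) replaces the old db.get_memory(...)
        lines.append(f"[{hit.score:.3f}] {mem.content.decode()}")
    return lines


db = StubDB()
db.add("Tasks: review PR")
for line in format_hits(db, "tasks"):
    print(line)
```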

writer.add("Draft: AI agents are transforming...")

# Each agent queries only its own memories
planner_context = planner.search("What tsearchs are pending?")

Copilot AI Mar 9, 2026

Typo in example query string: "What tsearchs are pending?" looks unintended and should read "What tasks are pending?" (or similar).

@@ -68,9 +68,9 @@ db.ingest("Long article text here...", strategy="markdown")
### 7. Use Namespaces

Copilot AI Mar 9, 2026

Section title still says "Use Namespaces" but the example has been migrated to db.collection(...). Update the heading to "Use Collections" to match the new API terminology.

2 participants