feat(vector_io): Implement Contextual Retrieval for improved RAG search quality #4750
base: main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @r-bit-rry please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@franciscojavierarceo @leseb please review
```python
    before embedding, improving search quality. See Anthropic's Contextual Retrieval.
    """

    model: QualifiedModel | None = Field(
```
✱ Stainless preview builds

This PR will update the SDKs listed below. Edit this comment to update it; it will appear in the SDKs' changelogs.

- ✅ llama-stack-client-node
- ❗ llama-stack-client-kotlin
- ✅ llama-stack-client-python (conflict)
- ✅ llama-stack-client-go (conflict)
- ✅ llama-stack-client-openapi

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Great implementation! Thanks.

Looks like `VectorStoreChunkingStrategyContextual` and `VectorStoreChunkingStrategyContextualConfig` aren't in the `__all__` list in models.py, so they won't be exported properly. Please add them.
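For illustration, a minimal sketch of the fix, assuming models.py already defines an `__all__` list (the surrounding entries shown here are assumptions):

```python
# models.py -- extend the module's public exports so the new contextual
# chunking types are importable via `from ... import *` and picked up by
# the SDK generators.
__all__ = [
    "VectorStoreChunkingStrategyAuto",    # assumed existing entry
    "VectorStoreChunkingStrategyStatic",  # assumed existing entry
    "VectorStoreChunkingStrategyContextual",
    "VectorStoreChunkingStrategyContextualConfig",
]
```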
```python
        pytest.skip("No text model configured for contextual chunking test")

    compat_client = compat_client_with_empty_stores
    if isinstance(compat_client, OpenAI):
```
The test_openai_vector_store_contextual_chunking test was failing for the openai_client fixture because:
- the OpenAI client has a hardcoded 30s timeout,
- contextual chunking requires LLM calls that can take longer than 30s with Ollama, and
- no recordings existed for the openai_client variant.
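Given the quoted diff above, the likely resolution is a guard that skips the plain OpenAI client; a minimal sketch, with the skip message and test body assumed:

```python
import pytest
from openai import OpenAI

def test_openai_vector_store_contextual_chunking(compat_client_with_empty_stores):
    # Contextual chunking issues LLM calls per chunk, which can exceed the
    # OpenAI client's hardcoded 30s timeout when running against Ollama,
    # so skip the plain OpenAI client variant.
    compat_client = compat_client_with_empty_stores
    if isinstance(compat_client, OpenAI):
        pytest.skip("OpenAI client enforces a 30s timeout that contextual chunking can exceed")
    ...  # remainder of the test elided
```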
Implements Contextual Retrieval as described in Anthropic's engineering blog, enabling LLM-powered chunk contextualization during file ingestion for improved vector search quality.
Closes #4003
Motivation
Traditional RAG systems embed chunks in isolation, losing important document context. For example, a chunk stating "The company's revenue grew by 3% over the previous quarter" lacks context about which company or time period. Contextual Retrieval addresses this by using an LLM to prepend situational context to each chunk before embedding, significantly improving retrieval accuracy.
Changes
New Chunking Strategy: `contextual`

Added a new `VectorStoreChunkingStrategyContextual` type that can be specified when attaching files to vector stores, as sketched below.
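A hypothetical usage sketch, assuming the Python client exposes the OpenAI-compatible vector-store files API; the strategy's nested payload shape and the model identifier are assumptions, with the `model` field taken from the quoted diff:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Attach a file using the new contextual chunking strategy; each chunk is
# contextualized by the given model before being embedded.
client.vector_stores.files.create(
    vector_store_id="vs_123",
    file_id="file_456",
    chunking_strategy={
        "type": "contextual",
        "contextual": {
            "model": "ollama/llama3.2:3b",  # assumed model identifier
        },
    },
)
```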
Server-Level Configuration

Added `ContextualRetrievalParams` to `VectorStoresConfig` for server-level defaults, following the same pattern as `RewriteQueryParams`; a sketch follows.
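A minimal sketch of what the config model might look like, extrapolated from the `model: QualifiedModel | None = Field(` line in the quoted diff; all other fields and defaults are assumptions:

```python
from pydantic import BaseModel, Field

# Stand-in for the stack's QualifiedModel type (real import path not shown here).
QualifiedModel = str

class ContextualRetrievalParams(BaseModel):
    """Server-level defaults for contextual retrieval during file ingestion."""

    # Model used to generate per-chunk context; None means the feature is
    # only active when a request supplies its own configuration.
    model: QualifiedModel | None = Field(
        default=None,
        description="Default model for chunk contextualization",
    )
    # Optional override for the default context prompt (assumed field).
    context_prompt: str | None = Field(default=None)
```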
Implementation Details

- Result tracking uses the `StrEnum` pattern (`_ChunkContextResult`), following the `HealthStatus` pattern in the codebase
- Chunks are contextualized concurrently and results are collected once `asyncio.gather` completes (no shared mutable state or locks)
- Failures raise `RuntimeError` to prevent silent data loss (see the concurrency sketch below)
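A sketch of the concurrency pattern described above; the helper names, enum members, and output format are assumptions for illustration:

```python
import asyncio
from enum import StrEnum  # requires Python 3.11+

class _ChunkContextResult(StrEnum):
    # Result states for per-chunk contextualization (member names assumed).
    OK = "ok"
    ERROR = "error"

async def _contextualize_chunk(document: str, chunk: str) -> str:
    # Placeholder: in the PR this calls the configured LLM with the context
    # prompt; here we return a trivial context for illustration only.
    return f"[context for: {chunk[:30]}...]"

async def _contextualize_all(chunks: list[str], document: str) -> list[str]:
    # Each chunk is processed independently; results are only read after
    # gather() completes, so no shared mutable state or locks are needed.
    results = await asyncio.gather(
        *(_contextualize_chunk(document, c) for c in chunks),
        return_exceptions=True,
    )
    contextualized = []
    for chunk, result in zip(chunks, results):
        if isinstance(result, BaseException):
            # Fail loudly rather than silently dropping chunks.
            raise RuntimeError(f"Contextualization failed for chunk: {result}")
        contextualized.append(f"{result}\n\n{chunk}")
    return contextualized
```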
Design Decisions

- Server-level defaults follow the `RewriteQueryParams` pattern; this provides flexibility while ensuring explicit configuration
- Token counts are estimated with the `len(content) / 4` heuristic
- The default prompt can be overridden via the `context_prompt` parameter
- Per-chunk results are tracked with a `StrEnum` and gathered via `asyncio.gather`
Default Prompt Template

Uses the prompt from Anthropic's research, reproduced below.
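For reference, the prompt from Anthropic's Contextual Retrieval post reads approximately as follows (quoted from the public post; treat as approximate):

```
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall
document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
```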
Testing
Unit Tests: 11 tests
Integration Tests: 2 tests
Future Considerations
BREAKING CHANGE: adds `VectorStoreChunkingStrategyContextual` to the API schema.