Problem
colgrep currently ties one index to one model per directory. If you switch models via --model, the existing index becomes incompatible and must be rebuilt. This makes it impractical to use multiple models on the same codebase — e.g., LateOn-Code-edge for code search and Reason-ModernColBERT for prose/docs search.
Use case
With the release of Reason-ModernColBERT (congrats on BrowseComp-Plus!), there's a compelling reason to run two models side-by-side:
- Code model for searching code (tree-sitter structured representations)
- Reason model for searching docs, journals, markdown, meeting notes
Today this requires choosing one model or constantly re-indexing when switching.
Proposed solution
Include the model identity in the index directory hash, so different models get separate index directories for the same project.
Current: {project_name}-{hash(path)}
Proposed: {project_name}-{hash(path:model)} (when a non-default model is used)
This is backwards compatible — the default model continues to use the legacy path-only hash, so existing indexes work without migration.
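A minimal sketch of the naming rule described above, to make the backwards-compatibility argument concrete. It is not the code from the fork. The signature, the `DEFAULT_MODEL` placeholder, the hex formatting, and the use of FNV-1a as a stand-in for whatever hash colgrep actually uses are all assumptions made for illustration.

```rust
use std::path::Path;

/// Placeholder only: the real default model name is whatever colgrep ships with.
const DEFAULT_MODEL: &str = "default-model";

/// 64-bit FNV-1a. A stand-in for colgrep's real hash, chosen because it is
/// deterministic and needs no external crate.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Builds `{project_name}-{hash}`.
/// The model is folded into the hash input only when it is non-default,
/// so the default model keeps the legacy path-only hash.
fn compute_index_dir_name(project_path: &Path, model: Option<&str>) -> String {
    let name = project_path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| "project".to_string());
    let path_str = project_path.to_string_lossy();
    let key = match model {
        // Non-default model: hash "path:model".
        Some(m) if m != DEFAULT_MODEL => format!("{}:{}", path_str, m),
        // No model given, or the default one: hash the path alone.
        _ => path_str.into_owned(),
    };
    format!("{}-{:016x}", name, fnv1a(key.as_bytes()))
}
```

Under these assumptions, passing `None` or `Some(DEFAULT_MODEL)` produces the same directory name as today, while `Some("Reason-ModernColBERT")` hashes a different input and so lands in its own directory.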
Implementation
I've prototyped this on my fork: rawwerks/next-plaid@feat/multi-model-index
Changes (243 insertions, 59 deletions across 6 files):
- compute_index_dir_name() accepts optional model parameter
- New get_index_dir_for_project_and_model() function
- New IndexBuilder::with_model_identity() constructor
- Search path uses model-aware index resolution
- IndexState gains model_id field
Known gaps in the prototype (would fix before any PR):
- find_parent_index is not model-aware, so it could return the wrong model's parent index for subdirectory searches
- model_id is added to IndexState but not yet populated on save
- status and clear commands don't accept --model flag yet
- Builder construction pattern is repeated (DRY violation)
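For the first gap, one possible shape of a model-aware parent lookup is sketched below. This is hypothetical and not how find_parent_index works in the fork today: the function name, its parameters, and the idea of passing the naming rule in as a closure are assumptions made to keep the example self-contained.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical model-aware variant of a parent-index lookup.
/// Walks up from `start` and returns the first ancestor whose index
/// directory for the given model exists under `index_root`.
/// `dir_name` stands in for the naming rule (path + model -> directory name).
fn find_parent_index_for_model(
    start: &Path,
    index_root: &Path,
    model: Option<&str>,
    dir_name: impl Fn(&Path, Option<&str>) -> String,
) -> Option<PathBuf> {
    start
        .ancestors()
        .skip(1) // skip `start` itself; only parent directories are candidates
        .map(|ancestor| index_root.join(dir_name(ancestor, model)))
        .find(|candidate| candidate.is_dir())
}
```

Because the model takes part in every candidate name, an index built with one model is never returned for a search that asked for another.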
Questions for maintainers
Before investing in a full PR, I'd like to know:
- Is this a direction you'd want to go? Or do you have a different approach in mind for multi-model support?
- Would you prefer the model identity to live in ProjectMetadata (project.json) rather than IndexState (state.json)?
- Any concerns about the hash-based separation approach vs. e.g., named profiles?
Happy to clean up and submit a proper PR if there's interest.