[Feature] Multi-model index support — separate indexes per model #52

@rawwerks

Problem

colgrep currently ties one index to one model per directory. If you switch models via --model, the existing index becomes incompatible and must be rebuilt. This makes it impractical to use multiple models on the same codebase — e.g., LateOn-Code-edge for code search and Reason-ModernColBERT for prose/docs search.

Use case

With the release of Reason-ModernColBERT (congrats on BrowseComp-Plus!), there's a compelling reason to run two models side-by-side:

  • Code model for searching code (tree-sitter structured representations)
  • Reason model for searching docs, journals, markdown, meeting notes

Today this requires choosing one model or constantly re-indexing when switching.

Proposed solution

Include the model identity in the index directory hash, so different models get separate index directories for the same project.

Current: {project_name}-{hash(path)}
Proposed: {project_name}-{hash(path:model)} (when a non-default model is used)

This is backwards compatible — the default model continues to use the legacy path-only hash, so existing indexes work without migration.
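
To make the naming scheme concrete, here is a minimal sketch of how the directory name could be derived. It is not the code from the fork: the FNV-1a hash is a stand-in for whatever hash colgrep actually uses, `DEFAULT_MODEL` is a hypothetical placeholder for the real default model id, and the signature of `compute_index_dir_name` is assumed.

```rust
use std::path::Path;

/// Hypothetical placeholder; the real default model id is whatever colgrep ships with.
const DEFAULT_MODEL: &str = "default-model";

/// 64-bit FNV-1a, used here only as a deterministic stand-in hash (assumption).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Builds `{project_name}-{hash(path)}` for the default model and
/// `{project_name}-{hash(path:model)}` for any other model.
fn compute_index_dir_name(project_path: &Path, model: Option<&str>) -> String {
    let project_name = project_path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| "root".to_string());
    let path_str = project_path.to_string_lossy();
    let key = match model {
        // Non-default model: mix the model identity into the hash input.
        Some(m) if m != DEFAULT_MODEL => format!("{}:{}", path_str, m),
        // No model or the default model: legacy path-only key, so old indexes still resolve.
        _ => path_str.into_owned(),
    };
    format!("{}-{:016x}", project_name, fnv1a64(key.as_bytes()))
}
```

With this shape, `compute_index_dir_name(path, None)` and `compute_index_dir_name(path, Some(DEFAULT_MODEL))` resolve to the same legacy directory, while any other model gets its own directory.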

Implementation

I've prototyped this on my fork: rawwerks/next-plaid@feat/multi-model-index

Changes (243 insertions, 59 deletions across 6 files):

  • compute_index_dir_name() accepts optional model parameter
  • New get_index_dir_for_project_and_model() function
  • New IndexBuilder::with_model_identity() constructor
  • Search path uses model-aware index resolution
  • IndexState gains model_id field
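
As a rough illustration of the `model_id` field, the sketch below shows how an optional field could stay compatible with state files written before the change. The struct is reduced to the one field under discussion, and `DEFAULT_MODEL`, `effective_model`, and `matches_model` are hypothetical names, not part of the prototype.

```rust
/// Hypothetical placeholder; the real default model id is whatever colgrep ships with.
const DEFAULT_MODEL: &str = "default-model";

/// Reduced stand-in for the real IndexState; only the new field is shown.
#[derive(Debug, Clone, PartialEq)]
struct IndexState {
    /// `None` for indexes written before the field existed.
    model_id: Option<String>,
}

impl IndexState {
    /// Legacy indexes without a recorded model are treated as the default model.
    fn effective_model(&self) -> &str {
        self.model_id.as_deref().unwrap_or(DEFAULT_MODEL)
    }

    /// True when this index was built with the model the caller is asking for.
    fn matches_model(&self, requested: Option<&str>) -> bool {
        self.effective_model() == requested.unwrap_or(DEFAULT_MODEL)
    }
}
```

Treating a missing `model_id` as the default model is one possible design; it keeps existing state files valid without migration, in line with the path-only hash for the default model.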

Known gaps in the prototype (would fix before any PR):

  1. find_parent_index is not model-aware, so a search from a subdirectory could pick up a parent index built with a different model
  2. model_id is added to IndexState but is not yet populated on save
  3. The status and clear commands don't accept a --model flag yet
  4. The builder construction pattern is repeated in several places (DRY violation)
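
For gap 1, one possible shape of a model-aware parent lookup is sketched below. The signature is an assumption and does not match the existing `find_parent_index`: the directory-naming function and the existence check are passed in as closures so the sketch stays self-contained, and the starting directory itself is skipped on the assumption that only ancestors count as parents.

```rust
use std::path::{Path, PathBuf};

/// Walks up from `start` and returns the first ancestor whose index directory
/// exists for the requested model. Indexes built with other models are never
/// matched, because the model is part of the directory name being probed.
fn find_parent_index<N, E>(
    start: &Path,
    model: Option<&str>,
    index_root: &Path,
    dir_name: N, // e.g. a model-aware compute_index_dir_name (assumed)
    exists: E,   // e.g. |p| p.is_dir() in real use
) -> Option<PathBuf>
where
    N: Fn(&Path, Option<&str>) -> String,
    E: Fn(&Path) -> bool,
{
    start.ancestors().skip(1).find_map(|ancestor| {
        let candidate = index_root.join(dir_name(ancestor, model));
        exists(&candidate).then_some(candidate)
    })
}
```

In real use the `exists` argument would likely be a filesystem check such as `|p| p.is_dir()`; injecting it keeps the lookup easy to test without touching disk.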

Questions for maintainers

Before investing in a full PR, I'd like to know:

  1. Is this a direction you'd want to go? Or do you have a different approach in mind for multi-model support?
  2. Would you prefer the model identity to live in ProjectMetadata (project.json) rather than IndexState (state.json)?
  3. Any concerns about the hash-based separation approach vs. e.g., named profiles?

Happy to clean up and submit a proper PR if there's interest.

Labels: enhancement (New feature or request)