Pluggable chunking strategy trait for tree-sitter/AST-based code chunking #3

For code-heavy knowledge graphs, AST-based chunking (via tree-sitter) outperforms token-based splitting because it preserves syntactic boundaries and keeps semantic units such as functions and classes whole, which matters for accurate code retrieval and generation. As far as I can tell, the chunking logic is currently built in, with no trait abstraction for plugging in an alternative.

Would you be open to extracting chunking into a trait-based strategy pattern, e.g. a `ChunkingStrategy` trait with a `chunk(&self, text: &str) -> Vec<Chunk>` method? If a modular chunking interface already exists, or if you would accept such a refactor, I would implement a tree-sitter-based strategy for my use case and could contribute it back if useful.
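To make the proposal concrete, here is a minimal sketch of what such an interface could look like. Only the trait name and the `chunk` signature come from the proposal above; the fields of `Chunk`, the `LineWindowChunker` baseline, and the `chunk_document` helper are hypothetical names chosen for illustration and do not reflect the project's existing code. A tree-sitter-backed strategy would implement the same trait; it is omitted here because it depends on an external crate.

```rust
/// A contiguous slice of the source text plus its byte offsets.
/// (Hypothetical shape; the real `Chunk` type may carry other fields.)
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub text: String,
    pub start_byte: usize,
    pub end_byte: usize,
}

/// The trait proposed in this issue: one method that turns text into chunks.
pub trait ChunkingStrategy {
    fn chunk(&self, text: &str) -> Vec<Chunk>;
}

/// Illustrative baseline that groups a fixed number of lines per chunk,
/// ignoring syntax entirely. An AST-based strategy would instead cut at
/// node boundaries such as functions or classes.
pub struct LineWindowChunker {
    pub lines_per_chunk: usize,
}

impl ChunkingStrategy for LineWindowChunker {
    fn chunk(&self, text: &str) -> Vec<Chunk> {
        let window = self.lines_per_chunk.max(1); // guard against a zero window
        let mut chunks = Vec::new();
        let mut start = 0; // byte offset where the current chunk begins
        let mut pos = 0; // byte offset just past the last consumed line
        let mut lines_in_chunk = 0;

        // `split_inclusive` keeps the trailing '\n', so offsets stay exact.
        for line in text.split_inclusive('\n') {
            pos += line.len();
            lines_in_chunk += 1;
            if lines_in_chunk == window {
                chunks.push(Chunk {
                    text: text[start..pos].to_string(),
                    start_byte: start,
                    end_byte: pos,
                });
                start = pos;
                lines_in_chunk = 0;
            }
        }

        // Flush any remaining lines that did not fill a whole window.
        if start < pos {
            chunks.push(Chunk {
                text: text[start..pos].to_string(),
                start_byte: start,
                end_byte: pos,
            });
        }
        chunks
    }
}

/// Callers depend only on the trait, so strategies can be swapped freely.
pub fn chunk_document(strategy: &dyn ChunkingStrategy, text: &str) -> Vec<Chunk> {
    strategy.chunk(text)
}
```

With this shape, the indexing pipeline could hold a `Box<dyn ChunkingStrategy>` chosen at configuration time, and the existing chunker would become the default implementation, so current behavior would stay unchanged for users who do not opt in.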

Context: the [CMU cAST paper](https://arxiv.org/html/2506.15655v1) reports that tree-sitter/AST-based chunking improves code RAG accuracy by several points (roughly 4-5 on some retrieval and generation benchmarks) compared with semantic/token chunking.
