Conversation
… of Python strategy layer BREAKING CHANGE: The ask() method and QueryContext have been removed from the Rust engine as retrieval functionality has been migrated to the Python strategy layer. Users should now use Engine.ask() from the Python SDK for retrieval operations. This removes the retrieval-related code from the Rust core including the ask method implementation, QueryContext struct, and RetrieverClient stub.
…iguration - Remove vectorless-agent crate from workspace members in Cargo.toml - Move vectorless-rerank from active member to commented out section - Remove entire vectorless-agent module including command parsing, configuration, and context modules - Update member list to reflect removal of agent-related components
BREAKING CHANGE: Removed the entire validator module from vectorless-config which was not being used. This affects the ConfigValidator implementation and related traits that were previously available.

- Remove validator module from lib.rs
- Delete entire validator.rs file with all validation logic
- Remove unused re-exports in vectorless-llm/src/lib.rs
- Remove unused imports and types throughout the codebase

perf: optimize memo store by removing unused key builder

- Remove MemoKeyBuilder struct and all related methods
- Clean up unused Fingerprint import in store.rs
- Remove age() method from MemoEntry as it was unused
- Simplify test imports and remove related test

refactor: update Python bindings with skip_from_py_object attribute

- Add skip_from_py_object to all PyO3 class definitions
- This optimizes Python object creation and prevents circular references
- Affects Answer, Evidence, ReasoningTrace, Config, DocumentInfo, Concept
- Affects Document, NodeInfo, MatchResult, FindResult, WordCount
- Affects CollectedEvidence, TopicEntry, SectionSummary, TocEntry
- Affects NodeStats, SimilarResult, SectionCard, DocCard, ConceptInfo
- Affects Engine, VectorlessError, WeightedKeyword, EdgeEvidence
- Affects GraphEdge, DocumentGraphNode, DocumentGraph, LlmMetricsReport
- Affects RetrievalMetricsReport, and MetricsReport classes

feat: introduce shared blackboard system for worker collaboration

- Add Discovery and SharedBlackboard classes for inter-worker communication
- Enable workers to share findings, leads, and cross-references
- Provide formatted context views for individual workers
- Implement discovery extraction from worker outputs

feat: implement query reasoning pipeline replacing understanding

- Replace QueryPlan with QueryAnalysis in dispatcher
- Introduce QueryAnalyzer for multi-stage query analysis
- Add reasoning types: Ambiguity, EntityRef, TemporalConstraint
- Include RetrievalStrategy and QueryAnalysis components
- Update dispatch function to use reasoning instead of understanding

docs: update ask module documentation terminology

- Change "query understanding" to "query reasoning" in docstrings
- Reflect the shift from understanding to reasoning in comments
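For reviewers unfamiliar with the blackboard pattern named in the commit above, here is a minimal sketch of the idea. Only the class names `Discovery` and `SharedBlackboard` come from the commit message; every field and method name below (`worker_id`, `kind`, `post`, `context_for`) is a hypothetical stand-in, not the actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Discovery:
    """One item a worker shares. All field names here are hypothetical."""

    worker_id: str
    kind: str  # e.g. "finding", "lead", or "cross_ref"
    content: str


@dataclass
class SharedBlackboard:
    """Append-only store that workers post to and read from."""

    discoveries: list[Discovery] = field(default_factory=list)

    def post(self, discovery: Discovery) -> None:
        self.discoveries.append(discovery)

    def context_for(self, worker_id: str) -> str:
        """Format what the *other* workers have shared, for one worker."""
        lines = [
            f"[{d.kind}] {d.worker_id}: {d.content}"
            for d in self.discoveries
            if d.worker_id != worker_id
        ]
        return "\n".join(lines)
```

In this sketch each worker sees only its peers' discoveries, which is one plausible reading of "formatted context views for individual workers".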
…val config

- Move indexer, llm_pool, metrics, and storage modules from types/ directory to src/ directory root
- Remove retrieval module as it's no longer needed in core configuration
- Update lib.rs to export modules directly instead of through types mod
- Add comprehensive Config struct with validation capabilities
- Include ConfigValidationError and ValidationError types for proper configuration validation
- Add tests for configuration defaults and validation

feat(blackboard): enhance discovery extraction with cross-document refs

- Extract cross-references from evidence content using regex patterns
- Add "cross_ref" and "lead" discovery types for document references
- Track evidence-referenced documents and generate lead discoveries

refactor(analyzer): consolidate JSON parsing utilities

- Move _parse_json_response function to shared utils module
- Import parse_json_response from vectorless.ask.utils
- Update QueryAnalyzer to use consolidated JSON parsing utility

feat(analyzer): add analysis completion tracking

- Add analysis_complete flag to QueryAnalysis to track whether deep analysis stages completed successfully
- Set analysis_complete=False when deep analysis stages fail
- Propagate completion status through re_analyze method

refactor(python): remove deprecated retrieval configuration

- Remove set_top_k and set_max_iterations methods from PyConfig
- Update Config documentation to reflect removal of retrieval params
- Remove RetrievalConfig from python exports

refactor(utils): centralize JSON response parsing logic

- Create parse_json_response utility function in utils module
- Consolidate JSON parsing logic from analyzer and verifier modules
- Handle markdown-wrapped JSON and extract JSON blocks properly

feat(verify): improve evidence reference formatting and scoring

- Update verify prompt to use "doc_name/node_title" format for evidence references
- Modify DimensionScore to accept both "doc_name/node_title" and "node_title" formats
- Calculate overall confidence from dimension scores instead of relying on LLM self-assessment
…e module - Remove SufficiencyConfig struct and related default functions - Remove CacheConfig struct and related default functions - Remove StrategyConfig and all related strategy configuration structs - Update module documentation to reflect removal of sufficiency types - Clean up tests by removing sufficiency and strategy config tests - Keep only storage-related configuration types in the module
- Add explicit ValueError documentation in docstring when JSON parsing fails
- Implement proper exception handling for JSONDecodeError
- Wrap JSON parsing in a try/except block and raise a descriptive ValueError
- Include original exception in the raised error using 'from e' syntax
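The error-handling shape described above can be sketched as follows. The function name `parse_json_response` appears in the commit messages, but this body is an assumption: the fence-unwrapping regex and the error text are illustrative and not taken from the actual `vectorless.ask.utils` code.

```python
import json
import re

# Build the Markdown fence marker programmatically (three backticks).
_FENCE = "`" * 3
# Hypothetical pattern: an optional "json" tag, then the fenced payload.
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def parse_json_response(text: str):
    """Parse JSON from an LLM reply, unwrapping a Markdown fence if present.

    Raises:
        ValueError: if the payload is not valid JSON.
    """
    match = _FENCE_RE.search(text)
    payload = match.group(1) if match else text
    try:
        return json.loads(payload.strip())
    except json.JSONDecodeError as e:
        # "from e" keeps the original decoder error as __cause__.
        raise ValueError(f"Could not parse JSON response: {e}") from e
```

Callers then only need to handle `ValueError`, and the original `JSONDecodeError` (with its line and column) stays reachable through `__cause__`.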
Remove the ReferenceResolver struct that was caching resolved references for batch resolution. The implementation is no longer needed and has been removed from the codebase.
…ine module Remove the `ask` method documentation reference from the Engine module's main documentation block, along with unused imports for Answer, Evidence, and ReasoningTrace types that were only used by the removed ask functionality. BREAKING CHANGE: The ask method has been removed from the Engine API.
- Remove unused FieldKey and Field structs from bm25 module - Remove entire memory backend implementation including tests - Remove memory backend from storage backend module exports - Remove example documentation from storage lib.rs - Remove unused file system imports from persistence module - Remove PersistenceOptions struct and related configuration methods - Remove file-based save/load functions with atomic write logic - Remove index save/load functions that used file operations - Update tests to use bytes-based serialization instead of file-based - Simplify checksum verification tests to work with byte arrays
Removed the unused test module from workspace.rs that contained a helper function for creating test documents which was no longer being used in the codebase.
BREAKING CHANGE: drop support for Python 3.10 and require Python >= 3.11

- Update pyproject.toml to require Python >= 3.11
- Remove Python 3.10 classifier from package metadata
- Remove conditional tomli dependency since tomllib is built-in from 3.11+
- Update mypy and ruff configurations to target Python 3.11
- Simplify TOML loading code by removing Python version checks

refactor: improve type safety with protocol-based typing

- Replace loose `Any` and `Callable` types with structured protocols
- Add DocLoader and EventCallback protocols for better type checking
- Update dispatcher and orchestrator to use typed parameters
- Remove dynamic imports for tomllib in favor of built-in module

refactor: modernize optional type annotations

- Convert Optional[T] to T | None union syntax throughout codebase
- Update engine class constructors and methods with modern typing
- Standardize nullability patterns across all modules

perf: replace asyncio.gather with TaskGroup for better error handling

- Use TaskGroup instead of gather(return_exceptions=True) for worker tasks
- Maintain same fault-tolerance while improving async execution
- Update batch compilation to use TaskGroup for better resource management
- Change project description from "Document Understanding Engine for AI" to "Knowing by reasoning, not vectors." in both Cargo.toml and pyproject.toml - Update Python version requirement from 3.9+ to 3.11+ in installation docs
BREAKING CHANGE: Removed QueryResult, QueryResultItem, QueryMetrics, EvidenceItem, and Confidence types from vectorless-engine as query logic has been moved to Python layer. Also removed QueryEvent enum and related functionality from events module since query handling is now managed externally.
…nderstanding module BREAKING CHANGE: Removed Answer, Evidence, ReasoningTrace, and TraceStep types from the understanding module as they were no longer used. Also removed SufficiencyLevel from format exports. - Remove Answer, Evidence, ReasoningTrace, TraceStep from understanding exports - Remove SufficiencyLevel from format exports - Clean up related documentation comments
…functions BREAKING CHANGE: Remove SufficiencyLevel enum from vectorless-document crate and consolidate keyword extraction and evidence formatting utilities into shared module. - Remove SufficiencyLevel enum from format.rs as it's no longer used - Move extract_keywords function to vectorless/ask/utils.py as single source of truth - Move format_evidence function to vectorless/ask/utils.py as single source of truth - Replace in-memory response cache with bounded LRU cache in LLMClient - Add structured error types for ask pipeline operations - Remove Answer-related Python bindings that were unused
- rename "vectorless-index" crate to "vectorless-compiler" - update IndexMode enum to SourceFormat - rename IndexInput to CompilerInput and PipelineResult to CompileResult - update IndexContext to CompileContext and related stage names - rename IndexStage trait to CompileStage across all modules - update documentation to reflect document compilation instead of indexing
- Introduce extract_llm_insights function in blackboard module to identify findings relevant to other documents using LLM analysis - Add new import and export for extract_llm_insights in ask module - Integrate LLM insight extraction in orchestrator for multi-document scenarios with additional error handling - Include comprehensive docstring explaining the functionality and cost implications
- Rename `stages` module to `passes` across the compiler
- Update all stage-related structs to use pass terminology:
  - `EnhanceStage` → `EnhancePass`
  - `ValidateStage` → `ValidatePass`
  - `ConceptExtractionStage` → `ConceptPass`
  - `NavigationCompileStage` → `NavigationPass`
  - `OptimizeStage` → `OptimizePass`
  - `ReasoningCompileStage` → `ReasoningPass`
  - `VerifyStage` → `VerifyPass`
  - `BuildStage` → `BuildPass`
  - `ParseStage` → `ParsePass`
- Update trait implementations from `CompileStage` to `CompilePass`
- Change result types from `StageResult` to `PassResult`
- Restructure module organization with frontend, analysis, and backend passes
- Update import paths to use new `crate::passes` module structure
- Fix all test references to use new pass naming convention
…ategories - Replace "Priority" labels with semantic phase categories (Frontend, Analysis, Transform, Backend) to better reflect the compilation pipeline stages - Update descriptions for clarity: change "Tree integrity checks (optional)" to "Tree integrity checks", remove "(optional)" from various stages as they are conditionally executed based on pipeline configuration rather than being truly optional - Add new "Concept" stage at priority 47 between "Reasoning Idx" and "Navigation Idx" phases - Rename "Symbol table (keyword→path mapping)" for "Reasoning Idx" stage and "Debug info for runtime navigation" for "Navigation Idx" stage - Add "Output validation" stage at priority 55 between "Reasoning Idx" and "Optimize" phases - Update checkpointing description from "stage group" to "pass group" for more accurate terminology
- Add overview documentation explaining the compiler architecture and phase breakdown
- Document pipeline infrastructure including CompilePass trait, PipelineExecutor, and PipelineOrchestrator components
- Detail all compilation passes with their priorities, dependencies, and functionality
- Provide configuration guide for PipelineOptions and related types
- Explain incremental compilation mechanism with change detection
- Document checkpoint and resume functionality for pipeline recovery

feat(compiler): implement RoutePass for query routing table generation

- Build intent routes from nodes with question hints for Agent acceleration
- Create concept routes from topic tags to enable semantic navigation
- Calculate relevance scores based on content richness and hint count
- Limit route targets to improve performance and reduce memory usage

refactor(compiler): extend CompileContext with agent acceleration data

- Add query_routes field for pre-computed routing table storage
- Include chain_index for reasoning chain navigation
- Add content_overlap map to prevent duplicate content visits
- Introduce evidence_scores for per-node quality assessment
- Update context cloning and result extraction methods accordingly
…ndalone usage

- Add documentation page for writing custom passes with implementation examples
- Document the parsers module and RawNode structure for document parsing
- Create standalone usage guide for vectorless-compiler crate
- Update sidebar configuration to include new documentation pages

feat(compiler): implement backend analysis passes for chain, overlap, and scoring

- Add ChainPass to build reasoning chain index from document references
- Implement OverlapPass to detect content overlap between leaf nodes using Jaccard similarity
- Create ScorePass to compute evidence quality scores based on density, data richness, and specificity
- Register new passes in the pipeline executor with appropriate priorities
- Update module exports to make new passes available
…hains, overlap detection and scoring - Add RoutePass for building query routing tables with intent and concept routes - Add ChainPass for creating reasoning chain indexes from document references - Add OverlapPass for detecting content overlap with Jaccard similarity algorithm - Add ScorePass for evidence quality scoring using density, data richness and specificity - Update pipeline executor to include new backend stages at priorities 52-58 - Add comprehensive unit tests for each new pass covering edge cases and end-to-end scenarios - Update documentation diagrams to show new backend components (Route, Chain, Overlap, Score) - Add metrics recording for each new pass including timing and count statistics - Update validation pass to track new output flags - Export NodeReference type for external usage
- Add RoutePass for pre-computed query routing table to accelerate agent-based queries
- Add ChainPass for building reasoning chain index from in-document cross-references
- Add OverlapPass for detecting content overlap between leaf nodes using Jaccard similarity
- Add ScorePass for computing per-node evidence quality scores based on density, richness, and specificity metrics

Update documentation to reflect 15 passes instead of 1 in the pipeline, including detailed descriptions of new passes, their dependencies, and data flow diagrams.

Modify ChainPass implementation to use proper RefType enum instead of string matching for reference classification.
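For readers who want the overlap metric spelled out: Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below illustrates it in Python (the actual OverlapPass lives in the Rust compiler and is not shown in this PR text). The whitespace tokenizer, the 0.5 threshold, the empty-set convention, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokenize(text: str) -> set[str]:
    # Assumed tokenizer: lowercase, split on whitespace.
    return set(text.lower().split())


def find_overlaps(
    leaves: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (node_a, node_b, score) for leaf pairs at or above the threshold."""
    tokens = {node_id: tokenize(text) for node_id, text in leaves.items()}
    overlaps = []
    for left, right in combinations(sorted(tokens), 2):
        score = jaccard(tokens[left], tokens[right])
        if score >= threshold:
            overlaps.append((left, right, score))
    return overlaps
```

Comparing every pair is quadratic in the number of leaf nodes, which is acceptable for a sketch; a real pass over large documents would likely need some form of candidate pruning.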
BREAKING CHANGE: Renamed IndexMetrics to CompileMetrics and IndexedDocument to CompiledDocument throughout the codebase. - Updated documentation to reflect compile pipeline terminology - Changed metric type from IndexMetrics to CompileMetrics - Renamed internal document type from IndexedDocument to CompiledDocument - Added new agent acceleration data fields to compiled document - Updated schema version from 1 to 2 due to structural changes - Modified persistence layer to include new index types
…stomStageBuilder BREAKING CHANGE: Removed deprecated StageResult type alias that was marked for removal since version 0.2.0. Also removed CustomStageBuilder struct which was unused in the codebase. These changes clean up the API surface and remove dead code.
Add pre-computed agent acceleration data structures to the Document type including query routing tables, reasoning chain indices, content overlap maps, and evidence quality scores. Update documentation to reflect compilation terminology instead of ingestion terminology.

BREAKING CHANGE: Document understanding terminology changed from ingestion to compilation process.

feat(navigator): implement agent acceleration query methods

Add new methods to DocumentNavigator for querying agent acceleration data including intent routes, concept routes, reasoning chains, content overlaps, and evidence scores. Include helper method for node ID conversion.

refactor(python): expose agent acceleration APIs to Python bindings

Expose new agent acceleration data structures and query methods through Python bindings. Add corresponding Python wrapper classes and async methods for all new functionality.

feat(agent): utilize acceleration data for improved keyword hints

Enhance agent keyword hint generation by incorporating pre-computed concept routes and evidence quality scores alongside traditional keyword index matches. Provide richer context for agent decision making.
…hanced phases

- Rename "Index Pipeline" to "Compile Pipeline" to better reflect the compilation nature of the process
- Replace stage-based terminology with phase-based structure (Frontend → Analysis → Transform → Backend)
- Add detailed documentation for new backend passes including Route, Chain, Overlap, Score, and Verify
- Document agent acceleration data and how it guides worker navigation
- Update references from "indexing" to "compilation" throughout the architecture documentation

feat(navigator): optimize concept routes lookup with early termination

- Refactor concept_routes method to return early when no targets found
- Limit results to one ConceptRouteInfo instead of collecting all matches
- Improve performance by avoiding unnecessary collection operations

fix(python): ensure error messages are properly converted to strings

- Convert string literals to owned String objects in VectorlessError creation
- Maintain consistency in error message handling across Python bindings
- Prevent potential issues with string ownership in error contexts
- Replace old module path `vectorless::parser::markdown` with new path `vectorless_compiler::parse::markdown::config::MarkdownConfig` - Update examples to use proper crate structure and remove outdated configuration methods - Change rust code blocks to `rust,ignore` to prevent compilation errors
- Move pipeline module declaration after passes module
- Reorder re-exported types from pipeline and config modules
- Adjust import order for consistency across multiple files

refactor(pipeline): reorder context field exports

- Move ChainIndex before Concept in Document imports
- Remove unnecessary blank lines in context definitions
- Clean up unused imports in various pipeline modules

refactor(passes): reorganize pass module structure

- Move chain module before reasoning in backend
- Reorder imports and module declarations consistently
- Move parse module after build in frontend
- Move split module after enrich in transform

refactor(engine): clean up imports and declarations

- Reorder compiler imports in engine module
- Simplify import grouping in indexer module
- Move engine module declaration position

refactor(storage): remove redundant documentation comment

style: format function calls and assertions with proper line breaks

- Wrap long assertion statements in test cases
- Format method chaining for better readability
- Break down complex expressions across multiple lines
- **Compile pipeline**: renamed index pipeline to compile pipeline with passes-based architecture
- **Compiler refactor**: renamed stages to passes, removed deprecated `StageResult` alias and `CustomStageBuilder`
- New backend compilation passes: query routing, reasoning chains, overlap detection, and scoring
- Agent acceleration data added to compiled documents
- LLM-powered cross-document insight extraction in ask module
- Enhanced JSON parsing with proper error handling
- Upgraded minimum Python version to 3.11
- Removed unused modules: agent, memory backend, validation, ReferenceResolver, SufficiencyLevel
- Restructured configuration modules and removed legacy retrieval config
- Simplified storage layer by removing memory backend
- Documentation updates for architecture and compilation pipeline