A code analysis and search system providing multi-language code parsing, fast FAISS-based text search, real-time file monitoring, and AI-powered code analysis via Groq.
- Multi-language Code Parsing: Support for Python, JavaScript, TypeScript, Java, Go, Rust, C/C++, and more using tree-sitter
- Fast Text-based Search: FAISS-powered search for symbols, files, and dependencies
- Real-time Monitoring: Automatic incremental updates when code files change
- REST API: Comprehensive API for all indexing and search operations
- AI Analysis: Groq integration for intelligent code analysis, documentation generation, and bug detection
- Symbol Search: Find functions, classes, methods, variables, and constants
- File Search: Locate files by name, path, or content
- Dependency Search: Find imports, requires, and package usage
- Combined Search: Intelligent multi-type search with relevance ranking
- Regex Search: Pattern-based searching across symbols and files
- Code explanation and documentation generation
- Bug detection and security analysis
- Code complexity analysis and refactoring suggestions
- Performance optimization recommendations
- Best practices and code quality improvements
- Natural Language Queries: Ask questions about your codebase in plain English
- Context-Aware Explanations: Get detailed explanations with relevant code examples
- Pattern Recognition: Find similar code patterns across your project
- Architecture Analysis: Understand system design and component relationships
# Clone the repository
git clone <repository-url>
cd code-indexer
# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
# Linux/Mac
source .venv/bin/activate

# Windows
.venv\Scripts\activate

# Install dependencies using UV
uv pip install -r requirements.txt

# Or install with pip
pip install -r requirements.txt

Create a .env file in the project root:
# Required for AI analysis
GROQ_API_KEY=your-groq-api-key
# Optional: For enhanced embeddings (if you want better semantic search)
OPENAI_API_KEY=your-openai-api-key
# Index configuration
CODE_INDEXER_INDEX_DIR=./index_data
CODE_INDEXER_CACHE_SIZE=10000
CODE_INDEXER_API_PORT=8000
DEBUG=false

Note: LlamaIndex will use Groq for LLM queries by default. An OpenAI API key is optional but provides better text embeddings for more accurate semantic search.
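For reference, these variables can be read in Python with python-dotenv; this is only a sketch, and the project's config.py may resolve its settings differently.

```python
# Sketch only: assumes python-dotenv is installed (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
INDEX_DIR = os.getenv("CODE_INDEXER_INDEX_DIR", "./index_data")
CACHE_SIZE = int(os.getenv("CODE_INDEXER_CACHE_SIZE", "10000"))
API_PORT = int(os.getenv("CODE_INDEXER_API_PORT", "8000"))
DEBUG = os.getenv("DEBUG", "false").lower() == "true"
```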
# Start server
python main.py server
# Start with custom host/port
python main.py server --host 127.0.0.1 --port 8080 --reload

Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- POST /index - Index a repository/directory
- GET /index/stats - Get indexing statistics
- GET /index/status - Check indexing progress
- DELETE /index/file - Remove file from index
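For example, the indexing endpoints can be driven from Python with the requests library. The request body shown here (a single path field) is an assumption; check the Swagger UI at /docs for the exact schema.

```python
import requests

BASE_URL = "http://localhost:8000"

# Index the local ./data directory (body fields are assumed, not confirmed).
resp = requests.post(f"{BASE_URL}/index", json={"path": "./data"})
resp.raise_for_status()
print(resp.json())

# Check indexing statistics afterwards.
print(requests.get(f"{BASE_URL}/index/stats").json())
```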
- POST /search - General search with filters
- GET /search/symbols - Search symbols with filters
- GET /search/files - Search files
- GET /search/dependencies - Search dependencies
- GET /search/regex - Regex-based search
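A symbol search from Python might look like the sketch below; the query parameter names (q, symbol_type, limit) are assumptions and may differ from the real API, so verify them against /docs.

```python
import requests

# Look up functions named "parse_file"; parameter names are assumptions.
resp = requests.get(
    "http://localhost:8000/search/symbols",
    params={"q": "parse_file", "symbol_type": "function", "limit": 10},
)
resp.raise_for_status()
for hit in resp.json().get("results", []):
    print(hit)
```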
- GET /symbol/{name} - Get symbol details
- GET /symbol/{name}/references - Find symbol references
- GET /symbol/{name}/hierarchy - Get symbol hierarchy
- GET /file/symbols - Get all symbols in a file
- POST /watch/start - Start watching a directory
- POST /watch/stop - Stop watching a directory
- GET /watch/status - Get watcher status
- POST /analyze - General code analysis (Groq)
- POST /analyze/symbol/{name}/explain - Explain symbol (Groq)
- POST /analyze/symbol/{name}/improve - Suggest improvements (Groq)
- POST /analyze/file/bugs - Find bugs in file (Groq)
- POST /analyze/file/complexity - Analyze complexity (Groq)
- POST /query - Natural language queries about codebase
- POST /query/explain - Explain code concepts with context
- POST /query/patterns - Find similar code patterns
- POST /query/architecture - Get architectural overview
- POST /query/best-practices - Find best practices examples
- POST /query/conversational - Conversational queries with history
- POST /query/complexity - Analyze code complexity patterns
- GET /llama/stats - Get LlamaIndex service statistics
- POST /llama/rebuild - Rebuild LlamaIndex from current data
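A natural-language query could be issued as in the sketch below; the payload field (question) and the shape of the response are assumptions, so consult /docs for the actual schema.

```python
import requests

# Ask the RAG endpoint a question about the codebase.
resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "How does the authentication system work?"},
)
resp.raise_for_status()
print(resp.json())
```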
code-indexer/
├── models/ # Pydantic data models
├── core/ # Core processing modules
│ ├── parser.py # Tree-sitter multi-language parser
│ ├── vectorstore.py # FAISS-based text search
│ ├── search.py # Advanced search engine
│ ├── incremental.py # File watching and updates
│ ├── groq_analyzer.py # AI-powered code analysis (Groq)
│ └── llama_index_service.py # LlamaIndex RAG wrapper
├── api/ # REST API endpoints
│ └── rest.py # FastAPI application
├── data/ # Your code repository
├── index_data/ # FAISS indices and metadata
├── config.py # Configuration management
└── main.py # CLI and server entry point
- Parsing: Tree-sitter extracts symbols, dependencies, and metadata
- Indexing: FAISS creates searchable text vectors for fast retrieval (see the sketch after this list)
- LlamaIndex Integration: Wraps FAISS indices for advanced RAG capabilities
- Storage: JSON-based persistence with LRU caching
- Monitoring: Watchdog monitors file changes for incremental updates
- Search: Multi-strategy search with relevance ranking
- RAG Analysis: LlamaIndex provides natural language queries and context-aware responses
- AI Analysis: Groq AI provides intelligent code insights
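To make the indexing and search steps concrete, here is a minimal, self-contained FAISS sketch; it uses random vectors in place of the text embeddings the real pipeline would produce.

```python
import faiss  # pip install faiss-cpu
import numpy as np

dim = 384  # embedding dimension (placeholder value)

# Placeholder vectors standing in for embeddings of parsed symbols/files.
symbol_vectors = np.random.rand(100, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbour search
index.add(symbol_vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # top-5 closest symbols
print(ids[0], distances[0])
```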
- Python: Functions, classes, methods, variables, imports
- JavaScript/TypeScript: Functions, classes, modules, exports
- Java: Classes, methods, interfaces, packages
- Go: Functions, types, interfaces, packages
- Rust: Functions, structs, traits, modules
- C/C++: Functions, classes, structs, includes
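As an illustration of the parsing step, the sketch below extracts top-level function names from a Python snippet with tree-sitter. The binding API changes between py-tree-sitter versions, so treat this as a sketch rather than a mirror of the project's parser.py.

```python
# Assumes the tree_sitter and tree_sitter_python packages; written against
# py-tree-sitter >= 0.23 (older releases use parser.set_language(...) instead).
from tree_sitter import Language, Parser
import tree_sitter_python as tspython

parser = Parser(Language(tspython.language()))

source = b"def greet(name):\n    return f'hello {name}'\n"
tree = parser.parse(source)

# Report top-level function definitions found in the syntax tree.
for node in tree.root_node.children:
    if node.type == "function_definition":
        name = node.child_by_field_name("name")
        print("function:", source[name.start_byte:name.end_byte].decode())
```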
- Exact matching: Direct name matches
- Fuzzy matching: Similar names with typo tolerance
- Type filtering: Filter by function, class, variable, etc.
- Scope awareness: Understand parent-child relationships
- Path matching: Find files by path components
- Content analysis: Search based on symbols within files
- Language filtering: Filter by programming language
- Size and complexity hints: Prefer moderately-sized files
- Import tracking: Find all import/require statements
- External vs local: Distinguish between external packages and local modules
- Usage analysis: Show where dependencies are used
- Symbol explanation: Understand what functions/classes do
- Documentation generation: Create comprehensive docstrings
- Usage examples: Generate example usage code
- Bug detection: Find potential bugs and edge cases
- Security analysis: Identify security vulnerabilities
- Performance optimization: Suggest performance improvements
- Best practices: Recommend coding standards and patterns
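Under the hood, a Groq-backed analysis call might look like the sketch below, using the official groq Python client; the model name is only an example, and the project's groq_analyzer.py may build its prompts differently.

```python
import os

from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])
snippet = "def add(a, b):\n    return a + b\n"

# Ask the model to review a snippet; the model name is an example only.
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"Find potential bugs in this code:\n{snippet}"},
    ],
)
print(completion.choices[0].message.content)
```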
- Conversational Search: Ask questions about your codebase in plain English
- Context-Aware Responses: Get answers with relevant code examples and explanations
- Pattern Discovery: Find similar implementations across your project
- Architecture Insights: Understand system design and component relationships
- Best Practices Finder: Discover good coding patterns in your codebase
- Complexity Analysis: Identify complex code that needs refactoring
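For intuition, a minimal LlamaIndex RAG flow over a code directory could look like the sketch below. It assumes the llama-index and llama-index-llms-groq packages and is not the project's actual llama_index_service.py.

```python
import os

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.groq import Groq  # pip install llama-index-llms-groq

# Example model name; embeddings default to OpenAI, so OPENAI_API_KEY may be needed.
Settings.llm = Groq(model="llama-3.3-70b-versatile", api_key=os.environ["GROQ_API_KEY"])

documents = SimpleDirectoryReader("data", recursive=True).load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("How does the authentication system work?"))
```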
# Natural language queries
python main.py query "How does the authentication system work?"
python main.py query "Show me examples of error handling patterns"
python main.py query "What functions are related to user management?"
# Code explanations
python main.py explain "What does the parse_file function do?"
# Pattern finding
python main.py patterns "async def function_name" --language python
# Architecture analysis
python main.py architecture "payment processing system"
# Best practices
python main.py best-practices "database connection handling"
# Complexity analysis
python main.py complexity --file src/models/user.py