AI Code Archeologist is an architecture intelligence tool that analyzes software repositories and automatically uncovers system structure, module dependencies, risk hotspots, and safe refactoring zones.
Instead of manually exploring thousands of files, the system converts raw source code into structured architectural intelligence.
It combines static analysis, dependency graphs, and LLM reasoning to help developers quickly understand unfamiliar codebases.
AI Code Archeologist helps engineers answer questions like:
- How is this project structured?
- Which modules are risky to change?
- What files depend on this module?
- What architecture pattern does this project follow?
- Which parts are safe to refactor?
Analyze both local repositories and GitHub repositories.
Supported inputs:
D:\projects\my_repo
https://github.com/user/repository
If a GitHub URL is provided, the system automatically clones and analyzes it.
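As a rough illustration of how the two input types might be told apart, here is a minimal sketch. The function name `classify_source` is hypothetical and not taken from the project; the real logic in `repo_loader.py` / `github_loader.py` may differ.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'github' for a GitHub URL, otherwise 'local'.

    Hypothetical helper: assumes anything that is not an http(s) URL
    on github.com should be treated as a local filesystem path.
    """
    parsed = urlparse(source.strip())
    is_http = parsed.scheme in ("http", "https")
    is_github = parsed.netloc.lower() in ("github.com", "www.github.com")
    return "github" if is_http and is_github else "local"


if __name__ == "__main__":
    for src in (r"D:\projects\my_repo", "https://github.com/user/repository"):
        print(src, "->", classify_source(src))
```

A caller could then clone the repository only when the result is `"github"` and scan the path directly otherwise.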
Navigate the repository using an interactive project tree.
Features include:
- Collapsible folder hierarchy
- File summaries
- Module responsibilities
- Risk indicators per file
This allows developers to quickly explore the structure of large repositories.
The system automatically detects high-level architectural patterns.
Examples:
- Layered Monolith
- Modular Systems
- Service-oriented architectures
Detected architectural layers may include:
- Presentation Layer
- Business Logic Layer
- Domain Model Layer
- Data Access Layer
- Infrastructure Layer
Using Python's AST parser, the system builds a dependency graph of the codebase.
This graph reveals:
- Module relationships
- Dependency chains
- System coupling patterns
- Architectural bottlenecks
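The idea can be sketched with the standard library's `ast` module. This is a simplified illustration, not the project's actual `ast_parser.py` or `dependency_graph.py`: the function names are hypothetical, modules are passed in as a name-to-source mapping, and relative imports are ignored.

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Collect the module names imported by a piece of Python source."""
    imports: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b.c" yields one alias per imported name
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a.b import c" records "a.b"; bare "from . import c" is skipped
            imports.add(node.module)
    return imports


def build_dependency_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repository modules it imports."""
    return {
        name: {dep for dep in extract_imports(src) if dep in modules and dep != name}
        for name, src in modules.items()
    }


if __name__ == "__main__":
    repo = {
        "app": "import service\nimport os\n",
        "service": "from db import connect\n",
        "db": "import sqlite3\n",
    }
    print(build_dependency_graph(repo))
```

Third-party and standard-library imports (`os`, `sqlite3`) are filtered out here so the graph only contains edges between files of the analyzed repository.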
Identify modules that represent architectural risk.
Hotspots are detected using metrics such as:
- Dependency centrality
- Coupling levels
- Impact radius
This helps developers locate areas where changes may cause cascading issues.
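One simple way to approximate these metrics is to count incoming and outgoing edges per module. The sketch below is an assumption about how such a ranking could work, not the scoring used by `risk_engine.py`; `rank_hotspots` is a hypothetical name and the graph maps each module to the set of modules it imports.

```python
def rank_hotspots(graph: dict[str, set[str]], top_n: int = 3) -> list[tuple[str, int, int]]:
    """Rank modules by fan-in (dependents), then fan-out (dependencies).

    Returns (module, fan_in, fan_out) tuples, most depended-upon first.
    """
    fan_in = {module: 0 for module in graph}
    for deps in graph.values():
        for dep in deps:
            fan_in[dep] = fan_in.get(dep, 0) + 1

    rows = [(m, fan_in[m], len(graph.get(m, ()))) for m in fan_in]
    # Higher fan-in first, then higher fan-out, then name for stable output
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows[:top_n]


if __name__ == "__main__":
    graph = {
        "api": {"core", "utils"},
        "cli": {"core", "utils"},
        "core": {"utils"},
        "utils": set(),
    }
    for module, fi, fo in rank_hotspots(graph, top_n=2):
        print(f"{module}: fan_in={fi}, fan_out={fo}")
```

A module with high fan-in is a natural hotspot candidate, because every dependent may need to be re-checked when it changes.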
Predict the blast radius of code changes.
For any file, the system displays:
- Direct dependents
- Indirect dependents
- Total impact radius
- Structural risk signals
This helps developers understand how modifications propagate through the system.
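Direct and indirect dependents can be found with a breadth-first walk over the reversed dependency graph. The following is a minimal sketch under the assumption that the graph maps each module to the modules it imports; `impact_of` is a hypothetical name and not the API of `impact.py`.

```python
from collections import deque


def impact_of(graph: dict[str, set[str]], target: str) -> dict:
    """Compute direct dependents, indirect dependents, and impact radius."""
    # Reverse the edges: dependents[x] = modules that import x
    dependents: dict[str, set[str]] = {module: set() for module in graph}
    for module, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    direct = set(dependents.get(target, set()))
    seen = set(direct)
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            # Skip the target itself so import cycles do not inflate the count
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)

    return {"direct": direct, "indirect": seen - direct, "impact_radius": len(seen)}


if __name__ == "__main__":
    graph = {
        "routes": {"service"},
        "service": {"db"},
        "worker": {"db"},
        "db": set(),
    }
    print(impact_of(graph, "db"))
```

In this example, changing `db` directly affects `service` and `worker`, and indirectly affects `routes`, for a total impact radius of three modules.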
Highlights modules with low coupling and minimal dependencies.
These areas are safer to refactor without affecting other components.
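As a hedged illustration, "low coupling" could be approximated as a small combined count of incoming and outgoing dependencies. The threshold and the name `safe_refactor_candidates` are assumptions made for this sketch, not values taken from the project.

```python
def safe_refactor_candidates(graph: dict[str, set[str]], max_coupling: int = 1) -> list[str]:
    """Return modules whose fan-in plus fan-out is at most max_coupling."""
    fan_in = {module: 0 for module in graph}
    for deps in graph.values():
        for dep in deps:
            if dep in fan_in:
                fan_in[dep] += 1
    return sorted(
        module
        for module, deps in graph.items()
        if fan_in[module] + len(deps) <= max_coupling
    )


if __name__ == "__main__":
    graph = {
        "api": {"core", "utils"},
        "core": {"utils"},
        "utils": set(),
        "scripts": {"utils"},
        "legacy_report": set(),
    }
    print(safe_refactor_candidates(graph))
```

Here `legacy_report` (no edges at all) and `scripts` (a single outgoing import) are flagged, while the heavily used `utils` module is not.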
The system generates a structured architecture wiki describing the repository.
The wiki includes:
- Architecture overview
- Module responsibilities
- Architectural observations
- Risk hotspots
- Refactor candidates
Ask questions about the project using the generated architecture intelligence.
Example questions:
- Where is the core orchestration implemented?
- Which modules interact with the API layer?
- What files should be refactored first?
- Which components are most critical?
Responses are generated using contextual data extracted from the repository.
The project consists of three main components.
Responsible for parsing code and extracting structural data.
Processes include:
- Repository scanning
- AST parsing
- Dependency graph construction
- Structural metric analysis
Transforms raw analysis data into architectural insights.
Generates:
- Architecture summaries
- Hotspot detection
- Impact predictions
- Refactor recommendations
A modern UI dashboard used to explore repository intelligence.
Includes:
- File structure explorer
- Architecture visualization
- Hotspot analysis
- Impact analysis
- Architecture Q&A assistant
- Python
- FastAPI
- LangGraph
- AST Parsing
- Graph-based dependency analysis
- Groq LLM API
- React
- TypeScript
- Modern dashboard UI
- Interactive repository explorer
- Architecture visualization
When a repository is analyzed, the system performs the following steps:
Repository Input
↓
Repository Scanning
↓
Python File Extraction
↓
AST Parsing
↓
Dependency Graph Construction
↓
Graph Analysis (hotspots, cycles, leaf nodes)
↓
Architecture Detection
↓
Impact Analysis Engine
↓
Architecture Wiki Generation
↓
Interactive Intelligence Dashboard
backend
│
├── analysis
│   ├── ast_parser.py
│   ├── dependency_graph.py
│   ├── file_prioritizer.py
│   ├── impact.py
│   └── risk_engine.py
│
├── core
│   ├── langgraph_flow.py
│   ├── state.py
│   └── wiki_builder.py
│
├── ingestion
│   ├── file_scanner.py
│   ├── repo_loader.py
│   └── github_loader.py
│
├── llm
│   ├── architecture.py
│   ├── summarizer.py
│   └── impact_explainer.py
│
└── api
    └── routes.py
frontend
│
├── components
├── pages
├── services
└── ui
git clone https://github.com/yourusername/ai-code-archeologist
cd ai-code-archeologist
cd backend
pip install -r requirements.txt
Create a .env file:
GROQ_API_KEY=your_key_here
Run backend:
uvicorn main:app --reload
Run frontend:
cd frontend
npm install
npm run dev
Analyze a local repository:
D:\projects\my_repo
Analyze a GitHub repository:
https://github.com/langchain-ai/langgraph
The system will:
- Clone the repository (if GitHub URL)
- Scan source files
- Build dependency graph
- Detect architecture
- Generate architecture wiki
- Display analysis dashboard
Planned enhancements include:
- Multi-language support (Java, Go, TypeScript)
- Interactive dependency graph visualization
- Architectural drift detection
- Pull request impact prediction
- Code complexity metrics
- Exportable architecture documentation
Understanding unfamiliar codebases is one of the most difficult tasks engineers face.
AI Code Archeologist was created to transform raw source code into structured architectural knowledge, enabling developers to quickly understand:
- System structure
- Dependency relationships
- Architectural risks
- Safe modification zones
MIT License