krishpatel2-prog/Code-Archeologist

🧠 AI Code Archeologist

Understand any codebase in minutes.

AI Code Archeologist is an architecture intelligence tool that analyzes software repositories and automatically uncovers system structure, module dependencies, risk hotspots, and safe refactoring zones.

Instead of manually exploring thousands of files, the system converts raw source code into structured architectural intelligence.

It combines static analysis, dependency graphs, and LLM reasoning to help developers quickly understand unfamiliar codebases.

⚠️ Currently supports Python repositories only.

✨ What This Tool Does

AI Code Archeologist helps engineers answer questions like:

  • 🔍 How is this project structured?
  • ⚠️ Which modules are risky to change?
  • 🔗 What files depend on this module?
  • 🧱 What architecture pattern does this project follow?
  • 🛠 Which parts are safe to refactor?

🚀 Core Features

📂 Repository Analysis

Analyze both local repositories and GitHub repositories.

Supported inputs:

D:\projects\my_repo
https://github.com/user/repository

If a GitHub URL is provided, the system automatically clones and analyzes it.
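
As a rough illustration of how the two input forms could be told apart, here is a minimal sketch. The names `classify_source` and `clone_command` are hypothetical and do not come from this project, and the shallow clone (`--depth 1`) is an assumption chosen to keep downloads small.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "github" for a GitHub HTTP(S) URL, otherwise "local"."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() == "github.com":
        return "github"
    # Windows paths such as D:\projects\my_repo parse with scheme "d",
    # so they fall through to the local branch.
    return "local"


def clone_command(url: str, dest: Path) -> list[str]:
    """Build a shallow `git clone` command (hypothetical helper)."""
    return ["git", "clone", "--depth", "1", url, str(dest)]
```

The command list can be passed to `subprocess.run(..., check=True)` when the source is classified as "github".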


🌳 File Structure Explorer

Navigate the repository using an interactive project tree.

Features include:

  • Collapsible folder hierarchy
  • File summaries
  • Module responsibilities
  • Risk indicators per file

This allows developers to quickly explore the structure of large repositories.


🏗 Architecture Detection

The system automatically detects high-level architectural patterns.

Examples:

  • Layered Monolith
  • Modular Systems
  • Service-oriented architectures

Detected architectural layers may include:

  • Presentation Layer
  • Business Logic Layer
  • Domain Model Layer
  • Data Access Layer
  • Infrastructure Layer

🔗 Dependency Graph Analysis

Using Python's AST parser, the system builds a dependency graph of the codebase.

This graph reveals:

  • Module relationships
  • Dependency chains
  • System coupling patterns
  • Architectural bottlenecks
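
To make the idea concrete, the sketch below builds an import graph with the standard-library `ast` module. It is an illustration under simplifying assumptions, not the project's actual `ast_parser.py` or `dependency_graph.py`: the function names are hypothetical, relative imports are skipped, and only imports that resolve to a module in the repository are kept.

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Collect module names imported by a piece of Python source."""
    imports: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative ones are skipped here.
            imports.add(node.module)
    return imports


def build_dependency_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repo modules it imports.

    `modules` maps dotted module names to source text. Imports that are not
    keys of `modules` (standard library, third-party) are dropped.
    """
    return {
        name: {dep for dep in extract_imports(src) if dep in modules and dep != name}
        for name, src in modules.items()
    }
```

Each key in the result is a module and each value is the set of repository modules it depends on, which is the shape the later sketches in this README assume.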

⚠️ Risk & Hotspot Detection

Identify modules that represent architectural risk.

Hotspots are detected using metrics such as:

  • Dependency centrality
  • Coupling levels
  • Impact radius

This helps developers locate areas where changes may cause cascading issues.
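
One simple way to approximate these metrics is degree centrality: count how many modules depend on a module (fan-in) and how many it depends on (fan-out). The sketch below assumes a graph shaped as `{module: set_of_dependencies}`. The function names and the scoring rule (fan-in plus fan-out) are illustrative assumptions, not the project's actual risk engine.

```python
def coupling_metrics(graph: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Compute fan-in (dependents) and fan-out (dependencies) per module."""
    metrics = {name: {"fan_in": 0, "fan_out": len(deps)} for name, deps in graph.items()}
    for deps in graph.values():
        for dep in deps:
            if dep in metrics:
                metrics[dep]["fan_in"] += 1
    return metrics


def rank_hotspots(graph: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Rank modules by total degree, highest first; ties break by name."""
    metrics = coupling_metrics(graph)
    ranked = sorted(
        metrics,
        key=lambda m: (-(metrics[m]["fan_in"] + metrics[m]["fan_out"]), m),
    )
    return ranked[:top_n]
```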


💥 Impact Analysis

Predict the blast radius of code changes.

For any file, the system displays:

  • Direct dependents
  • Indirect dependents
  • Total impact radius
  • Structural risk signals

This helps developers understand how modifications propagate through the system.
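
Direct and indirect dependents can be computed with a breadth-first search over the reversed dependency graph. The following is a minimal sketch assuming a graph shaped as `{module: set_of_dependencies}`. The function name `impact_radius` is hypothetical and this is not the project's `impact.py`.

```python
from collections import deque


def impact_radius(graph: dict[str, set[str]], target: str) -> dict[str, set[str]]:
    """Return the direct and indirect dependents of `target`."""
    # Invert the edges: for each module, record who imports it.
    dependents: dict[str, set[str]] = {name: set() for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    direct = set(dependents.get(target, set()))
    seen = direct | {target}
    indirect: set[str] = set()
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:  # also guards against import cycles
                seen.add(parent)
                indirect.add(parent)
                queue.append(parent)
    return {"direct": direct, "indirect": indirect}
```

The total impact radius is then `len(result["direct"]) + len(result["indirect"])`.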


🛡 Refactor Safe Zones

The tool highlights modules with low coupling and minimal dependencies.

Because few other components rely on them, these areas are safer to refactor.
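
A low-coupling filter could look like the sketch below, which assumes a graph shaped as `{module: set_of_dependencies}`. The thresholds and the name `safe_zones` are illustrative assumptions; the project may use different criteria.

```python
def safe_zones(
    graph: dict[str, set[str]],
    max_fan_in: int = 0,
    max_fan_out: int = 1,
) -> list[str]:
    """List modules whose fan-in and fan-out stay within the given limits."""
    fan_in = {name: 0 for name in graph}
    for deps in graph.values():
        for dep in deps:
            if dep in fan_in:
                fan_in[dep] += 1
    return sorted(
        name
        for name, deps in graph.items()
        if fan_in[name] <= max_fan_in and len(deps) <= max_fan_out
    )
```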


📘 Architecture Intelligence Wiki

The system generates a structured architecture wiki describing the repository.

The wiki includes:

  • Architecture overview
  • Module responsibilities
  • Architectural observations
  • Risk hotspots
  • Refactor candidates

🤖 Architecture-Aware Q&A

Ask questions about the project using the generated architecture intelligence.

Example questions:

  • Where is the core orchestration implemented?
  • Which modules interact with the API layer?
  • What files should be refactored first?
  • Which components are most critical?

Responses are generated using contextual data extracted from the repository.


🧩 System Architecture

The project consists of three main components.

🔍 Static Analysis Engine

Responsible for parsing code and extracting structural data.

Processes include:

  • Repository scanning
  • AST parsing
  • Dependency graph construction
  • Structural metric analysis
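
The repository scanning step can be pictured as a filtered directory walk. This is a minimal sketch only: the skip list and the name `find_python_files` are assumptions, not taken from the project's `file_scanner.py`.

```python
from pathlib import Path

# Directories that usually hold no first-party source (assumed list).
SKIP_DIRS = {".git", "__pycache__", ".venv", "venv", "node_modules"}


def find_python_files(root: Path) -> list[Path]:
    """Return all .py files under `root`, skipping common non-source dirs."""
    return sorted(
        path
        for path in root.rglob("*.py")
        if not SKIP_DIRS.intersection(path.relative_to(root).parts)
    )
```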

🧠 Architecture Intelligence Layer

Transforms raw analysis data into architectural insights.

Generates:

  • Architecture summaries
  • Hotspot detection
  • Impact predictions
  • Refactor recommendations

🖥 Intelligence Interface

A modern UI dashboard used to explore repository intelligence.

Includes:

  • File structure explorer
  • Architecture visualization
  • Hotspot analysis
  • Impact analysis
  • Architecture Q&A assistant

🧰 Technology Stack

Backend

  • 🐍 Python
  • ⚡ FastAPI
  • 🧠 LangGraph
  • 🌳 AST Parsing
  • 🔗 Graph-based dependency analysis
  • 🤖 Groq LLM API

Frontend

  • ⚛️ React
  • 📘 TypeScript
  • 🎨 Modern dashboard UI
  • 🌳 Interactive repository explorer
  • 📊 Architecture visualization

⚙️ Analysis Pipeline

When a repository is analyzed, the system performs the following steps:

Repository Input
      ↓
Repository Scanning
      ↓
Python File Extraction
      ↓
AST Parsing
      ↓
Dependency Graph Construction
      ↓
Graph Analysis (hotspots, cycles, leaf nodes)
      ↓
Architecture Detection
      ↓
Impact Analysis Engine
      ↓
Architecture Wiki Generation
      ↓
Interactive Intelligence Dashboard
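
The graph analysis step mentions cycles and leaf nodes. As a sketch of what that could involve, the code below finds leaf modules and detects import cycles with a depth-first search. It assumes a graph shaped as `{module: set_of_dependencies}` and treats a "leaf" as a module with no in-repo dependencies, which is an interpretation rather than something this README defines. The function names are hypothetical.

```python
def find_leaf_nodes(graph: dict[str, set[str]]) -> list[str]:
    """Modules that import no other module in the graph."""
    return sorted(name for name, deps in graph.items() if not deps)


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Detect a dependency cycle using three-color depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {name: WHITE for name in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for dep in graph.get(node, ()):
            state = color.get(dep, BLACK)  # unknown modules count as finished
            if state == GRAY:
                return True  # back edge: dep is on the current path
            if state == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[name] == WHITE and visit(name) for name in graph)
```

The recursive search is fine for small graphs; a very deep dependency chain would call for an iterative version to stay under Python's recursion limit.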

📁 Project Structure

backend
│
├── analysis
│   ├── ast_parser.py
│   ├── dependency_graph.py
│   ├── file_prioritizer.py
│   ├── impact.py
│   └── risk_engine.py
│
├── core
│   ├── langgraph_flow.py
│   ├── state.py
│   └── wiki_builder.py
│
├── ingestion
│   ├── file_scanner.py
│   ├── repo_loader.py
│   └── github_loader.py
│
├── llm
│   ├── architecture.py
│   ├── summarizer.py
│   └── impact_explainer.py
│
└── api
    └── routes.py


frontend
│
├── components
├── pages
├── services
└── ui

⚙️ Installation

1️⃣ Clone the Repository

git clone https://github.com/yourusername/ai-code-archeologist
cd ai-code-archeologist

2️⃣ Backend Setup

cd backend
pip install -r requirements.txt

Create a .env file in the backend directory:

GROQ_API_KEY=your_key_here

Run the backend:

uvicorn main:app --reload

3️⃣ Frontend Setup

cd frontend
npm install
npm run dev

🧪 Example Usage

Analyze a local repository:

D:\projects\my_repo

Analyze a GitHub repository:

https://github.com/langchain-ai/langgraph

The system will:

  1. Clone the repository (if GitHub URL)
  2. Scan source files
  3. Build dependency graph
  4. Detect architecture
  5. Generate architecture wiki
  6. Display analysis dashboard

📈 Future Improvements

Planned enhancements include:

  • 🌐 Multi-language support (Java, Go, TypeScript)
  • 📊 Interactive dependency graph visualization
  • 🔍 Architectural drift detection
  • 🧪 Pull request impact prediction
  • 📉 Code complexity metrics
  • 📚 Exportable architecture documentation

🎯 Motivation

Understanding unfamiliar codebases is one of the most difficult tasks engineers face.

AI Code Archeologist was created to transform raw source code into structured architectural knowledge, enabling developers to quickly understand:

  • System structure
  • Dependency relationships
  • Architectural risks
  • Safe modification zones

📜 License

MIT License
