davex-ai/Archon


🚀 ARCHON


🧠 What is this?

A system that analyzes a GitHub repository and generates in-depth technical interview questions and answers covering:

  • Architecture
  • Scalability
  • Tradeoffs
  • Real-world engineering decisions

⚡ Works even without LLM access, thanks to a fallback heuristic engine.


✨ Features

🔍 Repository Analysis

  • Fetches and parses GitHub repositories via the GitHub REST API
  • Prioritizes important files (core logic over boilerplate)
  • Supports multiple languages
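A minimal sketch of the fetch-and-prioritize step, assuming the file list comes from GitHub's Git Trees endpoint and that a branch named `main` exists. The scoring rules (`SOURCE_EXTS`, `SKIP_DIRS`, `CORE_HINTS`) and all function names are hypothetical illustrations, not the project's actual heuristics.

```python
import json
import posixpath
import re
import urllib.request

# Hypothetical ranking rules; the project's actual heuristics are not documented.
SOURCE_EXTS = {".py", ".js", ".ts", ".tsx", ".go", ".rs", ".java"}
SKIP_DIRS = ("node_modules/", "vendor/", "dist/", "build/", ".github/")
CORE_HINTS = ("main", "app", "server", "core", "service", "api")


def parse_repo_url(url: str) -> tuple[str, str]:
    """Extract (owner, repo) from a URL such as https://github.com/user/repo."""
    m = re.match(r"https?://github\.com/([^/]+)/([^/#?]+)", url.strip())
    if not m:
        raise ValueError(f"not a GitHub repository URL: {url}")
    return m.group(1), m.group(2).removesuffix(".git")


def fetch_file_paths(owner: str, repo: str, branch: str = "main") -> list[str]:
    """List all file paths via the Git Trees endpoint (unauthenticated calls are rate-limited)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        tree = json.load(resp)["tree"]
    # "blob" entries are files; "tree" entries are directories.
    return [item["path"] for item in tree if item["type"] == "blob"]


def priority(path: str) -> int:
    """Score a path: source files and core-sounding names up, tests down, vendored code out."""
    lower = path.lower()
    if any(d in lower for d in SKIP_DIRS):
        return -1
    score = 10 if posixpath.splitext(lower)[1] in SOURCE_EXTS else 0
    if any(hint in posixpath.basename(lower) for hint in CORE_HINTS):
        score += 5
    if "test" in lower:
        score -= 3
    return score


def prioritize(paths: list[str], limit: int = 20) -> list[str]:
    """Keep positively scored paths, best first (ties keep their original order)."""
    ranked = sorted((p for p in paths if priority(p) > 0), key=priority, reverse=True)
    return ranked[:limit]
```

Keeping `priority` as a pure function of the path makes the ranking easy to test without network access; only `fetch_file_paths` touches the API.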

🧩 Intelligent Chunking

  • Breaks code into meaningful chunks
  • Filters noise (non-informative code)
  • Preserves structural context
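One way to implement language-agnostic chunking, sketched under assumptions: chunks are line windows that prefer to end at a blank line, a simple prefix-based filter drops import/comment-only chunks as noise, and each chunk carries its file path and line range as structural context. `Chunk`, `is_informative`, `chunk_code`, and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str

    def with_context(self) -> str:
        # Prefix the file path and line range so a retrieved chunk keeps its location.
        return f"# {self.path}:{self.start_line}-{self.end_line}\n{self.text}"


# Lines starting with these are treated as non-informative (hypothetical rule).
NOISE_PREFIXES = ("#", "//", "import ", "from ", "require(", "}")


def is_informative(text: str, min_lines: int = 3) -> bool:
    """Keep a chunk only if it has a few lines of actual logic."""
    logic = [
        line for line in (raw.strip() for raw in text.splitlines())
        if line and not line.startswith(NOISE_PREFIXES)
    ]
    return len(logic) >= min_lines


def chunk_code(path: str, source: str, max_lines: int = 40) -> list[Chunk]:
    lines = source.splitlines()
    chunks: list[Chunk] = []
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        if end < len(lines):
            # Prefer to cut at a blank line in the back half of the window,
            # so functions and classes are less likely to be split mid-body.
            for i in range(end, start + max_lines // 2, -1):
                if not lines[i - 1].strip():
                    end = i
                    break
        text = "\n".join(lines[start:end])
        if is_informative(text):
            chunks.append(Chunk(path, start + 1, end, text))
        start = end
    return chunks
```

A syntax-aware splitter (for example, Python's `ast` module for `.py` files) would follow function boundaries more precisely; the blank-line heuristic trades that precision for working on any language.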

🧠 Embedding + Retrieval (RAG)

  • Embeds code chunks with SentenceTransformers
  • Retrieves the most relevant code sections
  • Builds contextual understanding of system design
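A sketch of embedding-based retrieval. The embedder is injected so the ranking logic runs without any model installed; `sentence_transformer_embedder` shows how a SentenceTransformers model could plug in, where the model name `all-MiniLM-L6-v2` is an assumption, not necessarily what the project uses. Cosine similarity is written in plain Python here; scikit-learn's `cosine_similarity` computes the same quantity.

```python
import math
from typing import Callable

# An embedder maps a batch of texts to a batch of vectors.
Embedder = Callable[[list[str]], list[list[float]]]


def sentence_transformer_embedder(model_name: str = "all-MiniLM-L6-v2") -> Embedder:
    # Lazy import so the rest of this sketch runs without the dependency installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], embed: Embedder, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    vectors = embed([query] + chunks)
    q, chunk_vecs = vectors[0], vectors[1:]
    scores = [cosine(q, v) for v in chunk_vecs]
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:k]]
```

Usage with a real model might look like `retrieve("How is authentication handled?", chunk_texts, sentence_transformer_embedder(), k=5)`, where `chunk_texts` is a hypothetical list of chunk strings.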

🤖 AI Question Generation

  • Generates interview-level questions on:

    • Architecture decisions
    • Scalability concerns
    • Tradeoffs
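The project's prompt is not documented, so the following is only a hypothetical sketch of how retrieved code could be turned into an LLM request and how a JSON reply could be mapped onto the `id` / `question` / `answer` items returned by the API. `build_prompt`, `parse_questions`, and the prompt wording are all assumptions.

```python
import json


def build_prompt(repo: str, context: str, num_questions: int) -> str:
    """Hypothetical prompt asking for architecture, scalability, and tradeoff questions."""
    return (
        f"You are a senior engineer interviewing a candidate about the repository {repo}.\n"
        f"Using only the code excerpts below, write {num_questions} interview questions "
        "about architecture decisions, scalability concerns, and tradeoffs, each with a model answer.\n"
        'Reply with a JSON array of objects with "question" and "answer" keys.\n\n'
        f"Code excerpts:\n{context}"
    )


def parse_questions(raw: str, num_questions: int) -> list[dict]:
    """Turn the model's JSON reply into API question items (ids start at 1)."""
    items = json.loads(raw)  # raises ValueError on malformed output
    return [
        {"id": i, "question": str(item["question"]), "answer": str(item["answer"])}
        for i, item in enumerate(items[:num_questions], start=1)
    ]
```

Asking for strict JSON keeps parsing simple; a malformed reply raises an exception, which is a natural trigger for the fallback engine.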

⚡ Fallback Mode (No LLM Required)

  • Automatically switches to rule-based generation when no LLM is available

  • Uses detected signals:

    • API usage
    • State management
    • Auth systems
    • Async logic
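A minimal sketch of rule-based generation from the four signal types listed above. The regex patterns and question templates are hypothetical examples, not the project's actual rules; a `generic` template covers code where no signal fires.

```python
import re

# Hypothetical detectors for API usage, state management, auth, and async logic.
SIGNALS = {
    "api": re.compile(r"\bfetch\(|\baxios\b|\brequests\.(?:get|post)\b|\bhttpx\b|@app\.(?:get|post)\b"),
    "state": re.compile(r"\b(?:useState|useReducer|redux|createStore|zustand)\b"),
    "auth": re.compile(r"\b(?:jwt|oauth|login|password|bearer)\b", re.IGNORECASE),
    "async": re.compile(r"\b(?:async|await|asyncio|Promise)\b"),
}

TEMPLATES = {
    "api": ("How does the system handle slow or failing external API calls?",
            "Check for timeouts, retries with backoff, and how upstream errors reach callers."),
    "state": ("Where does application state live, and how is it kept consistent as the UI grows?",
              "Compare local component state with a central store; discuss update flow and re-render cost."),
    "auth": ("How are credentials and sessions protected end to end?",
             "Trace token issuance, storage, expiry, and validation; look for secrets in code or logs."),
    "async": ("What happens under load when many async tasks run concurrently?",
              "Discuss concurrency limits, blocking calls inside async code, and error propagation."),
    "generic": ("Which module is the riskiest to change, and why?",
                "Point to the most coupled, least tested code path and explain its blast radius."),
}


def detect_signals(code: str) -> list[str]:
    return [name for name, pattern in SIGNALS.items() if pattern.search(code)]


def fallback_questions(code: str, num_questions: int) -> list[dict]:
    signals = detect_signals(code) or ["generic"]
    picked = [TEMPLATES[s] for s in signals][:num_questions]
    return [{"id": i, "question": q, "answer": a} for i, (q, a) in enumerate(picked, start=1)]
```

Because it relies only on pattern matching, this path is deterministic and needs no API key, at the cost of shallower, templated answers.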

🧱 System Architecture

GitHub Repo
   ↓
File Fetching + Prioritization
   ↓
Chunking + Filtering
   ↓
Embeddings (SentenceTransformers)
   ↓
Vector Similarity Retrieval
   ↓
Context Builder
   ↓
AI Question Generator
   ↓
Fallback Engine (if LLM unavailable)
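The diagram can be read as a chain of functions. This hypothetical sketch wires the stages together through an injected `Stages` object (file fetching and prioritization share one stage, as do embedding and retrieval) and returns a dict shaped like the `/analyze` response; whether `repo` echoes the input URL is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Each field stands for one box in the diagram; implementations are injected."""
    fetch: Callable[[str], dict[str, str]]                  # repo URL -> {path: source}, prioritized
    chunk: Callable[[dict[str, str]], list[str]]            # files -> filtered chunk texts
    retrieve: Callable[[list[str]], list[str]]              # embeddings + similarity -> top chunks
    build_context: Callable[[list[str]], str]               # top chunks -> prompt context
    generate: Callable[[str, int], tuple[str, list[dict]]]  # (context, n) -> (mode, questions)


def analyze(repo_url: str, num_questions: int, stages: Stages) -> dict:
    files = stages.fetch(repo_url)
    chunks = stages.chunk(files)
    relevant = stages.retrieve(chunks)
    context = stages.build_context(relevant)
    mode, questions = stages.generate(context, num_questions)
    return {"repo": repo_url, "mode": mode, "questions": questions}
```

Injecting the stages keeps each step swappable and testable with stubs, for example replacing the LLM generator with the fallback engine.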

🛠️ Tech Stack

Backend:
- FastAPI
- Python

AI / ML:
- SentenceTransformers
- Cosine similarity (scikit-learn)

Data:
- GitHub REST API

Frontend:
- Minimal Web UI (React / HTML)

Optional:
- OpenAI API (LLM generation)

⚙️ How it Works

  1. Input a GitHub repo URL

  2. The system fetches the repository and filters for key files

  3. Code is chunked and embedded

  4. The most relevant chunks are retrieved

  5. Questions are generated using:

    • An LLM, if one is available
    • Otherwise, the fallback heuristic engine
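Step 5 can be sketched as a small selection function: try the LLM generator when one is configured, and fall back to the rule-based engine if it is missing, raises, or returns nothing. The `"mock"` mode label comes from the sample response below; the `"llm"` label and the function names are assumptions.

```python
import logging
from typing import Callable, Optional

# (context, num_questions) -> list of {"id", "question", "answer"} dicts
QuestionGen = Callable[[str, int], list[dict]]
log = logging.getLogger(__name__)


def generate_questions(
    context: str,
    num_questions: int,
    llm: Optional[QuestionGen],
    fallback: QuestionGen,
) -> tuple[str, list[dict]]:
    """Return (mode, questions): try the LLM, else use the rule-based engine."""
    if llm is not None:
        try:
            questions = llm(context, num_questions)
            if questions:  # an empty reply is treated as a failure too
                return "llm", questions
        except Exception:  # e.g. missing API key, network error, malformed JSON
            log.warning("LLM generation failed; using fallback engine", exc_info=True)
    return "mock", fallback(context, num_questions)
```

Catching broadly here is deliberate: any LLM failure should degrade to fallback output rather than fail the request, while the log keeps the error visible.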

📡 API Usage

POST /analyze

{
  "repo_url": "https://github.com/user/repo",
  "num_questions": 5
}
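The request above can be sent with nothing but the standard library. This sketch assumes the FastAPI server runs locally on uvicorn's default address, `http://127.0.0.1:8000`; `API_BASE`, `analyze_request`, and `post_analyze` are hypothetical names.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # assumed: uvicorn's default host and port


def analyze_request(repo_url: str, num_questions: int = 5, base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with a JSON body."""
    body = json.dumps({"repo_url": repo_url, "num_questions": num_questions}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_analyze(repo_url: str, num_questions: int = 5) -> dict:
    # Analysis can take a while on large repositories, hence the generous timeout.
    with urllib.request.urlopen(analyze_request(repo_url, num_questions), timeout=120) as resp:
        return json.load(resp)
```

Separating request construction from sending makes the payload easy to check without a running server.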

Response

{
  "repo": "...",
  "mode": "mock",
  "questions": [
    {
      "id": 1,
      "question": "...",
      "answer": "..."
    }
  ]
}
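On the client side, the response can be loaded into typed objects. This sketch mirrors the fields in the sample above; the class names are hypothetical, and reading `"mock"` as "produced by the fallback engine" is an inference from the rest of this README.

```python
import json
from dataclasses import dataclass


@dataclass
class Question:
    id: int
    question: str
    answer: str


@dataclass
class AnalyzeResponse:
    repo: str
    mode: str  # "mock" in the sample; assumed to mean the fallback engine answered
    questions: list[Question]

    @classmethod
    def from_json(cls, raw: str) -> "AnalyzeResponse":
        data = json.loads(raw)
        return cls(
            repo=data["repo"],
            mode=data["mode"],
            # Pick known keys explicitly so extra fields in a response are ignored.
            questions=[Question(int(q["id"]), q["question"], q["answer"]) for q in data["questions"]],
        )
```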

⚠️ Challenges Solved

  • Large repo handling (chunking + prioritization)
  • Token limitations (retrieval instead of full context)
  • LLM dependency (handled by the fallback system)
  • Noise reduction in code analysis
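Retrieval addresses token limits only if the assembled context stays under a budget. A minimal sketch, assuming chunks arrive sorted most-relevant first and using character count as a crude stand-in for tokens; `build_context` and the default budget are hypothetical.

```python
def build_context(chunks: list[str], max_chars: int = 12_000) -> str:
    """Concatenate ranked chunks without exceeding a size budget.

    Characters are a rough proxy for tokens; a real implementation would
    count tokens with the target model's tokenizer.
    """
    parts: list[str] = []
    used = 0
    for chunk in chunks:  # assumed sorted most-relevant first
        cost = len(chunk) + 2  # +2 for the blank-line separator (slight overcount)
        if used + cost > max_chars:
            continue  # skip it; a smaller, lower-ranked chunk may still fit
        parts.append(chunk)
        used += cost
    return "\n\n".join(parts)
```

Skipping an oversized chunk instead of stopping lets smaller relevant chunks fill the remaining space.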

💡 Future Improvements

  • 🔥 Dynamic repo-type detection (ML, backend, real-time, etc.)
  • 📊 Question difficulty levels (junior → senior)
  • 🔗 Follow-up interview questions
  • 🧠 Hybrid LLM + rule-based reasoning
  • ⚡ Caching + performance optimization

🧑‍💻 Author

Built by Dave, an aspiring systems engineer ⚡


🎬 Demo


⭐ Support

If this project helped or inspired you:

  • ⭐ Star the repo
  • 🍴 Fork it
  • 🧠 Build something even crazier

“Don’t just read code. Interrogate it.”
