
ShauryaRattan/Execra

 
 




"Don't learn to do it — just do it correctly, right now."

Execra is not a chatbot. Not a tutorial. Not a coding assistant. It is your real-time execution partner — observing, understanding, and guiding every action you take, before mistakes happen.





🌟 What is Execra?

Execra (Execution + Era) is a multimodal AI-powered Universal Execution Intelligence Layer — a continuously running background system that observes your actions in real time across both digital environments (coding, software) and physical environments (real-world tasks via camera), and actively guides you through correct execution before mistakes happen.

Unlike a chatbot that answers only when asked, Execra acts like an expert sitting beside you, watching your every step and speaking up the moment it predicts an error, inefficiency, or risk.

Traditional Workflow:        Execra Workflow:
┌──────────────────┐         ┌─────────────────────────────────┐
│  Search → Learn  │         │  Start Task → Execra Guides You │
│  → Practice      │   VS    │  in Real-Time → Execute         │
│  → Fail → Retry  │         │  Correctly → Done               │
└──────────────────┘         └─────────────────────────────────┘

🎯 Core Objective

╔══════════════════════════════════════════════════════════════════╗
║                                                                  ║
║   Build an AI that does NOT wait for prompts.                    ║
║                                                                  ║
║   It OBSERVES → UNDERSTANDS → GUIDES → CORRECTS                  ║
║   continuously, in real time, without user re-explanation.       ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝

🔥 The Problem We Solve

| Pain Point | Current Reality | Execra's Solution |
|---|---|---|
| 🔍 Searching | Stop work → Google → read docs | Guidance appears in-context, zero search |
| 📚 Learning Curve | Spend hours learning before doing | Do directly with guided steps |
| Trial & Error | Make mistakes, debug, retry | Errors predicted before they happen |
| 🤖 Generic AI | Copy-pasted answers without context | Understands exactly what you are doing |
| 📷 Physical Tasks | No AI help for real-world work | Camera-based real-world guidance |
| 🔁 Re-explaining AI | Repeat yourself each session | Remembers full context of current session |

✨ Core Capabilities

👁️ 1. Multimodal Perception

  • 🖥️ Screen capture (code, software UI)
  • 📷 Camera feed (real-world tasks)
  • 🔤 OCR — text recognition
  • 🧩 Object detection & UI understanding
  • ⚡ Continuous action tracking

🧭 2. Context & Intent Understanding

  • 📌 Auto-detect task type (no prompt needed)
  • 🎯 Infer user goal from observation
  • 📋 Track current step in workflow
  • 🔄 Maintain dynamic session context model
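Auto-detecting the task type can start as simply as scoring the text Execra already sees (OCR output, window titles) against a keyword set per task. This is a minimal rule-based sketch; the task labels and keywords in `TASK_RULES` are illustrative assumptions, and a real detector would likely blend such rules with model output.

```python
# Hypothetical keyword rules per task type (all lowercase).
TASK_RULES: dict[str, tuple[str, ...]] = {
    "coding": ("def ", "import ", "traceback", "syntaxerror"),
    "form_filling": ("first name", "date of birth", "submit", "postal code"),
    "cooking": ("preheat", "simmer", "tablespoon", "recipe"),
}


def detect_task_type(observed_text: str) -> tuple[str, float]:
    """Return (task_type, share of keyword hits) for the best-matching task."""
    text = observed_text.lower()
    hits = {task: sum(kw in text for kw in kws) for task, kws in TASK_RULES.items()}
    total = sum(hits.values())
    if total == 0:
        return "unknown", 0.0  # nothing recognizable yet: keep observing
    best = max(hits, key=lambda task: hits[task])
    return best, hits[best] / total


print(detect_task_type("Traceback (most recent call last): in def main()"))
```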

⚙️ 3. Execution Intelligence

  • 📊 Decompose tasks into ordered steps
  • 🔴 Real-time error detection
  • 📡 Adapt instructions to user progress
  • 🔮 Predict consequences before action
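Decomposing a task into ordered steps amounts to building a dependency graph and ordering it topologically. The sketch below uses Python's standard-library `graphlib` (3.9+); `order_steps` is a hypothetical helper, and the step names are borrowed from this README's installation steps purely for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def order_steps(prerequisites: dict[str, set[str]]) -> list[str]:
    """Order steps so that each one comes after all of its prerequisites.

    `prerequisites` maps a step to the set of steps that must be finished first.
    """
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError as exc:
        # graphlib stores the detected cycle as the second element of exc.args.
        raise ValueError(f"contradictory plan, cycle: {exc.args[1]}") from exc


plan = {
    "create virtual environment": set(),
    "install Python dependencies": {"create virtual environment"},
    "set up .env": set(),
    "download model weights": {"install Python dependencies"},
    "run main.py": {"download model weights", "set up .env"},
}
print(order_steps(plan))
```

Steps without mutual dependencies (here, `set up .env`) may appear anywhere before the steps that need them; guidance only has to respect the partial order.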

💻 4. Coding System (Digital)

  • 🪲 Runtime execution tracing
  • 🔍 Logical error identification
  • 💡 Explain errors from actual behavior (not just static analysis)
  • 🏗️ Convert high-level goals → structured dev steps

🏠 5. Offline System (Physical)

  • 🔧 Detect objects & tools via camera
  • 🍳 Guide cooking, repairs, form filling
  • 🚨 Intervene before incorrect actions
  • 📍 Recognize task type from visual input

🛡️ 6. Trust & Authenticity Layer

  • 📊 Confidence score on every instruction
  • 🔍 Reasoning & explanation per suggestion
  • 🔁 Multi-source validation (rules + model + data)
  • ⚠️ Uncertainty flagging
  • 🐢 Safe Mode | ⚡ Expert Mode
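Multi-source validation and uncertainty flagging could be combined roughly as below: a weighted average of per-source confidences, flagged as uncertain when it falls under the active mode's threshold or when the sources disagree. The weights, the per-mode thresholds, and the 0.2 disagreement cutoff are assumptions; only the 80% default echoes the gate in the Subsystem Communication Flow diagram.

```python
from statistics import pstdev

# Assumed weights per evidence source and confidence thresholds per mode.
SOURCE_WEIGHTS = {"llm": 0.40, "rules": 0.35, "trace": 0.25}
MODE_THRESHOLDS = {"safe": 0.90, "default": 0.80, "expert": 0.65}
MAX_DISAGREEMENT = 0.2  # population std-dev of source scores above this => uncertain


def score_instruction(source_scores: dict[str, float], mode: str = "default") -> dict:
    """Combine per-source confidences in [0, 1] into one score plus an uncertainty flag."""
    used = {src: c for src, c in source_scores.items() if src in SOURCE_WEIGHTS}
    if not used:
        return {"confidence": 0.0, "uncertain": True, "sources": []}
    weight_sum = sum(SOURCE_WEIGHTS[src] for src in used)
    confidence = sum(SOURCE_WEIGHTS[src] * c for src, c in used.items()) / weight_sum
    spread = pstdev(list(used.values())) if len(used) > 1 else 0.0
    uncertain = confidence < MODE_THRESHOLDS[mode] or spread > MAX_DISAGREEMENT
    return {"confidence": round(confidence, 2), "uncertain": uncertain, "sources": sorted(used)}


scores = {"llm": 0.90, "rules": 0.85, "trace": 0.85}
print(score_instruction(scores))               # passes the default 80% gate
print(score_instruction(scores, mode="safe"))  # same evidence, flagged in Safe Mode
```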

🏗️ System Architecture

High-Level Architecture Diagram

╔══════════════════════════════════════════════════════════════════════════╗
║                         E X E C R A   SYSTEM                            ║
╠══════════════════════════════════════════════════════════════════════════╣
║                                                                          ║
║   ┌─────────────────────────────────────────────────────────────────┐   ║
║   │                        INPUT LAYER                              │   ║
║   │                                                                 │   ║
║   │   ┌──────────────┐   ┌──────────────┐   ┌──────────────────┐   │   ║
║   │   │ Screen Capture│   │  Camera Feed │   │  User Text Input │   │   ║
║   │   │  (Digital AI) │   │ (Physical AI)│   │  (Active Mode)   │   │   ║
║   │   └──────┬───────┘   └──────┬───────┘   └────────┬─────────┘   │   ║
║   └──────────┼──────────────────┼────────────────────┼─────────────┘   ║
║              │                  │                     │                  ║
║              ▼                  ▼                     ▼                  ║
║   ┌─────────────────────────────────────────────────────────────────┐   ║
║   │                      PROCESSING LAYER                           │   ║
║   │                                                                 │   ║
║   │ ┌───────────────┐  ┌──────────────────┐  ┌──────────────────┐  │   ║
║   │ │  Code Runtime  │  │ Computer Vision   │  │  Context Engine  │  │   ║
║   │ │  Trace Engine  │  │ (OCR + Detection) │  │ (Task Detector)  │  │   ║
║   │ └───────┬───────┘  └────────┬─────────┘  └────────┬─────────┘  │   ║
║   └─────────┼───────────────────┼────────────────────┼─────────────┘   ║
║             │                   │                     │                  ║
║             ▼                   ▼                     ▼                  ║
║   ┌─────────────────────────────────────────────────────────────────┐   ║
║   │                     INTELLIGENCE LAYER                          │   ║
║   │                                                                 │   ║
║   │ ┌──────────────┐  ┌──────────────────┐  ┌──────────────────┐   │   ║
║   │ │     LLM      │  │  Rule-Based       │  │  Prediction       │   │   ║
║   │ │  (Reasoning) │  │  Validator        │  │  & Simulation     │   │   ║
║   │ └──────┬───────┘  └────────┬─────────┘  └────────┬─────────┘   │   ║
║   │        │                   │                      │              │   ║
║   │        └───────────────────┴──────────────────────┘             │   ║
║   │                            │                                    │   ║
║   │              ┌─────────────▼──────────────┐                    │   ║
║   │              │   TRUST & CONFIDENCE SCORER │                    │   ║
║   │              │   (Score + Explanation)      │                    │   ║
║   │              └─────────────────────────────┘                    │   ║
║   └─────────────────────────────────────────────────────────────────┘   ║
║                                │                                         ║
║                                ▼                                         ║
║   ┌─────────────────────────────────────────────────────────────────┐   ║
║   │                        OUTPUT LAYER                             │   ║
║   │                                                                 │   ║
║   │   ┌────────────┐  ┌──────────────┐  ┌────────────────────────┐ │   ║
║   │   │  Real-Time │  │ Error Alerts │  │  Confidence Indicators  │ │   ║
║   │   │ Instruction│  │  & Warnings  │  │  + Reasoning Display   │ │   ║
║   │   └────────────┘  └──────────────┘  └────────────────────────┘ │   ║
║   └─────────────────────────────────────────────────────────────────┘   ║
╚══════════════════════════════════════════════════════════════════════════╝

Subsystem Communication Flow

                          USER ACTION
                              │
                    ┌─────────▼─────────┐
                    │   Perception Bus   │◄──────────────────────────┐
                    │ (Screen + Camera)  │                           │
                    └─────────┬─────────┘                           │
                              │                                      │
              ┌───────────────┼───────────────┐                     │
              │               │               │                     │
    ┌─────────▼────┐  ┌───────▼──────┐  ┌────▼──────────┐          │
    │  Code Engine │  │  CV Engine   │  │ Intent Engine │          │
    │  (Digital)   │  │  (Physical)  │  │ (Context)     │          │
    └─────────┬────┘  └───────┬──────┘  └────┬──────────┘          │
              │               │               │                     │
              └───────────────▼───────────────┘                     │
                              │                                      │
                    ┌─────────▼─────────┐                           │
                    │  Intelligence Core │                           │
                    │  (LLM + Rules +   │                           │
                    │   Prediction)     │                           │
                    └─────────┬─────────┘                           │
                              │                                      │
                    ┌─────────▼─────────┐                           │
                    │   Trust Scorer    │                           │
                    │ Confidence > 80%? │                           │
                    └──┬────────────┬───┘                           │
                       │            │                               │
               Yes ────┘            └──── No                       │
                  │                          │                      │
     ┌────────────▼──────────┐  ┌────────────▼─────────────┐       │
     │  Deliver Instruction  │  │  Flag Uncertainty +       │       │
     │  + Confidence Score   │  │  Request Clarification    │       │
     └────────────┬──────────┘  └────────────┬─────────────┘       │
                  │                           │                     │
                  └─────────────┬─────────────┘                     │
                                │                                   │
                      ┌─────────▼─────────┐                        │
                      │   Action Logger   │────────────────────────┘
                      │  (Undo / Replay)  │   Feedback loop
                      └───────────────────┘

Dual-Domain Architecture (Digital + Physical)

┌─────────────────────────────────────────────────────────────────┐
│                        EXECRA CORE                              │
│                                                                 │
│  ┌─────────────────────────┐   ┌─────────────────────────────┐  │
│  │   DIGITAL DOMAIN (IDE)  │   │   PHYSICAL DOMAIN (Camera)  │  │
│  │                         │   │                             │  │
│  │  📺 Screen Capture      │   │  📷 Live Camera Feed        │  │
│  │  🔤 Code Parser         │   │  🔍 Object Detection        │  │
│  │  ⚙️  Runtime Tracer     │   │  📐 Spatial Analysis        │  │
│  │  🐞 Logic Debugger      │   │  🏷️  OCR (Text in Scene)    │  │
│  │  📈 Execution Flow Map  │   │  🔄 Action Recognition      │  │
│  │                         │   │                             │  │
│  │  Examples:              │   │  Examples:                  │  │
│  │  • Code debugging       │   │  • Hardware repair          │  │
│  │  • Form completion      │   │  • Cooking guidance         │  │
│  │  • Software navigation  │   │  • Physical form filling    │  │
│  │  • API integration      │   │  • Device assembly          │  │
│  └────────────┬────────────┘   └──────────────┬──────────────┘  │
│               │                               │                  │
│               └───────────────┬───────────────┘                  │
│                               │                                  │
│                  ┌────────────▼────────────┐                     │
│                  │   UNIFIED CONTEXT MODEL  │                     │
│                  │   • Current Task State   │                     │
│                  │   • Step Tracker         │                     │
│                  │   • Error History        │                     │
│                  │   • User Profile         │                     │
│                  └─────────────────────────┘                     │
└─────────────────────────────────────────────────────────────────┘

🔄 User Workflow

┌─────────────────────────────────────────────────────────────────┐
│                    EXECRA USER JOURNEY                          │
└─────────────────────────────────────────────────────────────────┘

    ①                         ②                         ③
┌─────────┐             ┌──────────────┐           ┌───────────────┐
│  User   │  ────────►  │   Execra     │  ──────►  │  Task Model   │
│ Starts  │             │  Detects     │           │  Built        │
│  Task   │             │  Context     │           │  Internally   │
└─────────┘             └──────────────┘           └───────────────┘
                                                          │
    ⑧                         ⑦                          ④
┌─────────┐             ┌──────────────┐           ┌─────▼─────────┐
│ Task    │  ◄────────  │  Adapts to   │  ◄──────  │  Step-by-Step │
│Complete │             │  Progress    │           │  Guidance     │
│ ✅      │             │  Dynamically │           │  Begins       │
└─────────┘             └──────────────┘           └───────────────┘
                                │
    ⑨                          ⑤                         ⑥
┌─────────┐             ┌──────────────┐           ┌───────────────┐
│  User   │  ────────►  │  Execution   │  ──────►  │  Errors       │
│ can ask │             │  Monitored   │           │  Detected &   │
│  Text Q │             │  Continuously│           │  Consequences │
└─────────┘             └──────────────┘           │  Simulated    │
                                                   └───────────────┘

Step-by-Step Execution Detail

| Step | What Happens | Who Acts |
|---|---|---|
| 1. Start | User begins any task (opens editor, starts camera, opens form) | User |
| 2. Detection | Execra auto-detects: task type, domain (digital/physical), current state | Execra |
| 3. Modeling | Internal task model built: steps, dependencies, expected sequence | Execra |
| 4. Guidance | Step-by-step instructions displayed in an overlay/panel | Execra |
| 5. Monitoring | Every action tracked against expected behavior in real time | Execra |
| 6. Error Detection | Deviations flagged; consequences simulated before commitment | Execra |
| 7. Adaptation | Instructions updated dynamically based on user progress | Execra |
| 8. Completion | Task completed with minimal trial-and-error | Both |
| 9. Active Mode | At any time, user can type a question; context is auto-remembered | User + Execra |

🧠 Intelligence Layers Explained

Layer 1 — Consequence Simulation Engine

BEFORE the user presses "Run" / "Submit" or performs an action:

┌─────────────────────────────────────────────────────┐
│            CONSEQUENCE SIMULATOR                    │
│                                                     │
│  Current State  ──►  Possible Outcomes              │
│                                                     │
│  ✅ Outcome A: Code compiles, loop exits at n=10    │
│  ⚠️  Outcome B: Off-by-one error causes overflow    │
│  ❌ Outcome C: Infinite loop if condition missing   │
│                                                     │
│  Recommendation: Adjust line 14 condition           │
│  Confidence: 91% │ Source: Runtime Trace + Rules    │
└─────────────────────────────────────────────────────┘
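One simple way to see consequences before committing is a dry run: execute the candidate code in a separate interpreter with a time limit and classify what happened. This swaps in a subprocess dry run for the "Runtime Trace + Rules" sources named in the box, and it is not a security sandbox, since the code still runs with the user's permissions. `simulate` and its outcome labels are hypothetical.

```python
import subprocess
import sys


def simulate(code: str, timeout: float = 5.0) -> tuple[str, str]:
    """Dry-run `code` in a fresh Python process and classify what happened.

    Returns (outcome, detail) with outcome "ok", "error", or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return "timeout", f"still running after {timeout}s: possible infinite loop"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        # The last line of a traceback names the exception, e.g. "IndexError: ...".
        return "error", lines[-1] if lines else f"exit code {proc.returncode}"
    return "ok", proc.stdout.strip()


off_by_one = "items = [1, 2, 3]\nfor i in range(len(items) + 1):\n    print(items[i])"
print(simulate(off_by_one))
print(simulate("while True:\n    pass", timeout=1.0))
```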

Layer 2 — Trust & Confidence Scoring

Every instruction delivered by Execra includes:

┌──────────────────────────────────────────────────────┐
│  📋 INSTRUCTION: "Add null check before line 42"     │
│                                                      │
│  🔵 Confidence:  ████████░░  87%                     │
│  📚 Source:      LLM + Rule Engine + Execution Trace │
│  💬 Reasoning:   "Variable `config` returns None     │
│                   in 3 edge cases detected."         │
│  🔘 Mode:        [Safe Mode] / Expert Mode           │
└──────────────────────────────────────────────────────┘
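The instruction card is mostly formatting. This sketch derives the 10-cell bar from the score; rounding down is an assumption that happens to reproduce the card's eight filled cells for 87%. `render_card` is a hypothetical helper and omits the emoji labels.

```python
def render_card(instruction: str, confidence: float, sources: list[str],
                reasoning: str, safe_mode: bool = True, cells: int = 10) -> str:
    """Render a plain-text instruction card with a confidence bar."""
    confidence = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    filled = int(confidence * cells)              # round down: 0.87 -> 8 of 10 cells
    bar = "█" * filled + "░" * (cells - filled)
    mode = "[Safe Mode] / Expert Mode" if safe_mode else "Safe Mode / [Expert Mode]"
    return "\n".join([
        f"INSTRUCTION: {instruction}",
        f"Confidence:  {bar}  {confidence:.0%}",
        f"Source:      {' + '.join(sources)}",
        f"Reasoning:   {reasoning}",
        f"Mode:        {mode}",
    ])


card = render_card(
    "Add null check before line 42", 0.87,
    ["LLM", "Rule Engine", "Execution Trace"],
    "Variable `config` returns None in 3 edge cases detected.",
)
print(card)
```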

Layer 3 — Hybrid Interaction System

                    ┌─────────────────────────┐
                    │  HYBRID INTERACTION      │
                    └────────────┬────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
  ┌────────▼───────┐  ┌──────────▼────────┐  ┌────────▼──────┐
  │  PASSIVE MODE  │  │   ACTIVE MODE     │  │  MIXED MODE   │
  │                │  │                  │  │               │
  │ Auto-observe   │  │ User asks text   │  │ Both modes    │
  │ Auto-guide     │  │ questions        │  │ simultaneously│
  │ No prompts     │  │ Context auto-    │  │               │
  │ needed         │  │ remembered       │  │               │
  └────────────────┘  └──────────────────┘  └───────────────┘

💻 Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| 👁️ Screen Capture | PyAutoGUI, mss, Pillow | Continuous screen recording & analysis |
| 📷 Camera / CV | OpenCV, YOLOv8, Tesseract OCR | Real-world object detection & text reading |
| 🧠 LLM Core | OpenAI GPT-4o / Gemini 1.5 Pro / Llama 3 | Reasoning, explanation, task decomposition |
| ⚙️ Code Engine | Python AST, sys.settrace, PyDebug | Runtime tracing & execution flow analysis |
| 🗂️ Context Engine | LangChain, custom session manager | Maintaining dynamic session context model |
| 🔁 Rule Validator | Drools / Python rule engine | Deterministic validation alongside LLM |
| 📊 Trust Scorer | Custom scoring pipeline | Confidence scoring per instruction |
| 🖥️ Frontend / Overlay | Electron.js / Tauri / Web Overlay | Real-time guidance UI overlaid on screen |
| 🔔 Notification | Plyer / OS Notification APIs | Proactive alerts & guidance delivery |
| 💾 Storage | SQLite / Redis (hot) + S3 (cold) | Action history, undo stack, session logs |
| 🐳 Deployment | Docker, Kubernetes | Scalable microservice deployment |
| 🔗 API Layer | FastAPI | REST + WebSocket endpoints for real-time I/O |

🚀 Getting Started

Prerequisites

# Python 3.10+
python --version

# Node.js 18+ (for overlay frontend)
node --version

# FFmpeg (for camera stream processing)
ffmpeg -version

Installation

# 1. Clone the repository
git clone https://github.com/sahoo-tech/execra.git
cd execra

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate         # Linux/macOS
# venv\Scripts\activate          # Windows: run this instead of the line above

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Install frontend dependencies
cd frontend
npm install
cd ..

# 5. Set up environment variables
cp .env.example .env
# Edit .env and add your API keys (OpenAI / Gemini)

# 6. Download YOLO model weights
python scripts/download_models.py

# 7. Run Execra
python main.py

Quick Start (Docker)

# Build and run with Docker Compose
docker-compose up --build

# Execra will be running at:
# API:      http://localhost:8000
# Frontend: http://localhost:3000

📂 Project Structure

execra/
│
├── 📁 core/
│   ├── perception/
│   │   ├── screen_capture.py        # Screen capture engine
│   │   ├── camera_feed.py           # Camera input handler
│   │   └── ocr_engine.py            # Text recognition (Tesseract)
│   │
│   ├── intelligence/
│   │   ├── llm_client.py            # LLM abstraction layer
│   │   ├── context_engine.py        # Session context manager
│   │   ├── consequence_sim.py       # Outcome prediction engine
│   │   └── trust_scorer.py          # Confidence scoring
│   │
│   ├── digital/
│   │   ├── code_tracer.py           # Runtime execution tracer
│   │   ├── error_detector.py        # Logical error identification
│   │   └── task_decomposer.py       # Goal → Step converter
│   │
│   ├── physical/
│   │   ├── object_detector.py       # YOLO-based detection
│   │   ├── task_recognizer.py       # Physical task classifier
│   │   └── action_validator.py      # Real-world action checker
│   │
│   └── hybrid/
│       ├── mode_manager.py          # Passive/Active mode switcher
│       ├── action_logger.py         # Undo/Recovery stack
│       └── guidance_dispatcher.py   # Instruction delivery
│
├── 📁 frontend/
│   ├── overlay/                     # Desktop overlay UI
│   ├── panel/                       # Main guidance panel
│   └── components/                  # Reusable UI components
│
├── 📁 api/
│   ├── main.py                      # FastAPI application
│   ├── routes/                      # API endpoints
│   └── websockets/                  # Real-time WebSocket handlers
│
├── 📁 models/
│   ├── yolo/                        # Object detection weights
│   └── custom/                      # Domain-specific classifiers
│
├── 📁 tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
│
├── 📁 docs/
│   ├── architecture.md
│   ├── api_reference.md
│   └── contributing_guide.md
│
├── 📁 scripts/
│   └── download_models.py
│
├── docker-compose.yml
├── requirements.txt
├── .env.example
└── main.py

🤝 Contributing (GSSoC 2026)

🎉 Welcome, GirlScript Summer of Code 2026 Contributors! 🎉


We're thrilled to have you here! Execra is an open project built for and by the community. Whether you're a beginner or an expert, there's a place for you.


🛣️ Contribution Roadmap

                    YOUR CONTRIBUTION JOURNEY

    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │  FIND   │───►│  FORK   │───►│  CODE   │───►│  PR     │
    │ an Issue│    │  Repo   │    │  & Test │    │Submitted│
    └─────────┘    └─────────┘    └─────────┘    └────┬────┘
                                                       │
                   ┌───────────────────────────────────┘
                   │
    ┌──────────────▼──┐    ┌─────────────┐    ┌──────────────┐
    │  Review Process  │───►│   Approved  │───►│  MERGED! 🎉  │
    │  (Maintainer)    │    │             │    │  Points Added│
    └──────────────────┘    └─────────────┘    └──────────────┘

📝 Step-by-Step Contribution Guide

# Step 1: Fork this repository on GitHub

# Step 2: Clone your fork
git clone https://github.com/YOUR_USERNAME/execra.git
cd execra

# Step 3: Create a new branch (NEVER work on main directly)
git checkout -b feature/your-feature-name
# OR for bug fixes:
git checkout -b fix/issue-number-short-description

# Step 4: Make your changes and test them
python -m pytest tests/

# Step 5: Commit with a clear message
git add .
git commit -m "feat: add OCR support for multi-language text detection (#42)"

# Step 6: Push to your fork
git push origin feature/your-feature-name

# Step 7: Open a Pull Request on GitHub
# Use the PR template provided in the repository

✅ Commit Message Convention

We follow Conventional Commits:

| Prefix | Use When |
|---|---|
| feat: | Adding a new feature |
| fix: | Fixing a bug |
| docs: | Documentation only changes |
| style: | Code formatting (no logic change) |
| refactor: | Code restructuring (no feature/bug) |
| test: | Adding or updating tests |
| chore: | Build process, tooling changes |

Examples:

feat: implement real-time screen delta detection
fix: resolve memory leak in camera feed handler (#88)
docs: add API reference for context engine
test: add unit tests for trust scorer module
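The convention can be checked mechanically. A minimal sketch limited to the prefixes in the table; it also accepts an optional `(scope)` and `!`, which the Conventional Commits specification allows. `is_valid_commit` is a hypothetical helper, not an existing script in this repository.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
# <type>, optional "(scope)", optional "!" (breaking change), then ": " and a description.
COMMIT_RE = re.compile(r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w.-]+\))?!?: \S.*$")


def is_valid_commit(message: str) -> bool:
    """Check the header (first line) of a commit message against the convention."""
    header = message.splitlines()[0] if message.strip() else ""
    return COMMIT_RE.match(header) is not None


for msg in ["feat: implement real-time screen delta detection", "Added stuff", "feature: x"]:
    print(is_valid_commit(msg), msg)
```

Wired into `.git/hooks/commit-msg`, a script would read the file whose path Git passes as the hook's first argument and exit non-zero when the check fails, which makes Git abort the commit.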

🔍 Finding Good First Issues

Look for these labels on the Issues page:

| Label | Difficulty | Good For |
|---|---|---|
| good first issue | ⭐ Beginner | First-time contributors |
| easy | ⭐⭐ Easy | Those with some experience |
| medium | ⭐⭐⭐ Medium | Intermediate contributors |
| hard | ⭐⭐⭐⭐ Hard | Advanced contributors |
| documentation | Any | Writers, tech writers |
| help wanted | Varies | Any contributor |

🏷️ Issue Labels & Points

Points are awarded by GSSoC 2026 based on issue difficulty and contribution quality.

| Label | Points | Typical Tasks |
|---|---|---|
| good first issue | 10 pts | Fixing typos, adding docstrings, small UI tweaks, writing examples |
| easy | 25 pts | Adding unit tests, small bug fixes, minor feature additions |
| medium | 45 pts | Feature modules, integration tasks, significant bug fixes |
| hard | 60 pts | Core architecture, new domain engines, performance optimization |

🚫 What NOT to Do

❌  Do NOT submit empty or low-quality PRs just to collect points
❌  Do NOT spam issues with requests to be assigned before reading them
❌  Do NOT copy code from others without attribution
❌  Do NOT make changes outside the scope of the assigned issue
❌  Do NOT force-push to main or shared branches
✅  DO read the full issue before asking questions
✅  DO test your changes before submitting
✅  DO follow the code style guide (see CONTRIBUTING.md)
✅  DO be respectful and patient with maintainers

💬 Community & Support

| Channel | Link |
|---|---|
| 💬 Discussion | GitHub Discussions |
| 🐛 Bug Reports | Open an Issue |
| 💡 Feature Requests | Request Feature |
| 📧 Maintainer | ss9830872697@gmail.com |

📜 Code of Conduct

This project follows the Contributor Covenant Code of Conduct v2.1.

In summary:

  • 🤝 Be welcoming and inclusive
  • 🗣️ Be respectful in all communications
  • 🚫 No harassment, discrimination, or harmful behavior
  • 🌱 Help beginners; everyone starts somewhere

Violations can be reported to ss9830872697@gmail.com.


📄 License

MIT License

Copyright (c) 2026 Execra Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

See LICENSE for the full text.


🙌 Acknowledgements

Powered By

  • GirlScript Summer of Code
  • OpenAI
  • Google Gemini
  • YOLOv8

Special thanks to:

  • 🌸 GirlScript Foundation — for organizing GSSoC and empowering open source contributors worldwide
  • All first-time contributors who made this project possible
  • The open source community for the foundational tools this project builds upon

📬 Contact

| Maintainer | GitHub | Email |
|---|---|---|
| Sayantan Sahoo | @sahoo-tech | ss9830872697@gmail.com |

Found this project interesting? Give it a ⭐ — it helps us grow!



Built with ❤️ for GirlScript Summer of Code 2026

Execra — Execute without boundaries.
