WebGenie

The Open-Source AI Web Automation Extension: run sophisticated multi-agent systems directly in your browser. Automate complex web tasks, perform page actions, and streamline workflows.


Vision

WebGenie gives developers and automation enthusiasts a free, open-source alternative to AI web automation tools like OpenAI Operator. Its multi-agent system runs inside the browser, so you choose the LLM provider and avoid vendor lock-in; paired with a local model server such as Ollama, it needs no cloud service. It is well suited to building custom workflows, testing automation logic, and experimenting with autonomous agents in a sandboxed browser environment.

Key Features

Multi-Agent Intelligence

Navigator Agent — Intelligent DOM interaction and web navigation that understands page structure
Planner Agent — High-level task planning and strategic reasoning to break down complex workflows
Validator Agent — Autonomous verification of task completion and result accuracy
Coordinated execution through Chrome Messaging APIs for seamless inter-agent communication

LLM Provider Flexibility

OpenAI — GPT-4, GPT-4 Turbo, GPT-3.5 Turbo for cutting-edge reasoning
Anthropic — Claude 3 (Opus, Sonnet, Haiku) for diverse capability tiers
Google Gemini — Gemini Pro and Gemini 1.5 for multimodal understanding
AWS Bedrock — Managed Claude/Llama/Titan family models on AWS
Llama API — Hosted Llama models via api.llama.com
Ollama — Local LLM support for self-hosted and privacy-conscious deployments
Azure OpenAI — Enterprise LLM deployments for organizational scale

Security & Privacy

Local Processing: Agent orchestration runs in the browser extension; prompts go only to the LLM provider you configure, and stay on your machine when that provider is a local Ollama server
No API Fallback — No automatic cloud uploads or hidden data transmission
Content Sanitization — Built-in XSS and injection prevention for safe DOM manipulation
URL Validation — Protected navigation and safe browsing to prevent malicious redirects

Developer Experience

Hot Reload Development — Vite-powered rapid iteration for faster development cycles
Comprehensive Logging — Debug agent reasoning step-by-step to understand decision-making
Modular Architecture — Clean, extensible codebase designed for easy customization
Type-Safe — Strict TypeScript throughout for reliability and maintainability

User-Friendly Interface

Chat-Based Controls — Talk to the extension naturally, no complex syntax needed
Real-Time Feedback — Watch live agent execution status and progress updates
Visual DOM Analysis — Interactively inspect and explore page elements
Favorites & History — Easily reuse, refine, and iterate on previous automation tasks

Architecture Overview

WebGenie is built on a modular, layered architecture that separates concerns and enables clear communication between components. Here's how everything works together:

graph TB
    subgraph Browser["Browser Environment"]
        BS["Side Panel UI<br/>React + TypeScript"]
        OS["Options Page<br/>Settings & Configuration"]
        CS["Content Script<br/>Page Injection & Monitoring"]
    end

    subgraph Extension["Extension Core"]
        BG["Background Service Worker<br/>Manifest V3"]
        EX["Executor<br/>Task Orchestrator & Coordinator"]
    end

    subgraph Agents["Multi-Agent System"]
        NAV["Navigator Agent<br/>DOM Interaction & Navigation"]
        PLN["Planner Agent<br/>Strategy & Task Planning"]
        VAL["Validator Agent<br/>Task Verification & Completion"]
    end

    subgraph BrowserLayer["Browser Abstraction"]
        DOM["DOM Service<br/>Accessibility Trees & Analysis"]
        PAGE["Page Controller<br/>User Actions & Navigation"]
        CTX["Context Manager<br/>State & History Tracking"]
    end

    subgraph Services["Services Layer"]
        SEC["Security Module<br/>Sanitization & Threat Detection"]
        VOICE["Voice Processing<br/>Speech-to-Text Conversion"]
        ANALYTICS["Analytics Engine<br/>Performance Metrics & Tracking"]
    end

    subgraph LLM["Large Language Models"]
        OPENAI["OpenAI<br/>GPT-4 Family"]
        CLAUDE["Anthropic Claude<br/>Claude 3 Series"]
        GEMINI["Google Gemini<br/>Multimodal Intelligence"]
        BEDROCK["AWS Bedrock<br/>Claude/Llama/Titan Models"]
        LLAMA["Llama API<br/>Hosted Llama Models"]
        OLLAMA["Ollama Local<br/>Self-Hosted Models"]
        AZURE["Azure OpenAI<br/>Enterprise Deployments"]
        OPENROUTER["OpenRouter<br/>Unified Model Gateway"]
    end

    subgraph Storage["Data Persistence"]
        CHROME["Chrome Storage API<br/>Config & User State"]
    end

    BS -->|Message Passing| BG
    OS -->|Configuration| CHROME
    CS -->|DOM Observation| BG
    
    BG --> EX
    EX --> NAV
    EX --> PLN
    EX --> VAL
    
    NAV --> DOM
    NAV --> PAGE
    PLN --> CTX
    VAL --> PAGE
    
    DOM --> SEC
    PAGE --> SEC
    
    EX -->|LLM Queries| LLM
    OPENAI -.-> LLM
    CLAUDE -.-> LLM
    GEMINI -.-> LLM
    BEDROCK -.-> LLM
    LLAMA -.-> LLM
    OLLAMA -.-> LLM
    AZURE -.-> LLM
    OPENROUTER -.-> LLM
    
    SEC --> VOICE
    EX --> ANALYTICS
    
    CHROME -.-> Extension
    
    style BG fill:#667eea,stroke:#333,stroke-width:2px,color:#fff
    style EX fill:#764ba2,stroke:#333,stroke-width:2px,color:#fff
    style NAV fill:#f093fb,stroke:#333,stroke-width:2px,color:#fff
    style PLN fill:#f093fb,stroke:#333,stroke-width:2px,color:#fff
    style VAL fill:#f093fb,stroke:#333,stroke-width:2px,color:#fff
    style DOM fill:#4facfe,stroke:#333,stroke-width:2px,color:#fff
    style PAGE fill:#4facfe,stroke:#333,stroke-width:2px,color:#fff
    style SEC fill:#fa709a,stroke:#333,stroke-width:2px,color:#fff

If you want a very detailed walkthrough of the DOM engine, read docs/dom-deep-dive.md.

How It Works Together

User Interface Layer: The side panel (React + TypeScript) allows users to interact with the system through a chat interface. The options page lets users configure LLM providers and preferences, which are stored locally via Chrome Storage API.
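
As a rough illustration of the local persistence described above, the sketch below saves and loads provider settings through an object shaped like `chrome.storage.local` (promise-based `get` and `set`). The `ProviderSettings` type, the `providerSettings` key, and both helper functions are hypothetical and not taken from the WebGenie codebase. An in-memory stand-in replaces the real storage area so the snippet runs outside an extension.

```typescript
// Minimal shape of a promise-based storage area such as chrome.storage.local.
interface StorageAreaLike {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// Hypothetical settings shape; WebGenie's real schema may differ.
interface ProviderSettings {
  provider: string;
  baseUrl?: string;
  models: string[];
}

const SETTINGS_KEY = "providerSettings"; // assumed key name

async function saveProviderSettings(
  storage: StorageAreaLike,
  settings: ProviderSettings,
): Promise<void> {
  await storage.set({ [SETTINGS_KEY]: settings });
}

async function loadProviderSettings(
  storage: StorageAreaLike,
): Promise<ProviderSettings | undefined> {
  const stored = await storage.get(SETTINGS_KEY);
  return stored[SETTINGS_KEY] as ProviderSettings | undefined;
}

// In-memory stand-in so the sketch runs without the chrome global.
function createMemoryStorage(): StorageAreaLike {
  const data = new Map<string, unknown>();
  return {
    async get(key) {
      return data.has(key) ? { [key]: data.get(key) } : {};
    },
    async set(items) {
      for (const [key, value] of Object.entries(items)) data.set(key, value);
    },
  };
}
```

Inside the extension, the real storage area would be passed where the in-memory object is used here. Whether WebGenie uses the `local` or `sync` area is not stated in this README.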

Request Flow: When a user submits a task:

The Side Panel sends a message to the Background Service Worker
The Executor coordinates the request across the multi-agent system
The Planner breaks down the task into actionable steps
The Navigator executes steps by interacting with the DOM
The Validator checks if the task was completed successfully
Results are sent back to the UI for display
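
The steps above can be sketched as a small coordination loop. This is an illustration under assumptions, not the actual Executor: the interfaces `PlannerAgent`, `NavigatorAgent`, and `ValidatorAgent`, the `runTask` function, and the stub agents are all hypothetical, and the agents are synchronous here, whereas real agents would await LLM calls.

```typescript
// Hypothetical agent interfaces; the real WebGenie types may differ.
interface Step { description: string }
interface StepResult { step: Step; ok: boolean; detail: string }

interface PlannerAgent { plan(task: string): Step[] }
interface NavigatorAgent { execute(step: Step): StepResult }
interface ValidatorAgent { validate(task: string, results: StepResult[]): boolean }

interface TaskOutcome { task: string; results: StepResult[]; completed: boolean }

// Executor-style coordination: plan, execute each step, then validate.
function runTask(
  task: string,
  planner: PlannerAgent,
  navigator: NavigatorAgent,
  validator: ValidatorAgent,
): TaskOutcome {
  const steps = planner.plan(task);
  const results: StepResult[] = [];
  for (const step of steps) {
    const result = navigator.execute(step);
    results.push(result);
    if (!result.ok) break; // stop at the first failed step
  }
  const allStepsSucceeded =
    results.length === steps.length && results.every(r => r.ok);
  const completed = allStepsSucceeded && validator.validate(task, results);
  return { task, results, completed };
}

// Stub agents for illustration only.
const stubPlanner: PlannerAgent = {
  plan: task => task.split(" then ").map(d => ({ description: d.trim() })),
};
const stubNavigator: NavigatorAgent = {
  execute: step => ({
    step,
    ok: step.description.length > 0,
    detail: `did: ${step.description}`,
  }),
};
const stubValidator: ValidatorAgent = {
  validate: (_task, results) => results.length > 0,
};

const outcome = runTask(
  "open example.com then read the title",
  stubPlanner,
  stubNavigator,
  stubValidator,
);
console.log(outcome.completed, outcome.results.length);
```

Stopping at the first failed step is a design choice of this sketch; a real orchestrator might instead ask the Planner to replan.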

Agent Coordination: Agents communicate through the Executor using a message-passing pattern. The Navigator never directly executes actions—it requests the Page Controller to perform user interactions safely.
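
A minimal sketch of that delegation, assuming a hypothetical `ActionRequest` message type and simplified `PageController` and `NavigatorAgent` classes (none of these names or shapes come from the WebGenie source). The Navigator only builds requests; the controller is the one place where actions are carried out, which here means recording them.

```typescript
// Hypothetical action requests a Navigator might emit.
type ActionRequest =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "navigate"; url: string };

function describeAction(request: ActionRequest): string {
  switch (request.kind) {
    case "click":
      return `click ${request.selector}`;
    case "type":
      return `type "${request.text}" into ${request.selector}`;
    case "navigate":
      return `navigate to ${request.url}`;
  }
}

// Stand-in for the Page Controller: the only component that performs actions.
class PageController {
  readonly performed: string[] = [];

  perform(request: ActionRequest): void {
    // A real controller would call browser APIs here; this one only records.
    this.performed.push(describeAction(request));
  }
}

// The Navigator decides what to do and delegates the doing.
class NavigatorAgent {
  private readonly controller: PageController;

  constructor(controller: PageController) {
    this.controller = controller;
  }

  search(query: string): void {
    this.controller.perform({ kind: "type", selector: "#search", text: query });
    this.controller.perform({ kind: "click", selector: "#submit" });
  }
}

const pageController = new PageController();
new NavigatorAgent(pageController).search("AI automation tools");
console.log(pageController.performed);
```

Routing every action through one controller gives a single place to apply checks and makes the Navigator testable with a recording stand-in like this one.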

Browser Abstraction: The browser layer isolates Chrome API interactions, making it easier to test agents and maintain code. DOM operations go through the Security module, which sanitizes content and prevents malicious injections.
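
One common building block for this kind of sanitization is escaping HTML special characters before untrusted text is placed into markup. The helper below is a generic sketch of that technique; `escapeHtml` is a hypothetical name, and the real Security module may work differently or do considerably more.

```typescript
// Characters that can change how text is parsed when embedded in HTML.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each special character with its entity so the text renders literally.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, ch => HTML_ESCAPES[ch] ?? ch);
}

console.log(escapeHtml(`<img src=x onerror="alert('hi')">`));
// prints: &lt;img src=x onerror=&quot;alert(&#39;hi&#39;)&quot;&gt;
```

Escaping covers text inserted into HTML content and quoted attribute values; it is not enough on its own for URLs, inline scripts, or style contexts.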

LLM Integration: All agents use the configured LLM provider for reasoning. The system supports OpenAI, Anthropic, Gemini, AWS Bedrock, Llama API, Ollama, Azure OpenAI, OpenRouter, and compatible custom OpenAI endpoints.
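
To make the per-agent model selection concrete, here is a small sketch that maps each agent to a provider and model. The `ProviderId` values, the `AgentModels` type, and `modelFor` are assumptions for illustration; only the provider list and the example model names come from this README.

```typescript
// Provider identifiers are assumed spellings of the providers listed above.
type ProviderId =
  | "openai"
  | "anthropic"
  | "gemini"
  | "bedrock"
  | "llama"
  | "ollama"
  | "azure-openai"
  | "openrouter"
  | "custom-openai";

type AgentName = "planner" | "navigator" | "validator";

interface ModelAssignment {
  provider: ProviderId;
  model: string;
}

// Not every agent has to be configured, hence Partial.
type AgentModels = Partial<Record<AgentName, ModelAssignment>>;

// Look up the model for an agent and fail loudly when none is assigned.
function modelFor(agent: AgentName, models: AgentModels): ModelAssignment {
  const assignment = models[agent];
  if (!assignment) throw new Error(`No model assigned to ${agent}`);
  return assignment;
}

const exampleModels: AgentModels = {
  planner: { provider: "bedrock", model: "us.anthropic.claude-sonnet-4-20250514-v1:0" },
  navigator: { provider: "ollama", model: "qwen3:14b" },
};

console.log(modelFor("navigator", exampleModels).model);
```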

Provider Setup (Bedrock, Llama, Ollama)

AWS Bedrock

Open Options → Model Settings → add AWS Bedrock
Fill in:
- Access Key = AWS access key ID
- AWS Secret Key = AWS secret access key
- AWS Region (for example us-east-1)
Keep model IDs in the Bedrock format (for example us.anthropic.claude-sonnet-4-20250514-v1:0)
Save and assign Bedrock models to Planner/Navigator

Bedrock credentials are passed as SigV4 AWS credentials and used directly by the Bedrock runtime client.
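
Before saving, the three fields can be sanity-checked locally. The sketch below assumes a hypothetical `BedrockSettings` shape mirroring the option fields above; `checkBedrockSettings` is not part of WebGenie, and the region test is only a loose format check that says nothing about whether Bedrock is available in that region.

```typescript
// Hypothetical shape mirroring the three option fields listed above.
interface BedrockSettings {
  accessKeyId: string;
  secretAccessKey: string;
  region: string;
}

// Returns a list of problems; an empty list means the settings look usable.
function checkBedrockSettings(settings: BedrockSettings): string[] {
  const problems: string[] = [];
  if (settings.accessKeyId.trim() === "") problems.push("Access Key is empty");
  if (settings.secretAccessKey.trim() === "") problems.push("AWS Secret Key is empty");
  // Loose format check only, for example "us-east-1".
  if (!/^[a-z]{2}(-[a-z]+)+-\d+$/.test(settings.region)) {
    problems.push(`Region "${settings.region}" does not look like an AWS region`);
  }
  return problems;
}

// Placeholder values, not real credentials.
console.log(
  checkBedrockSettings({
    accessKeyId: "EXAMPLE_ACCESS_KEY_ID",
    secretAccessKey: "EXAMPLE_SECRET",
    region: "us-east-1",
  }),
);
```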

Llama API (Hosted)

Add Llama provider
Set Base Endpoint to https://api.llama.com/v1 (or your compatible endpoint)
Add your Llama API key
Save and select a Llama model for agents

Ollama (Local Server)

Start Ollama locally (default endpoint: http://localhost:11434)
Add Ollama provider
Set Base Endpoint to your Ollama server URL
Add installed model names exactly as seen in Ollama (examples: qwen3:14b, mistral-small:24b)
Save and select models for Planner/Navigator

If Ollama runs on another machine, expose it on your network and use that full http(s)://host:port URL.
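
Because model names must match exactly, it can help to compare the configured names against what the server reports. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models. The function names and the `FetchLike` type are hypothetical; the fetch function is passed in so the snippet can be exercised without a running server.

```typescript
// Narrow view of fetch so the sketch can be exercised with a fake.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Ask the Ollama server which models are installed.
async function listOllamaModels(baseUrl: string, fetchFn: FetchLike): Promise<string[]> {
  const endpoint = `${baseUrl.replace(/\/+$/, "")}/api/tags`;
  const response = await fetchFn(endpoint);
  if (!response.ok) throw new Error(`Ollama responded with HTTP ${response.status}`);
  const body = (await response.json()) as { models?: { name: string }[] };
  return (body.models ?? []).map(m => m.name);
}

// Report configured model names the server does not have installed.
async function findMissingModels(
  baseUrl: string,
  configured: string[],
  fetchFn: FetchLike,
): Promise<string[]> {
  const installed = new Set(await listOllamaModels(baseUrl, fetchFn));
  return configured.filter(name => !installed.has(name));
}
```

In a browser or extension context the global `fetch` could be passed as `fetchFn`, for example `findMissingModels("http://localhost:11434", ["qwen3:14b"], fetch)`.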

Capabilities at a Glance

Feature	Description
Task Automation	Execute complex multi-step workflows with natural language commands
Data Extraction	Scrape and structure web data intelligently using AI reasoning
Form Filling	Automated form submission with intelligent field understanding
Navigation	Smart web browsing with context awareness and state tracking
Natural Language	Interact with agents using plain English—no special syntax required
Voice Input	Optional speech-to-text for hands-free automation control
Local Processing	Agent orchestration runs in-browser, no backend infrastructure needed
Privacy-First	Page data goes only to the LLM provider you configure; with a local Ollama model nothing leaves your machine

Quick Start (Build from Source)

Prerequisites

You'll need the following tools installed:

Node.js — Check your .nvmrc file for the specific required version
pnpm — Fast, disk-efficient package manager (required for this project)

Installation & Setup

Get up and running in just a few commands:

# Clone the repository
git clone https://github.com/derpx06/webgenie.git
cd webgenie

# Install dependencies
pnpm install

# Start development with hot reload
pnpm -F chrome-extension dev

# Or create a production build (use a separate terminal if the dev server is still running)
pnpm build

Loading in Chrome

Ready to test your extension?

Open chrome://extensions/ in your browser
Enable Developer mode using the toggle in the top-right corner
Click Load unpacked
Select the dist/ directory

Your extension is now loaded and ready to use!

Usage Examples

Simple Task Automation

"Go to example.com and search for 'AI automation tools', then tell me the top 3 results"

Form Filling

"Navigate to the contact form at example.com and fill it with:
- Name: John Doe
- Email: john@example.com
- Message: Hello, I'm interested in your services"

Data Extraction

"Visit the pricing page at example.com and extract the pricing table into a structured format"

Project Structure

WebGenie is organized as a monorepo with clear separation of concerns:

WebGenie/
├── chrome-extension/              # Core extension & background worker
│   ├── src/background/
│   │   ├── agent/                 # Multi-agent system (Navigator, Planner, Validator)
│   │   ├── browser/               # Chrome API abstraction layer
│   │   ├── services/              # Security, analytics, voice processing
│   │   └── task/                  # Task management & execution
│   └── public/                    # Static assets & permissions
│
├── pages/                          # User interface components
│   ├── side-panel/                # Main chat interface for user interaction
│   ├── options/                   # Settings & configuration page
│   └── content/                   # Content script for page injection
│
├── packages/                       # Shared utilities & libraries
│   ├── shared/                    # Common types and utility functions
│   ├── storage/                   # Chrome storage abstraction layer
│   ├── ui/                        # Reusable React components
│   ├── i18n/                      # Internationalization support
│   └── schema-utils/              # Zod validation schemas
│
└── docs/                          # Documentation & guides
    ├── MODULARITY_GUIDE.md        # Architecture & module organization
    ├── BEST_PRACTICES.md          # Development standards & guidelines
    └── README.md files            # Per-module documentation

For a deep dive into architecture and module organization, see MODULARITY_GUIDE.md.

Development

Available Commands

Here are the most common commands you'll use during development:

# Type checking - Catch TypeScript errors early
pnpm type-check

# Linting & formatting - Maintain code quality
pnpm lint
pnpm prettier

# Testing - Run automated tests
pnpm -F chrome-extension test              # Run all tests
pnpm -F chrome-extension test -- -t "Sanitizer"  # Run specific test

# Building and development
pnpm build                                 # Production build
pnpm zip                                   # Create distribution zip
pnpm dev                                   # Development with hot reload

# Cleaning
pnpm clean                                 # Clean all build artifacts
pnpm clean:bundle                          # Clean build outputs only

# Versioning
pnpm update-version                        # Update version across packages

Code Quality Standards

We maintain high code quality standards across the project:

TypeScript — Strict mode enabled throughout for type safety
ESLint + Prettier — Automated code formatting and linting
Vitest — Fast unit testing framework
Comprehensive Type Definitions — Full TypeScript coverage

For detailed development guidelines and best practices, see BEST_PRACTICES.md.

Security & Privacy

WebGenie puts privacy and security at the core of its design:

Agent orchestration runs entirely in-browser; prompts are sent only to the LLM provider you configure, and a local Ollama model removes cloud dependencies altogether
No automatic telemetry or data collection (unless explicitly enabled by users)
Built-in content sanitization prevents XSS, injection attacks, and malicious content
URL validation prevents navigation to suspicious or blocked sites
User configuration stored locally in Chrome storage, never transmitted
Minimal required permissions for core functionality only
Full transparency in what data is processed and where
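
As an illustration of the URL validation item above, a navigation guard might parse the target, allow only web schemes, and reject blocked hosts. This is a sketch under assumptions: `isSafeNavigationUrl` and the blocklist parameter are hypothetical, and WebGenie's actual checks may be stricter or differ entirely.

```typescript
// Hypothetical helper; blockedHosts entries are expected in lowercase.
function isSafeNavigationUrl(rawUrl: string, blockedHosts: readonly string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not an absolute, well-formed URL
  }
  // Only allow ordinary web schemes; this rejects javascript:, data:, file:, and others.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  const host = parsed.hostname.toLowerCase();
  // Block a listed host and any of its subdomains.
  return !blockedHosts.some(blocked => host === blocked || host.endsWith(`.${blocked}`));
}

const blockedHostsExample = ["evil.test"];
console.log(isSafeNavigationUrl("https://example.com/pricing", blockedHostsExample)); // true
console.log(isSafeNavigationUrl("javascript:alert(1)", blockedHostsExample)); // false
console.log(isSafeNavigationUrl("https://login.evil.test/", blockedHostsExample)); // false
```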

For detailed security information and threat modeling, see SECURITY.md.

Documentation

Comprehensive documentation is available:

MODULARITY_GUIDE.md — Complete architecture and module organization guide
BEST_PRACTICES.md — Code quality standards and development best practices
CONTRIBUTING.md — Contribution guidelines for developers and maintainers
CHANGELOG.md — Version history and release notes

Contributing

We welcome contributions from the community! Here's how you can help:

Fork the repository on GitHub
Create a feature branch for your work (git checkout -b feature/amazing-feature)
Commit your changes with clear, descriptive messages (git commit -m 'Add amazing feature')
Push to your branch (git push origin feature/amazing-feature)
Open a Pull Request with a detailed description of your changes

Please read CONTRIBUTING.md for our code standards, testing requirements, and development workflow.

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for full details.

Disclaimer

This repository does not endorse or support blockchain, cryptocurrency, NFT projects, or similar derivative works. Any such projects are unaffiliated with the maintainers of this codebase.

Acknowledgments

WebGenie stands on the shoulders of exceptional open-source projects:

Chrome Extension Manifest V3 — Modern extension development
React 18 — Powerful UI framework
TypeScript — Type-safe JavaScript
Vite — Lightning-fast build tool
LangChain.js — LLM integration framework
Turbo — Monorepo build orchestration
Tailwind CSS — Utility-first styling

Support & Community

Have questions or want to help?

Report Bugs or Request Features — Open an issue on GitHub Issues
Join Community Discussions — Participate in our GitHub discussions
Read the Documentation — Check out the comprehensive guides

Made with care by the WebGenie community
