Price Scout: Advanced Price Discovery & Analytics Platform

Price Scout is a high-performance, distributed price tracking solution comprising a containerized Java REST API and a Manifest V3 Chrome extension. The system is engineered to provide real-time product intelligence by bypassing traditional caching layers and interfacing directly with e-commerce platforms through headless browser automation.

System Architecture

The platform follows a decoupled client-server architecture. The Chrome Extension acts as the presentation and injection layer, while the Java Engine serves as the scraping and data persistence core.

High-Level Component Diagram

graph TD
    subgraph Client_Environment [Chrome Extension]
        A[Content Script] -- Injects --> B[Quick Compare FAB]
        B -- Trigger --> C[Popup UI]
        C -- Fetch Request --> D[REST API Client]
    end

    subgraph Cloud_Backend [Java Engine - Hugging Face Space]
        D -- HTTPS/JWT --> E[Spark Java API Server]
        E -- Concurrent Task --> F[Engine Manager]
        F -- Parallel Execution --> G[Selenium Scraper Pool]
        G -- Direct Hit --> H[Amazon/Flipkart/Croma]
        F -- Persistence --> I[SQLite History DAO]
    end

Technical Specifications

Backend Core

Runtime: OpenJDK 17
Web Framework: Spark Java (Lightweight REST API)
Browser Automation: Selenium 4.16.1 with WebDriverManager
Stealth Integration: Custom User-Agent rotation and AutomationControlled bypass
Database: SQLite 3.45 for persistent temporal price tracking

Frontend Interface

Platform: Chrome Extension Manifest V3
In-page Logic: Vanilla JavaScript with DOM Mutation Observers
Communication: Fetch-based REST interaction with JWT authentication
Styling: Scoped CSS to prevent conflicts with host platform styles

Operational Workflow

The following sequence diagram provides a granular look at the request lifecycle, highlighting the internal parallelization and data aggregation logic.

sequenceDiagram
    autonumber
    participant User as User (Product Page)
    participant FAB as Floating Action Button
    participant API as Spark REST API
    participant Mgr as Engine Manager (ThreadPool)
    participant Scrapers as Selenium Worker Pool
    participant DB as SQLite (PriceHistoryDAO)

    User->>FAB: Clicks "Quick Compare"
    FAB->>API: GET /api/search?q={sanitized_title}
    Note right of API: Validates JWT & Sanitizes Input
    API->>Mgr: initializeTask(keyword)
    
    par Parallel Scrape
        Mgr->>Scrapers: AmazonScraper (Thread 1)
        Mgr->>Scrapers: FlipkartScraper (Thread 2)
        Mgr->>Scrapers: CromaScraper (Thread 3)
    end
    
    Scrapers-->>Mgr: Return Product Objects
    Mgr->>Mgr: Filter nulls & Sort by Price (ASC)
    
    critical Database Persistence
        Mgr->>DB: savePriceHistory(best_deal)
    end
    
    Mgr-->>API: TreeMap<Platform, Product>
    API-->>FAB: JSON Response (Aggregated Deals)
    FAB->>User: Display Comparison Side-Panel

Technical Micro-Details

Stealth Scraping Architecture

The system employs a multi-layered bypass strategy to ensure consistent data retrieval from highly protected endpoints:

Automation Bypass: Dynamic stripping of cdc_ variables in ChromeDriver to make automation detection (for example, navigator.webdriver checks) less likely.
Behavioral Mimicry: Randomized scrollBy and mouseMove events to simulate human interaction during page load.
Identity Rotation: Each worker thread utilizes a distinct User-Agent string from a randomized pool, mitigating fingerprinting patterns.
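
The identity rotation and AutomationControlled bypass described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name StealthArgsSketch, the two User-Agent strings, and the helper names are assumptions made for the example. It only builds the Chrome command-line arguments and does not depend on Selenium.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: builds per-worker Chrome arguments for a scraper thread.
final class StealthArgsSketch {
    // Placeholder pool (assumption); a real pool would hold current, complete User-Agent strings.
    static final List<String> USER_AGENTS = List.of(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36");

    private StealthArgsSketch() {}

    // Picks one User-Agent at random; each worker thread would call this once at start-up.
    static String pickUserAgent() {
        int index = ThreadLocalRandom.current().nextInt(USER_AGENTS.size());
        return USER_AGENTS.get(index);
    }

    // Chrome switches: headless mode, the AutomationControlled blink feature disabled,
    // and the chosen User-Agent override.
    static List<String> chromeArguments(String userAgent) {
        return List.of(
            "--headless=new",
            "--disable-blink-features=AutomationControlled",
            "--user-agent=" + userAgent);
    }
}
```

With Selenium on the classpath, the resulting list could be passed to ChromeOptions.addArguments before the driver is created.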

Data Persistence Schema

Price Scout utilizes a temporal schema to track market fluctuations over time.

Table: price_history
- product_name: Primary lookup key (Indexed).
- platform: Origin store identifier.
- price: Normalized floating-point value.
- url: Direct canonical link.
- scraped_at: High-precision timestamp (Automatic).
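
One possible shape for this table, expressed as SQL held in Java constants alongside a matching record type, is sketched below. The column types, the index name, the timestamp default, and the PriceHistorySchema and PriceRecord names are assumptions for illustration; only the column names come from the list above.

```java
import java.time.Instant;

// Hypothetical holder for the price_history DDL and a matching row type.
final class PriceHistorySchema {
    // Assumed column types; scraped_at defaults to the insertion time in UTC with millisecond precision.
    static final String CREATE_TABLE = """
        CREATE TABLE IF NOT EXISTS price_history (
            product_name TEXT NOT NULL,
            platform     TEXT NOT NULL,
            price        REAL NOT NULL,
            url          TEXT,
            scraped_at   TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
        )""";

    // Index backing lookups by product name.
    static final String CREATE_INDEX =
        "CREATE INDEX IF NOT EXISTS idx_price_history_product ON price_history (product_name)";

    // Parameterized insert; scraped_at is left to the column default.
    static final String INSERT =
        "INSERT INTO price_history (product_name, platform, price, url) VALUES (?, ?, ?, ?)";

    // In-memory view of one row.
    record PriceRecord(String productName, String platform, double price, String url, Instant scrapedAt) {}

    private PriceHistorySchema() {}
}
```

Using a parameterized INSERT keeps scraped product names out of the SQL text itself, which complements the input sanitization applied at the API layer.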

API Security Implementation

Input Sanitization Protocol

Every search string is processed through a strict whitelist filter:

String sanitizedQuery = query.replaceAll("[^a-zA-Z0-9\\s]", "").trim();

Because only ASCII letters, digits, and whitespace survive the filter, the product title used for DOM extraction or Selenium navigation cannot carry shell metacharacters, quotes, or markup into the headless browser or the backend OS.
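
To make the effect of the filter concrete, the sketch below wraps the same expression in a small helper and runs it on a few inputs. The QuerySanitizer class name and the null handling are assumptions added for the example.

```java
// Hypothetical wrapper around the allow-list expression shown above.
final class QuerySanitizer {
    private QuerySanitizer() {}

    // Keeps ASCII letters, digits, and whitespace; everything else is dropped.
    static String sanitize(String query) {
        if (query == null) {
            return ""; // assumption: treat a missing query as empty
        }
        return query.replaceAll("[^a-zA-Z0-9\\s]", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("iPhone 15 Pro (256GB); rm -rf /")); // iPhone 15 Pro 256GB rm rf
        System.out.println(sanitize("<script>alert(1)</script>"));       // scriptalert1script
        System.out.println(sanitize("Wi-Fi 6.0 Router"));                // WiFi 60 Router
    }
}
```

The last case shows the trade-off of such a strict filter: hyphens, decimal points, and all non-ASCII characters are removed as well, which can change the meaning of some product titles.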

JWT Lifecycle

Issuance: Tokens are cryptographically signed with HMAC-SHA256 (HS256).
Authorization: All endpoints (/api/search, /api/history) are gated behind a before() filter.
Integrity: The backend enforces strict expiration (exp) and issuer (iss) checks on every inbound request.
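
The signature, expiration, and issuer checks listed above can be illustrated with the JDK alone. The following is a simplified sketch under stated assumptions, not the project's implementation: the JwtSketch class and its method names are hypothetical, claims are extracted with regular expressions instead of a JSON parser, and a real deployment would normally rely on a maintained JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical HS256 issue/verify helper for illustration only.
final class JwtSketch {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();

    private JwtSketch() {}

    static byte[] hmacSha256(byte[] secret, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    // Builds header.payload.signature; assumes the issuer contains no quotes or backslashes.
    static String issue(byte[] secret, String issuer, long expEpochSeconds) {
        String header = ENC.encodeToString(
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = ENC.encodeToString(
            ("{\"iss\":\"" + issuer + "\",\"exp\":" + expEpochSeconds + "}").getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + ENC.encodeToString(hmacSha256(secret, signingInput));
    }

    // Returns true only if the algorithm, signature, issuer, and expiration all check out.
    static boolean verify(String token, byte[] secret, String expectedIssuer, long nowEpochSeconds) {
        try {
            String[] parts = token.split("\\.");
            if (parts.length != 3) {
                return false;
            }
            String header = new String(DEC.decode(parts[0]), StandardCharsets.UTF_8);
            if (!header.contains("\"alg\":\"HS256\"")) {
                return false; // reject "none" and any other algorithm
            }
            byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
            if (!MessageDigest.isEqual(expected, DEC.decode(parts[2]))) {
                return false; // constant-time signature comparison failed
            }
            String payload = new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
            Matcher iss = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"]*)\"").matcher(payload);
            Matcher exp = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)").matcher(payload);
            if (!iss.find() || !exp.find()) {
                return false;
            }
            return expectedIssuer.equals(iss.group(1))
                && Long.parseLong(exp.group(1)) > nowEpochSeconds;
        } catch (RuntimeException e) {
            return false; // malformed Base64, oversized number, and similar input errors
        }
    }
}
```

In a Spark Java application, a check like verify would typically run inside the before() filter, halting the request with a 401 status when it returns false.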

Security & Compliance

Data Integrity

All incoming search queries undergo regex-based sanitization ([^a-zA-Z0-9\s]) to mitigate XSS and command injection risks. The system enforces domain-level validation to prevent SSRF (Server-Side Request Forgery) by strictly allowing connections to a predefined whitelist of e-commerce domains.
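
The domain-level validation could look roughly like the sketch below. The DomainAllowList class, the specific domains, and the HTTPS-only rule are assumptions chosen for the example; the source only states that connections are restricted to a predefined list of e-commerce domains.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical allow-list check applied before the scraper navigates to a URL.
final class DomainAllowList {
    // Assumed registrable domains for the three supported stores.
    static final Set<String> ALLOWED = Set.of("amazon.in", "flipkart.com", "croma.com");

    private DomainAllowList() {}

    static boolean isAllowed(String url) {
        try {
            URI uri = new URI(url);
            String host = uri.getHost();
            if (host == null || !"https".equalsIgnoreCase(uri.getScheme())) {
                return false; // no host, or a scheme such as file: or http:
            }
            String normalized = host.toLowerCase(Locale.ROOT);
            for (String domain : ALLOWED) {
                // Exact match or a true subdomain; "amazon.in.evil.example" does not qualify.
                if (normalized.equals(domain) || normalized.endsWith("." + domain)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Matching on the parsed host rather than on the raw string avoids being fooled by allowed domains that appear in the path, query, or user-info part of a URL.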

Resource Management

The engine utilizes a shared ExecutorService with a fixed thread pool to manage concurrent scraping tasks, preventing thread exhaustion and ensuring stable response times under load.
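
A fixed-pool fan-out of this kind, together with the null filtering and price sorting shown in the sequence diagram, might be structured as in the following sketch. The EngineManagerSketch and Product names, the pool size of three, and the timeout handling are assumptions for illustration; the scrapers are modeled as plain Callable tasks so the example runs without Selenium.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical manager that runs one scraper task per platform on a shared pool.
final class EngineManagerSketch {
    record Product(String platform, String name, double price) {}

    // One shared, bounded pool: at most three scrapes run concurrently across all requests.
    private final ExecutorService pool = Executors.newFixedThreadPool(3);

    List<Product> search(List<Callable<Product>> scrapers, long timeoutSeconds) {
        List<Product> results = new ArrayList<>();
        try {
            // invokeAll blocks until every task finishes or the timeout cancels the rest.
            for (Future<Product> future : pool.invokeAll(scrapers, timeoutSeconds, TimeUnit.SECONDS)) {
                if (future.isCancelled()) {
                    continue; // scraper exceeded the timeout
                }
                try {
                    Product product = future.get();
                    if (product != null) {
                        results.add(product); // null means "no listing found"
                    }
                } catch (ExecutionException e) {
                    // A failing scraper should not fail the whole comparison.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        results.sort(Comparator.comparingDouble(Product::price)); // cheapest first
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because the pool is bounded, a burst of search requests queues extra scrape tasks instead of spawning a new browser thread for each one.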

Installation & Deployment

Backend Deployment (Docker)

The backend is designed to run in a containerized environment.

# Build the container
docker build -t price-scout-engine .

# Run locally
docker run -p 7860:7860 price-scout-engine

Extension Installation

Navigate to chrome://extensions.
Enable "Developer Mode".
Select "Load Unpacked" and point to the extension/ directory.

Project Roadmap

The development follows a phased approach to increase intelligence and resilience:

Phase 1 (Complete): Transition to Cloud-native architecture and Selenium Stealth integration.
Phase 2 (Current): Implementation of Temporal Price History and SQL Persistence.
Phase 3 (Planned): Integration of Linear Regression models for price drop prediction.
Phase 4 (Planned): Automated Proxy Rotation and Redis Caching Layer.

Meet The Team

Price Scout is engineered and maintained by The Avengers team.

Purvansh Joshi: Lead Systems Architect & Engine Developer
Parth Nailwal: Core Contributor & Systems Resilience
Vansh Singh: Core Contributor & Automation Engineer
The Avengers: Development Group

License

Distributed under the MIT License. See LICENSE for more information.
