Price Scout is a distributed price tracking solution comprising a containerized Java REST API and a Manifest V3 Chrome extension. The system provides real-time product data by bypassing traditional caching layers and querying e-commerce platforms directly through headless browser automation.
The platform follows a decoupled client-server architecture. The Chrome Extension acts as the presentation and injection layer, while the Java Engine serves as the scraping and data persistence core.
graph TD
subgraph Client_Environment [Chrome Extension]
A[Content Script] -- Injects --> B[Quick Compare FAB]
B -- Trigger --> C[Popup UI]
C -- Fetch Request --> D[REST API Client]
end
subgraph Cloud_Backend [Java Engine - Hugging Face Space]
D -- HTTPS/JWT --> E[Spark Java API Server]
E -- Concurrent Task --> F[Engine Manager]
F -- Parallel Execution --> G[Selenium Scraper Pool]
G -- Direct Hit --> H[Amazon/Flipkart/Croma]
F -- Persistence --> I[SQLite History DAO]
end
- Runtime: OpenJDK 17
- Web Framework: Spark Java (Lightweight REST API)
- Browser Automation: Selenium 4.16.1 with WebDriverManager
- Stealth Integration: Custom User-Agent rotation and AutomationControlled bypass
- Database: SQLite 3.45 for persistent temporal price tracking
- Platform: Chrome Extension Manifest V3
- In-page Logic: Vanilla JavaScript with DOM Mutation Observers
- Communication: Fetch-based REST interaction with JWT authentication
- Styling: Scoped CSS to prevent conflicts with host platform styles
The following sequence diagram provides a granular look at the request lifecycle, highlighting the internal parallelization and data aggregation logic.
sequenceDiagram
autonumber
participant User as User (Product Page)
participant FAB as Floating Action Button
participant API as Spark REST API
participant Mgr as Engine Manager (ThreadPool)
participant Scrapers as Selenium Worker Pool
participant DB as SQLite (PriceHistoryDAO)
User->>FAB: Clicks "Quick Compare"
FAB->>API: GET /api/search?q={sanitized_title}
Note right of API: Validates JWT & Sanitizes Input
API->>Mgr: initializeTask(keyword)
par Parallel Scrape
Mgr->>Scrapers: AmazonScraper (Thread 1)
Mgr->>Scrapers: FlipkartScraper (Thread 2)
Mgr->>Scrapers: CromaScraper (Thread 3)
end
Scrapers-->>Mgr: Return Product Objects
Mgr->>Mgr: Filter nulls & Sort by Price (ASC)
critical Database Persistence
Mgr->>DB: savePriceHistory(best_deal)
end
Mgr-->>API: TreeMap<Platform, Product>
API-->>FAB: JSON Response (Aggregated Deals)
FAB->>User: Display Comparison Side-Panel
The system employs a multi-layered bypass strategy to ensure consistent data retrieval from highly protected endpoints:
- Automation Bypass: Dynamic cdc_ variable stripping in the ChromeDriver to prevent navigator.webdriver detection.
- Behavioral Mimicry: Randomized scrollBy and mouseMove events to simulate human interaction during page load.
- Identity Rotation: Each worker thread utilizes a distinct User-Agent string from a randomized pool, mitigating fingerprinting patterns.
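The identity rotation and AutomationControlled bypass above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the class name StealthArgs, the User-Agent pool, and the use of headless mode flags are assumptions for illustration. It only builds the Chrome command-line arguments; in a Selenium setup they would typically be passed to ChromeOptions.addArguments. The cdc_ stripping and behavioral mimicry steps are not covered here.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: builds per-worker Chrome arguments for a scraper thread.
final class StealthArgs {
    // Assumed pool; a real deployment would keep these strings current.
    static final List<String> USER_AGENTS = List.of(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36");

    // Picks one User-Agent at random; ThreadLocalRandom avoids contention between workers.
    static String randomUserAgent() {
        return USER_AGENTS.get(ThreadLocalRandom.current().nextInt(USER_AGENTS.size()));
    }

    // Chrome switches: headless mode, the AutomationControlled blink feature disabled,
    // and the chosen User-Agent override.
    static List<String> chromeArguments(String userAgent) {
        return List.of(
            "--headless=new",
            "--disable-blink-features=AutomationControlled",
            "--user-agent=" + userAgent);
    }
}
```

Each worker would call StealthArgs.chromeArguments(StealthArgs.randomUserAgent()) once when it creates its browser session, so the User-Agent stays stable within a session but varies across workers.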
Price Scout utilizes a temporal schema to track market fluctuations over time.
- Table: price_history
  - product_name: Primary lookup key (Indexed).
  - platform: Origin store identifier.
  - price: Normalized floating-point value.
  - url: Direct canonical link.
  - scraped_at: High-precision timestamp (Automatic).
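One way the price_history table could be declared from Java is sketched below. The column types, the NOT NULL constraints, the index name, and the millisecond-precision default for scraped_at are assumptions; the source only names the columns and their roles. The class name PriceHistorySchema is hypothetical, and the code uses only the standard java.sql interfaces, so a SQLite JDBC driver would still need to be on the classpath to obtain a Connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical schema holder for the price_history table described above.
final class PriceHistorySchema {
    // scraped_at defaults to the insert time with millisecond precision (assumed format).
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS price_history ("
        + "product_name TEXT NOT NULL, "
        + "platform TEXT NOT NULL, "
        + "price REAL NOT NULL, "
        + "url TEXT, "
        + "scraped_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now')))";

    // Index backing lookups by product name.
    static final String CREATE_INDEX =
        "CREATE INDEX IF NOT EXISTS idx_price_history_product ON price_history (product_name)";

    // Creates the table and index if they do not exist yet.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_TABLE);
            st.execute(CREATE_INDEX);
        }
    }
}
```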
Every search string is processed through a strict whitelist filter:
String sanitizedQuery = query.replaceAll("[^a-zA-Z0-9\\s]", "").trim();
This ensures that the product title used for DOM extraction or Selenium navigation cannot be used as an injection vector for the headless browser or the backend OS.
- Issuance: Tokens are cryptographically signed with HMAC-SHA256 (HS256).
- Authorization: All endpoints (/api/search, /api/history) are gated behind a before() filter.
- Integrity: The backend enforces strict expiration (exp) and issuer (iss) checks on every inbound request.
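The token checks listed above can be illustrated with a standard-library-only sketch of HS256 signing and validation. This is not the project's implementation: the class name JwtSketch and its method names are hypothetical, and the claims are read with simple regular expressions purely to keep the example dependency-free. A production service would normally rely on a maintained JWT library and a real JSON parser.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical HS256 helper; claim parsing is deliberately simplistic.
final class JwtSketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");
    private static final Pattern ISS = Pattern.compile("\"iss\"\\s*:\\s*\"([^\"]*)\"");

    static byte[] hmacSha256(byte[] secret, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds header.payload.signature with base64url encoding and no padding.
    static String sign(String payloadJson, byte[] secret) {
        String header = B64.encodeToString(
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmacSha256(secret, signingInput));
    }

    // Accepts a token only if the signature matches, exp is in the future, and iss is expected.
    static boolean isValid(String token, byte[] secret, String expectedIssuer, long nowEpochSeconds) {
        try {
            String[] parts = token.split("\\.");
            if (parts.length != 3) return false;
            byte[] expected = hmacSha256(secret, parts[0] + "." + parts[1]);
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            // Constant-time comparison of the two signatures.
            if (!MessageDigest.isEqual(expected, actual)) return false;
            String claims = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
            Matcher exp = EXP.matcher(claims);
            Matcher iss = ISS.matcher(claims);
            return exp.find() && Long.parseLong(exp.group(1)) > nowEpochSeconds
                && iss.find() && iss.group(1).equals(expectedIssuer);
        } catch (RuntimeException e) {
            return false; // malformed base64, oversized exp, and similar inputs are rejected
        }
    }
}
```

In a Spark Java application, a check like isValid would plausibly run inside the before() filter, which would halt the request with a 401 status when it returns false.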
All incoming search queries undergo regex-based sanitization ([^a-zA-Z0-9\s]) to mitigate XSS and command injection risks. The system enforces domain-level validation to prevent SSRF (Server-Side Request Forgery) by strictly allowing connections to a predefined whitelist of e-commerce domains.
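The two input checks described above might look like the following sketch. The sanitize method reuses the regex from this document; the host allowlist, the HTTPS-only rule, and the class name UrlGuard are assumptions for illustration, since the actual list of permitted domains is not given here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical input guard combining query sanitization and a host allowlist.
final class UrlGuard {
    // Assumed allowlist; the real set of store domains may differ.
    static final Set<String> ALLOWED_HOSTS =
        Set.of("www.amazon.in", "www.flipkart.com", "www.croma.com");

    // Keeps only ASCII letters, digits, and whitespace, then trims the ends.
    static String sanitize(String query) {
        return query.replaceAll("[^a-zA-Z0-9\\s]", "").trim();
    }

    // Allows a URL only if it is HTTPS and its host exactly matches an allowlisted host.
    static boolean isAllowed(String url) {
        if (url == null) return false;
        try {
            URI uri = URI.create(url);
            String host = uri.getHost();
            return "https".equalsIgnoreCase(uri.getScheme())
                && host != null
                && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false; // syntactically invalid URLs are rejected
        }
    }
}
```

Matching the parsed host exactly, rather than checking whether the URL string contains an allowed domain, is what rejects lookalikes such as https://www.amazon.in.evil.com/ or https://www.amazon.in@evil.com/.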
The engine utilizes a shared ExecutorService with a fixed thread pool to manage concurrent scraping tasks, preventing thread exhaustion and ensuring stable response times under load.
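A minimal sketch of this pattern is shown below, under stated assumptions: the Product record, the EngineManagerSketch class, the pool size of three, and the timeout parameter are all hypothetical, and each scraper is modeled as a plain Callable rather than a real Selenium session. It fans the tasks out on a fixed pool, drops null and failed results, and sorts the remainder by ascending price. Returning a sorted list instead of a map keyed by platform is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical result type; the real Product class likely carries more fields.
record Product(String platform, String name, double price) {}

final class EngineManagerSketch {
    // One shared, bounded pool: at most three scrapers run at the same time.
    private final ExecutorService pool = Executors.newFixedThreadPool(3);

    List<Product> search(List<Callable<Product>> scrapers, long timeoutSeconds) {
        List<Product> results = new ArrayList<>();
        try {
            // invokeAll waits for all tasks or the timeout; unfinished tasks are cancelled.
            for (Future<Product> future : pool.invokeAll(scrapers, timeoutSeconds, TimeUnit.SECONDS)) {
                try {
                    Product product = future.get();
                    if (product != null) results.add(product); // filter nulls
                } catch (Exception e) {
                    // A failed or cancelled scraper is skipped so one store cannot fail the request.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        results.sort(Comparator.comparingDouble(Product::price)); // cheapest first
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because the pool is fixed-size and shared, a burst of search requests queues work instead of spawning an unbounded number of browser sessions.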
The backend is designed to run in a containerized environment.
# Build the container
docker build -t price-scout-engine .
# Run locally
docker run -p 7860:7860 price-scout-engine
- Navigate to chrome://extensions.
- Enable "Developer Mode".
- Select "Load Unpacked" and point to the extension/ directory.
The development follows a phased approach to increase intelligence and resilience:
- Phase 1 (Complete): Transition to Cloud-native architecture and Selenium Stealth integration.
- Phase 2 (Current): Implementation of Temporal Price History and SQL Persistence.
- Phase 3 (Planned): Integration of Linear Regression models for price drop prediction.
- Phase 4 (Planned): Automated Proxy Rotation and Redis Caching Layer.
Price Scout is engineered and maintained by The Avengers team.
- Purvansh Joshi: Lead Systems Architect & Engine Developer
- Parth Nailwal: Core Contributor & Systems Resilience
- Vansh Singh: Core Contributor & Automation Engineer
- The Avengers: Development Group
Distributed under the MIT License. See LICENSE for more information.