adilsondias-engineer/visual_process_automation

-> Documentation was generated by AI from this prompt: This project is a custom RPA developed to show how RPA can be used even to play games. It uses the Selenium libraries, OpenCV, and NumPy. Some of the limitations are:

  1. The images used for matching (finding the needle) must be at the same resolution at which the game is played.
  2. Any change to the UI will make the RPA fail, because it looks for positions on the screen.
  3. In the past I have used screen-scraping techniques for mainframes and applications that had no APIs for interaction; all they had was the UI. Generate a comprehensive readme file for this project demonstrating the benefits and issues/limitations.

🤖 Visual Process Automation Framework

A Python-based Robotic Process Automation (RPA) framework demonstrating computer vision-driven automation for applications without API access


📋 Overview

This framework showcases Robotic Process Automation (RPA) techniques that have been fundamental to enterprise automation for decades. By combining computer vision with browser automation, it demonstrates how to automate applications that lack programmatic interfaces, a common challenge in legacy system integration.

Key Demonstration: Just as enterprises have automated mainframe "green screens" and legacy applications through visual automation, this framework applies the same techniques to browser-based applications, showing that RPA principles carry over to other visual interfaces.

🎯 Why This Matters

In the real world, not every application has an API:

  • Legacy Systems: Mainframe applications from the 1980s-90s still run critical business processes
  • Third-Party Applications: Vendor software without automation interfaces
  • Dynamic Web Apps: Canvas-based interfaces where traditional DOM selectors fail
  • Rapid Prototyping: Faster than waiting for official API development

This framework demonstrates the foundational techniques that power commercial RPA platforms like UiPath, Automation Anywhere, and Blue Prism.


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AUTOMATION LAYER                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   Browser    β”‚  β”‚   Computer   β”‚  β”‚     GUI      β”‚  β”‚
β”‚  β”‚  Controller  β”‚  β”‚    Vision    β”‚  β”‚  Automation  β”‚  β”‚
β”‚  β”‚  (Selenium)  β”‚  β”‚   (OpenCV)   β”‚  β”‚ (PyAutoGUI)  β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   PROCESSING LAYER                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚         Template Matching & Image Analysis       β”‚   β”‚
β”‚  β”‚              (OpenCV + NumPy)                    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DECISION LAYER                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚    Workflow Engine & State Management            β”‚   β”‚
β”‚  β”‚    (Pattern Matching, Decision Trees)            β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  MONITORING LAYER                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚     Logging, Error Handling & Debug Output       β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ› οΈ Technologies & Stack

| Component | Technology | Purpose |
|---|---|---|
| Core Language | Python 3.7+ | Framework foundation |
| Computer Vision | OpenCV (cv2) | Template matching & image processing |
| Browser Automation | Selenium WebDriver | Web navigation & interaction |
| GUI Automation | PyAutoGUI | Cross-platform mouse/keyboard control |
| Image Processing | NumPy | Array operations & numerical processing |
| Image Handling | Pillow (PIL) | Image format conversion |
| Logging | Python logging | Activity tracking & debugging |

💡 Use Cases & Applications

Enterprise Applications

  • Legacy System Integration: Automate mainframe terminals, AS/400, green screens
  • Third-Party Software: Interact with vendor applications lacking APIs
  • Desktop Application Testing: QA automation for GUI applications
  • Report Generation: Extract data from visual interfaces

Modern Applications

  • Canvas-Based Dashboards: Automate Power BI, Tableau, or custom data visualizations
  • Dynamic Web Apps: Handle applications where DOM selectors are unreliable
  • Visual Validation: Verify UI rendering in automated tests
  • Cross-Platform Automation: Work across web, desktop, and hybrid applications

Educational Value

  • Understanding computer vision fundamentals
  • Learning browser automation patterns
  • Implementing state machines and decision logic
  • Practicing event-driven programming

🚀 Quick Start

Prerequisites

  • Python 3.7 or higher
  • Google Chrome browser
  • ChromeDriver (matching your Chrome version)
  • 2GB RAM minimum (4GB recommended)

Installation

  1. Clone the repository
git clone <repository-url>
cd visual_process_automation
  2. Install dependencies
pip install -r requirements.txt

requirements.txt:

opencv-python>=4.5.0
numpy>=1.19.0
selenium>=4.0.0
pillow>=8.0.0
pyautogui>=0.9.50
  3. Download ChromeDriver

  4. Configure Settings

Update config.yaml (or create it):

browser:
  user_data_dir: "C:\\Users\\<YOUR_USER>\\AppData\\Local\\Google\\Chrome\\User Data"
  window_width: 1920
  window_height: 1080

automation:
  screenshot_delay: 0.5
  click_duration: 0.3
  confidence_threshold: 0.7

logging:
  level: INFO
  file: automation.log

📖 Usage

Basic Automation

from src.browser import Browser
from src.vision import Vision
from src.bot import AutomationBot

# Initialize components
browser = Browser(url="https://example.com", width=1920, height=1080)
vision = Vision(template_folder="templates/")
bot = AutomationBot(browser, vision)

# Run automation workflow
bot.execute_workflow()

Custom Template Matching

from src.vision import Vision
import cv2

# Initialize vision module
vision = Vision()

# Find single element
location = vision.find("button_submit.png", confidence=0.8)
if location:
    print(f"Found at: {location}")

# Find multiple elements
locations = vision.find_multiple("icon_notification.png", confidence=0.7)
print(f"Found {len(locations)} instances")

Error Handling & Retry Logic

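import logging
import time

def execute_with_retry(action, max_attempts=3, delay=2):
    """Run action until it returns a truthy result or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
            if result:
                return result
            logging.warning(f"Attempt {attempt} returned no result")
        except Exception as e:
            logging.warning(f"Attempt {attempt} failed: {e}")
        # Wait before the next attempt, whether the action raised or found nothing
        if attempt < max_attempts:
            time.sleep(delay)
    return None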

Keyboard Controls

  • H: Stop automation and exit
  • P: Pause automation
  • O: Resume automation
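How these keys are wired into the main loop is not shown here. As a rough sketch, assuming key presses arrive as single characters from some listener, the pause/stop state could live in a small thread-safe object. `RunControl`, `handle_key`, and `checkpoint` are hypothetical names introduced for this illustration, not part of the project.

```python
import threading


class RunControl:
    """Thread-safe pause/resume/stop flags for an automation loop (hypothetical helper)."""

    def __init__(self):
        self._running = threading.Event()  # set means "not paused"
        self._running.set()
        self._stopped = threading.Event()

    def handle_key(self, key):
        """Map the documented keys to state changes: H stop, P pause, O resume."""
        key = key.lower()
        if key == "h":
            self._stopped.set()
            self._running.set()  # wake a paused loop so it can exit
        elif key == "p" and not self.stopped:
            self._running.clear()
        elif key == "o":
            self._running.set()

    @property
    def paused(self):
        return not self._running.is_set()

    @property
    def stopped(self):
        return self._stopped.is_set()

    def checkpoint(self):
        """Block while paused; return False once a stop was requested."""
        self._running.wait()
        return not self._stopped.is_set()
```

A workflow loop would then call `checkpoint()` between actions and exit when it returns `False`.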

📊 How It Works

1. Screenshot Capture

import io
from PIL import Image

# Selenium captures the visible browser viewport as PNG bytes
screenshot = browser.driver.get_screenshot_as_png()
image = Image.open(io.BytesIO(screenshot))

2. Template Matching

# OpenCV searches for visual patterns
result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
locations = np.where(result >= confidence_threshold)
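One detail worth spelling out: `np.where` returns index arrays in (row, column) order, that is (y, x), and a single on-screen element usually produces a cluster of neighbouring hits above the threshold. The sketch below shows one possible way to collapse such clusters using a greedy non-maximum suppression. `group_matches` is a hypothetical helper written in plain Python, not the project's actual code.

```python
def group_matches(hits, template_w, template_h):
    """Greedy non-maximum suppression over template-matching hits.

    hits: iterable of (x, y, score), where (x, y) is the top-left corner
    of a match. Returns the best-scoring hit of each cluster, highest
    score first.
    """
    kept = []
    for x, y, score in sorted(hits, key=lambda h: h[2], reverse=True):
        # A hit closer than one template size to a kept hit overlaps it.
        overlaps = any(
            abs(x - kx) < template_w and abs(y - ky) < template_h
            for kx, ky, _ in kept
        )
        if not overlaps:
            kept.append((x, y, score))
    return kept


# With the arrays from the snippet above, hits could be built like this
# (rows are y, columns are x):
#   hits = [(x, y, result[y, x]) for y, x in zip(*locations)]
```

Keeping only the best hit per cluster avoids clicking the same button several times when counting or iterating over matches.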

3. Coordinate Translation

# Convert image coordinates to screen coordinates
screen_x = window_x + image_x
screen_y = window_y + image_y
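Template matching reports the top-left corner of a match, so a click normally needs the centre of the matched region, and high-DPI displays may need a pixel-ratio correction. The following is a minimal sketch under those assumptions. `to_screen_point` and its `pixel_ratio` parameter are hypothetical, and `window_x`/`window_y` are assumed to be the screen position of the page's top-left corner (not of the outer browser window).

```python
def to_screen_point(match_x, match_y, template_w, template_h,
                    window_x, window_y, pixel_ratio=1.0):
    """Return the screen point at the centre of a matched template.

    match_x, match_y: top-left corner of the match in screenshot pixels.
    window_x, window_y: screen position of the page's top-left corner.
    pixel_ratio: screenshot pixels per screen unit (for example 2.0 on
                 some high-DPI displays; verify this for your setup).
    """
    center_x = match_x + template_w / 2
    center_y = match_y + template_h / 2
    screen_x = window_x + center_x / pixel_ratio
    screen_y = window_y + center_y / pixel_ratio
    return round(screen_x), round(screen_y)
```

For example, a 40×20 template matched at (100, 200) in a page whose origin sits at (10, 80) on screen yields a click point of (130, 290).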

4. Action Execution

# PyAutoGUI performs the interaction
pyautogui.moveTo(screen_x, screen_y, duration=0.3)
pyautogui.click()

✅ Strengths & Benefits

1. No API Required ✨

Works with any application that has a visual interface, regardless of whether APIs exist.

2. Proven Enterprise Technique 🏢

Based on 40+ years of screen scraping methodology used in mission-critical business automation.

3. Cross-Application Flexibility 🔄

Same approach works for:

  • Web applications (via Selenium)
  • Desktop applications (via PyAutoGUI)
  • Terminal/mainframe interfaces
  • Virtual desktop environments (Citrix, RDP)

4. Rapid Development ⚡

  • No reverse engineering required
  • No protocol analysis needed
  • Visual debugging with screenshot analysis
  • Quick proof-of-concept creation

5. Educational Foundation 📚

Teaches fundamental concepts:

  • Computer vision basics (template matching, image processing)
  • Browser automation patterns
  • State machine design
  • Event-driven architecture
  • Error handling strategies

6. Real-World Relevance 💼

The same techniques power:

  • UiPath: Commercial RPA platform ($7B+ valuation)
  • Automation Anywhere: Enterprise automation leader
  • Blue Prism: Intelligent automation platform
  • Legacy integrations: Still used extensively in Fortune 500 companies

⚠️ Limitations & Challenges

1. 🎯 Resolution Dependency: CRITICAL

Problem: Template images must match the exact display resolution.

Original Resolution: 1920×1080
Template Created: 1920×1080 ✅
Display Changes To: 2560×1440 ❌ BREAKS

Impact: Changing resolution, zoom level, or DPI scaling breaks all detection.

Mitigations:

  • Create template sets for multiple resolutions
  • Use scale-invariant features (SIFT, SURF, ORB)
  • Implement multi-scale template matching
  • Store templates at multiple sizes
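To make the multi-scale idea concrete, the sketch below computes which scale factors to try when the current display differs from the resolution at which a template was captured. It is plain Python and covers only the arithmetic; resizing and matching would still be done with OpenCV. `candidate_scales` and `scaled_size` are hypothetical helpers, and the sketch assumes the aspect ratio is unchanged so the width ratio alone is representative.

```python
def candidate_scales(template_res, current_res, spread=0.1, steps=5):
    """Scales to try when the display differs from the template's resolution.

    template_res, current_res: (width, height) tuples.
    Returns `steps` scale factors centred on the expected width ratio,
    spanning plus or minus `spread` (as a fraction) around it.
    """
    expected = current_res[0] / template_res[0]
    if steps == 1:
        return [expected]
    low = expected * (1 - spread)
    step = (2 * spread * expected) / (steps - 1)
    return [round(low + i * step, 4) for i in range(steps)]


def scaled_size(template_size, scale):
    """Template (width, height) after scaling, never smaller than 1x1."""
    w, h = template_size
    return max(1, round(w * scale)), max(1, round(h * scale))
```

Going from 1920×1080 to 2560×1440 gives an expected scale of about 1.33, so a 48×24 template would be resized to 64×32 before matching, with a few neighbouring scales tried as a safety margin.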

2. 🔧 UI Changes = Maintenance Nightmare

Problem: Any visual update breaks automation.

Examples:

  • Button position changes
  • Color scheme updates
  • Font changes
  • Icon redesigns
  • Seasonal themes

Impact: Requires constant template recapture and testing.

Mitigations:

  • Use OCR for text-based elements (Tesseract)
  • Create multiple template variants
  • Implement fuzzy matching
  • Combine with DOM selectors where possible

3. ⏱️ Performance Limitations

Bottlenecks:

  • Screenshot capture: 50-200ms per frame
  • Template matching: 100-500ms per template per screenshot
  • Mouse movements: 2-5 seconds with safety delays
  • Total cycle time: 5-10 seconds per action

Impact: Much slower than API-based automation or human interaction.

Optimizations:

  • Cache screenshots when possible
  • Use region-of-interest (ROI) cropping
  • Implement parallel template matching
  • Optimize confidence thresholds
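Region-of-interest cropping reduces the area the matcher has to scan, but matches found inside the crop must be translated back to full-screenshot coordinates. The sketch below shows that bookkeeping in plain Python. `Roi`, `clamp_roi`, and `roi_to_full` are hypothetical names used for illustration only.

```python
from typing import NamedTuple


class Roi(NamedTuple):
    """Region of interest: top-left corner plus width and height."""
    x: int
    y: int
    w: int
    h: int


def clamp_roi(roi, image_w, image_h):
    """Clip a region of interest so it stays inside the screenshot."""
    left = max(roi.x, 0)
    top = max(roi.y, 0)
    right = min(roi.x + roi.w, image_w)
    bottom = min(roi.y + roi.h, image_h)
    return Roi(left, top, max(right - left, 0), max(bottom - top, 0))


def roi_to_full(match_x, match_y, roi):
    """Translate a match found inside the ROI back to full-image coordinates."""
    return match_x + roi.x, match_y + roi.y


# With a NumPy screenshot the crop itself would be a slice such as:
#   cropped = screenshot[roi.y:roi.y + roi.h, roi.x:roi.x + roi.w]
```

Forgetting the translation step is a common source of clicks that land in the wrong place after an ROI optimization is introduced.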

4. 🎲 Reliability Issues

False Positives:

# Similar buttons may match incorrectly
"Save" button matches "Save As" button (visual similarity)

False Negatives:

# Variations break matching
- Hover state (different color)
- Loading animations
- Transparency effects
- Shadows or gradients

Timing Problems:

  • Network latency causes UI delays
  • Hardcoded sleep() calls are brittle
  • Race conditions during page loads

Mitigations:

  • Implement smart waiting (wait for specific elements)
  • Use multiple confidence thresholds
  • Add context-aware matching (check surrounding elements)
  • Implement retry logic with exponential backoff
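As an illustration of the last mitigation, here is a minimal sketch of retry logic with exponential backoff. It is one possible shape, not the project's implementation: `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument is injectable only so the function can be exercised without real delays.

```python
import time


def retry_with_backoff(action, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call `action` until it returns a truthy value or attempts run out.

    The pause doubles after every failed attempt, capped at max_delay.
    An exception raised on the final attempt is propagated to the caller.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
            if result:
                return result
        except Exception:
            if attempt == max_attempts:
                raise
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

A call such as `retry_with_backoff(lambda: vision.find("ok.png"))` would wait 0.5 s, 1 s, and 2 s between its four attempts, giving a slow page more time on each retry.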

5. 🔒 Limited Adaptability

Static Logic:

  • Cannot adapt to unexpected UI states
  • Follows predefined decision trees only
  • No learning from failures

Poor Error Recovery:

# Example: Gets stuck if unexpected popup appears
if not find_button("ok"):
    # No fallback strategy defined
    # Automation hangs indefinitely
    pass

Solutions:

  • Implement state recovery mechanisms
  • Add timeout-based failsafes
  • Use ML models for adaptive recognition (future enhancement)
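A timeout-based failsafe combined with a simple recovery step could look like the sketch below. It assumes the caller supplies `find(name)` (returning a location or `None`) and `click(location)`. `click_with_recovery` and the `dismissible` list of popup templates are hypothetical names, and `clock`/`sleep` are injectable purely for testing.

```python
import time


def click_with_recovery(find, click, target, dismissible=(),
                        timeout=10.0, poll_interval=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Click `target`, dismissing known popups, and give up after `timeout`.

    Returns True on success and False on timeout, so the caller can
    decide what to do instead of hanging indefinitely.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        location = find(target)
        if location:
            click(location)
            return True
        # Target not visible: check whether a known popup is in the way.
        for popup in dismissible:
            popup_location = find(popup)
            if popup_location:
                click(popup_location)
                break
        else:
            # Nothing to dismiss, so wait before looking again.
            sleep(poll_interval)
    return False
```

Returning `False` rather than raising keeps the decision about escalation (log, screenshot, restart the workflow) with the calling code.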

6. βš™οΈ Configuration Complexity

Setup Requirements:

❌ Chrome profile paths (OS-specific)
❌ Game/app credentials  
❌ Template image creation (manual, tedious)
❌ Coordinate calibration per screen
❌ Confidence threshold tuning per template
❌ ChromeDriver version matching

Impact: High barrier to entry; difficult for non-technical users.

7. βš–οΈ Ethical & Legal Considerations

When using for automation of online services:

  • ⚠️ May violate Terms of Service
  • ⚠️ Could be considered unauthorized access
  • ⚠️ Risk of account suspension/banning
  • ⚠️ Potential legal consequences

Use responsibly: Only automate applications you own or have explicit permission to automate.

8. 🖥️ Platform Dependencies

  • Windows-centric: PyAutoGUI behavior varies across operating systems
  • Browser-specific: Current implementation only supports Chrome
  • Single-threaded: Cannot run multiple automation instances easily

9. πŸ” Security Concerns

# ⚠️ Current implementation issues:
username = "admin"  # Plain text in code
password = "P@ssw0rd"  # No encryption
browser_profile = "Default"  # Full access to user data

Risks:

  • Credentials exposed in source code
  • Requires access to user's browser profile
  • No secure credential storage
  • Potential data exposure
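One common way to get credentials out of source code is to read them from environment variables. The sketch below is illustrative only: `load_credentials` and the variable names `RPA_USERNAME` and `RPA_PASSWORD` are assumptions made for this example, and environment variables are a modest improvement rather than full secret management.

```python
import os


def load_credentials(env=os.environ, prefix="RPA_"):
    """Read credentials from environment variables instead of source code.

    The variable names (RPA_USERNAME, RPA_PASSWORD) are placeholders
    chosen for this example. Raises if either one is missing or empty.
    """
    names = (prefix + "USERNAME", prefix + "PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Failing loudly when a variable is missing avoids silently falling back to a hardcoded default.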

10. 🔄 Ongoing Maintenance Burden

Required Maintenance:

  • Weekly: Check for UI changes
  • Monthly: Update templates
  • Quarterly: Adjust logic for new features
  • Annually: Major refactoring for big updates

Cost: Can exceed initial development time significantly.


πŸ›οΈ Historical Context: The Evolution of Screen Automation

The Mainframe Era (1980s-2000s)

Before modern APIs, enterprises faced a critical challenge: How do you automate systems that only have visual interfaces?

The Problem

┌─────────────────────────────────┐
│  CUSTOMER RECORDS SYSTEM (CRS)  │  ← Critical business application
│  IBM Mainframe - Green Screen   │  ← No API, no database access
│  3270 Terminal Protocol         │  ← Only keyboard/display interface
└─────────────────────────────────┘

Characteristics of legacy systems:

  • Fixed-width text screens (80×24 characters)
  • No mouse support (keyboard only)
  • Position-based data (row 5, column 10 = customer name)
  • No automation interface (human operators required)

The Solution: Screen Scraping

Companies developed tools to:

  1. Capture terminal screens → Read text buffer
  2. Parse fixed positions → Extract data from known coordinates
  3. Identify fields → Recognize labels and values
  4. Automate keyboard entry → Send commands programmatically
  5. Extract for reporting → Export data to modern systems

Example Workflow:

Human Process:
1. Press F3 to access customer screen
2. Type customer ID at row 7, col 15
3. Press ENTER
4. Read name from row 9, col 20
5. Copy to Excel spreadsheet
6. Repeat for 1,000 customers, at 8 hours per day = 3 days

Automated Process:
1. Script sends F3 key
2. Script types customer ID
3. Script sends ENTER
4. Script reads screen buffer position
5. Script writes to database
6. Complete 1,000 customers in 2 hours

Real-World Example: Banking Industry

Scenario: A bank needed to integrate a 1985 mainframe loan system with a new 2015 web portal.

Options:

  1. ❌ Replace mainframe (cost: $50M, time: 3 years, risk: HIGH)
  2. ❌ Develop API for legacy system (cost: $5M, time: 18 months)
  3. ✅ Screen scraping integration (cost: $200K, time: 3 months)

Implementation:

# Pseudo-code for mainframe scraping
def get_customer_loan_balance(customer_id):
    # Connect to 3270 emulator
    terminal = connect_to_mainframe()
    
    # Navigate using keyboard commands
    terminal.send_key("F3")  # Access loans menu
    terminal.wait_for_screen("LOAN SYSTEM MAIN")
    
    # Enter customer ID
    terminal.move_cursor(7, 15)
    terminal.type_text(customer_id)
    terminal.send_key("ENTER")
    
    # Read result from fixed position
    terminal.wait_for_screen("CUSTOMER DETAILS")
    loan_balance = terminal.read_position(12, 30, length=10)
    
    return float(loan_balance)
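The position-based reads in the pseudo-code can be illustrated with a toy in-memory screen. This is only a sketch of the idea of fixed-position parsing on an 80×24 text buffer: `ScreenBuffer` is a hypothetical class, not a real 3270 emulator API, and it uses 1-based rows and columns to mirror the pseudo-code.

```python
class ScreenBuffer:
    """An 80x24 text screen addressed by 1-based (row, column)."""

    ROWS, COLS = 24, 80

    def __init__(self, lines=()):
        # Pad or truncate every row so all positions are always valid.
        padded = list(lines)[: self.ROWS]
        padded += [""] * (self.ROWS - len(padded))
        self.rows = [line[: self.COLS].ljust(self.COLS) for line in padded]

    def write(self, row, col, text):
        """Overwrite part of a row, as the host would when painting a field."""
        line = self.rows[row - 1]
        start = col - 1
        updated = line[:start] + text + line[start + len(text):]
        self.rows[row - 1] = updated[: self.COLS]

    def read_position(self, row, col, length):
        """Return the field at (row, col), stripped of padding."""
        start = col - 1
        return self.rows[row - 1][start:start + length].strip()

    def contains(self, text):
        """True if the text appears anywhere on the screen."""
        return any(text in line for line in self.rows)
```

Because every field lives at a known row and column, extracting a value is a string slice; the fragility comes from the layout changing, not from the parsing itself.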

Why This Remains Relevant Today

1. Legacy Systems Are Everywhere

  • 43% of banking systems run on COBOL (Reuters 2017 survey)
  • Average age of core enterprise systems: 12+ years
  • Government agencies run software from the 1970s-80s
  • Insurance companies still use green-screen mainframes

2. APIs Aren't Always Available

Scenario: Your company uses vendor software

Option A: Request API from vendor
├── Response: "Not in roadmap"
├── Timeline: 18-24 months (maybe)
└── Cost: $$$$ enterprise licensing

Option B: Screen scraping
├── Response: Immediate
├── Timeline: 2-4 weeks
└── Cost: Development time only

3. Visual Automation as a Bridge

Modern use cases:

  • Citrix/RDP Environments: Virtual desktops with no API access
  • Third-Party SaaS: Vendors who won't provide APIs
  • Legacy Desktop Apps: 20-year-old applications still in production
  • Visual Testing: Verifying that UI renders correctly

📸 Template Image Management

Creating Effective Templates

  1. Capture at Target Resolution
# Game/app running at 1920×1080
# Screenshot must also be 1920×1080
  2. Crop Precisely
# Include only the target element
# Too large = background changes cause missed matches (false negatives)
# Too small = too little detail, matches in the wrong places (false positives)
  3. Save with Transparency
# PNG format with alpha channel preferred
# Alpha helps only if it is passed to the matcher as a mask;
# cv2.imread drops the alpha channel unless IMREAD_UNCHANGED is used
  4. Test Multiple Thresholds
for confidence in [0.5, 0.6, 0.7, 0.8, 0.9]:
    result = vision.find("button.png", confidence=confidence)
    print(f"Confidence {confidence}: {result}")

Template Organization

images/
├── buttons/
│   ├── ok.png
│   ├── cancel.png
│   └── submit.png
├── icons/
│   ├── notification.png
│   └── settings.png
├── indicators/
│   ├── loading.png
│   └── complete.png
└── fallbacks/
    ├── ok_hover.png
    └── ok_disabled.png

🔧 Advanced Configuration

Tuning Detection Sensitivity

# Strict matching (fewer false positives)
result = vision.find("critical_button.png", confidence=0.9)

# Lenient matching (catches variations)
result = vision.find("icon.png", confidence=0.6)

# Context-aware matching
button = vision.find("ok.png", confidence=0.7)
if button and vision.find_nearby("dialog_title.png", button, radius=100):
    # Confirmed it's the right "OK" button
    click(button)
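The `find_nearby` call above is not defined in this README. As a rough sketch of what a proximity check could involve, the helpers below test whether a candidate match has a context element within a given radius. `is_nearby` and `pick_confirmed` are hypothetical names, and points are plain `(x, y)` tuples.

```python
import math


def is_nearby(anchor, candidate, radius):
    """True when two (x, y) points lie within `radius` pixels of each other."""
    dx = anchor[0] - candidate[0]
    dy = anchor[1] - candidate[1]
    return math.hypot(dx, dy) <= radius


def pick_confirmed(candidates, context_points, radius=100):
    """Return the first candidate that has a context element within radius."""
    for candidate in candidates:
        if any(is_nearby(candidate, point, radius) for point in context_points):
            return candidate
    return None
```

If two "OK" buttons are visible, only the one close to the expected dialog title is returned, which helps with the "Save" versus "Save As" kind of ambiguity.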

Dynamic Wait Strategies

import time

def smart_wait(template, timeout=30, poll_interval=0.5):
    """Wait for element to appear with timeout"""
    # monotonic() is unaffected by system clock adjustments
    start_time = time.monotonic()
    
    while time.monotonic() - start_time < timeout:
        location = vision.find(template)
        if location:
            return location
        time.sleep(poll_interval)
    
    raise TimeoutError(f"Element {template} not found after {timeout}s")

📈 Future Enhancements

Short-Term (Low-Hanging Fruit)

  • External Configuration: Move credentials to config.yaml
  • Multi-Resolution Support: Scale templates automatically
  • OCR Integration: Use Tesseract for text-based element detection
  • Better Logging: Structured logging with JSON output
  • Unit Tests: Test coverage for vision and browser modules

Medium-Term (Significant Improvements)

  • Scale-Invariant Matching: Implement SIFT/SURF/ORB
  • Machine Learning: Train models to recognize UI patterns
  • Error Recovery: Automatic retry with fallback strategies
  • Performance Optimization: Parallel template matching
  • Dashboard: Web-based monitoring and control interface

Long-Term (Major Features)

  • Cross-Platform Support: Native Linux/macOS support
  • Cloud Deployment: Run headless in containers
  • Visual Flow Builder: Drag-and-drop workflow design
  • Adaptive Learning: Improve from historical successes/failures
  • API Integration: Hybrid approach (API-first, vision fallback)

🧪 Testing & Validation

Manual Testing Checklist

  • Templates load correctly
  • Screenshot capture works at target resolution
  • Template matching finds elements with >90% accuracy
  • Click coordinates are accurate (±5 pixels)
  • Error handling triggers on missing elements
  • Logging captures all significant events

Automated Testing

# tests/test_vision.py
import unittest

import cv2

from src.vision import Vision

class TestVision(unittest.TestCase):
    def setUp(self):
        self.vision = Vision()
    
    def test_template_matching(self):
        """Test template matching accuracy"""
        screenshot = cv2.imread("test_data/screenshot.png")
        template = cv2.imread("test_data/button.png")
        
        # Assumes find() also accepts image arrays; elsewhere it is called with a file name
        result = self.vision.find(screenshot, template, confidence=0.8)
        self.assertIsNotNone(result)
        # The x coordinate must fall within the screenshot width
        self.assertTrue(0 <= result[0] <= screenshot.shape[1])

📚 Learning Resources

Computer Vision

Browser Automation

RPA Concepts


🤝 Contributing

Contributions are welcome! Areas for improvement:

High Priority

  • Multi-resolution template support
  • Better error handling and recovery
  • Performance optimizations
  • Documentation improvements

Feature Requests

  • Support for additional browsers (Firefox, Edge)
  • OCR integration for text-based detection
  • Machine learning-based element recognition
  • Configuration GUI for non-technical users

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

βš–οΈ Legal Disclaimer

FOR EDUCATIONAL PURPOSES ONLY

This project demonstrates automation techniques used in enterprise software development. It is intended for:

  • ✅ Learning computer vision and automation concepts
  • ✅ Experimenting with your own applications
  • ✅ Understanding legacy system integration patterns
  • ✅ Educational research and skill development

NOT intended for:

  • ❌ Violating Terms of Service of any application
  • ❌ Gaining unfair advantages in competitive environments
  • ❌ Accessing systems without authorization
  • ❌ Any activity that could be considered unethical or illegal

Important Warnings

  1. Terms of Service: Automating online services may violate their ToS
  2. Account Risk: Could result in account suspension or permanent ban
  3. Legal Consequences: Unauthorized automation may have legal ramifications
  4. Ethical Considerations: Automation that harms others is not acceptable

The authors assume NO responsibility for consequences arising from the use of this software. Use at your own risk and only for lawful purposes.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License Summary

  • ✅ Commercial use allowed
  • ✅ Modification allowed
  • ✅ Distribution allowed
  • ✅ Private use allowed
  • ⚠️ No warranty provided
  • ⚠️ No liability accepted

πŸ™ Acknowledgments

  • OpenCV Community - For the powerful computer vision library
  • Selenium Project - For enabling browser automation
  • PyAutoGUI Contributors - For cross-platform GUI automation
  • Legacy System Pioneers - Who developed screen scraping techniques in the 1980s
  • RPA Industry Leaders - UiPath, Automation Anywhere, Blue Prism for validating these approaches


Remember: This project represents 40+ years of enterprise automation history, demonstrating techniques that remain relevant today for legacy system integration and automation scenarios where APIs are unavailable. Use responsibly and ethically.
