Labelingo Design Document

Overview

Labelingo is a tool for annotating UI screenshots with translations. It combines OCR (Optical Character Recognition) with AI-powered translation to create annotated SVG files that show both the original text and its translation.

Architecture

graph TD
A[User Input] -->|Image & Target Lang| B[CLI Interface]
B -->|Settings| C[Analysis Engine]
subgraph Analysis
C -->|Image| D[OpenAI Vision]
C -->|Image| E[OCR Backend]
D -->|Text & Translations| F[Result Merger]
E -->|Text Locations| F
end
subgraph Caching
D <-->|Cache Check| G[Response Cache]
E <-->|Cache Check| G
end
F -->|Combined Results| H[SVG Generator]
H -->|SVG Content| I[Output File]
I -->|Preview| J[Web Browser]
I -->|Open| K[System Viewer]
style Analysis fill:#f5f5f5,stroke:#333,stroke-width:2px
style Caching fill:#f0f8ff,stroke:#333,stroke-width:2px

The system consists of four main components:

CLI Interface (cli.py)
OCR and Analysis Engine (ocr.py)
SVG Annotation Generator (annotator.py)
Utilities and Caching (utils.py, response_cache.py)

Component Details

CLI Interface

Handles command-line arguments and configuration
Manages the workflow between components
Provides options for output format and preview

OCR and Analysis Engine

Supports multiple backends for text detection:

OpenAI Vision API
- Primary source for translations
- Provides source language detection
- Image preprocessing:
  - Scales images to max 2048px
  - Optimizes JPEG quality
- Uses structured output format
- Results are cached based on image content and analysis parameters
OCR Backends
- Claude Vision API (default)
- Tesseract OCR
- EasyOCR
- PaddleOCR
- Each provides text location and content
- Results are combined with OpenAI translations

SVG Annotation Generator

Creates interactive SVG output
Features:
- Original screenshot as background
- Bounding boxes around detected text
- Curved connector lines
- Numbered annotations with translations
- Responsive scaling

Caching System

Caches API responses to reduce costs and latency
Cache keys include:
- API endpoint
- Image content hash
- Schema/structure hash
- Target language

Data Flow

graph LR
    U[User] --> |Image & Target Lang| CLI
    CLI --> |Scale & Send| OpenAI
    OpenAI --> |Check| Cache
    Cache --> |Hit| OpenAI
    OpenAI --> |Miss| Analysis
    Analysis --> Cache
    CLI --> OCR
    OCR --> |Check| Cache
    Cache --> |Hit| OCR
    OCR --> |Miss| Detection
    Detection --> Cache
    OpenAI --> |Results| Merger
    OCR --> |Results| Merger
    Merger --> SVG[SVG Generator]
    SVG --> |Preview| Browser
    SVG --> |Save| File

User provides screenshot and target language
Image is preprocessed and scaled
OpenAI Vision analyzes for text and translations
Selected OCR backend processes for text locations
Results are merged:
- OCR provides text locations
- OpenAI provides translations
SVG generator creates annotated output
Result is displayed or saved

Design Decisions

Image Processing

Max dimension of 2048px balances API costs with accuracy
JPEG quality of 85% optimizes size vs. quality
Lanczos resampling for best text clarity

Caching Strategy

Comprehensive cache keys prevent redundant API calls
Schema versioning handles API changes
File-based cache for persistence

Translation Workflow

OpenAI Vision provides initial analysis
OCR backend provides precise text locations
Results are merged prioritizing:
- OCR for text locations
- OpenAI for translations
- Source language detection

Error Handling

Graceful fallbacks for missing dependencies
Clear error messages for API issues
Debug mode for troubleshooting

Label Placement Strategy

The SVG annotator uses intelligent label placement to improve readability and utilize space efficiently:

Margin Calculation
- Margins are calculated dynamically based on label content
- Left margin: width of longest left-side label + 60px padding
- Right margin: width of longest right-side label + 60px padding
- Image centered between margins
Label Distribution
- For elements with bounding boxes:
  - Elements in left half → labels on left
  - Elements in right half → labels on right
- For elements without bounding boxes:
  - Alternates between left and right sides
  - Placed at bottom of respective sides
  - Uses bullet points (•) instead of numbers
Vertical Positioning
- Labels are sorted by their corresponding element's y-position
- Minimum vertical spacing of 25px between labels
- Labels without bounding boxes placed at bottom
- Each side (left/right) tracks its last y-position to prevent overlap
Connector Lines
- White outline (4px) with colored center line (2px)
- Curved Bézier paths with rounded line caps
- For right-side labels:
  - Connects from 20px left of label to right edge of element box
- For left-side labels:
  - Connects from start of label to left edge of element box
Visual Elements
- Rounded rectangles (6px radius) around detected UI elements
- 4px padding around detected element bounds
- White background behind labels to ensure readability
- Text padding of 5px around labels
- Consistent font family (Helvetica Neue/Arial) with defined sizes:
  - Numbers: 16px
  - Text: 15px
  - Title: 20px

This approach provides:

Clean, consistent layout
Clear visual connection between labels and elements
Optimal space utilization
Improved readability through careful spacing and typography
Graceful handling of elements without detected bounds

Future Improvements

Parallel Processing
- Concurrent API calls for faster processing
- Batch processing for multiple images
UI Improvements
- Interactive web interface
- Real-time preview
- Custom annotation styles
Enhanced Analysis
- Context-aware translations
- UI element type detection
- Alternative translation services
Output Formats
- PDF export
- HTML with interactive features
- Translation memory export

Dependencies

Core:

click: CLI interface
Pillow: Image processing
anthropic: Claude API
openai: OpenAI Vision API
python-dotenv: Configuration

Optional (OCR):

pytesseract
easyocr
paddleocr
paddlepaddle

Configuration

The system uses environment variables for API keys:

ANTHROPIC_API_KEY: For Claude Vision API
OPENAI_API_KEY: For OpenAI Vision API

These can be set in the environment or in a .env file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Labelingo Design Document

Overview

Architecture

Component Details

CLI Interface

OCR and Analysis Engine

SVG Annotation Generator

Caching System

Data Flow

Design Decisions

Image Processing

Caching Strategy

Translation Workflow

Error Handling

Label Placement Strategy

Future Improvements

Dependencies

Configuration

FilesExpand file tree

design.md

Latest commit

History

design.md

File metadata and controls

Labelingo Design Document

Overview

Architecture

Component Details

CLI Interface

OCR and Analysis Engine

SVG Annotation Generator

Caching System

Data Flow

Design Decisions

Image Processing

Caching Strategy

Translation Workflow

Error Handling

Label Placement Strategy

Future Improvements

Dependencies

Configuration