Labelingo is a tool for annotating UI screenshots with translations. It combines OCR (Optical Character Recognition) with AI-powered translation to create annotated SVG files that show both the original text and its translation.
graph TD
A[User Input] -->|Image & Target Lang| B[CLI Interface]
B -->|Settings| C[Analysis Engine]
subgraph Analysis
C -->|Image| D[OpenAI Vision]
C -->|Image| E[OCR Backend]
D -->|Text & Translations| F[Result Merger]
E -->|Text Locations| F
end
subgraph Caching
D <-->|Cache Check| G[Response Cache]
E <-->|Cache Check| G
end
F -->|Combined Results| H[SVG Generator]
H -->|SVG Content| I[Output File]
I -->|Preview| J[Web Browser]
I -->|Open| K[System Viewer]
style Analysis fill:#f5f5f5,stroke:#333,stroke-width:2px
style Caching fill:#f0f8ff,stroke:#333,stroke-width:2px
The system consists of four main components:
- CLI Interface (
cli.py) - OCR and Analysis Engine (
ocr.py) - SVG Annotation Generator (
annotator.py) - Utilities and Caching (
utils.py,response_cache.py)
- Handles command-line arguments and configuration
- Manages the workflow between components
- Provides options for output format and preview
Supports multiple backends for text detection:
-
OpenAI Vision API
- Primary source for translations
- Provides source language detection
- Image preprocessing:
- Scales images to max 2048px
- Optimizes JPEG quality
- Uses structured output format
- Results are cached based on image content and analysis parameters
-
OCR Backends
- Claude Vision API (default)
- Tesseract OCR
- EasyOCR
- PaddleOCR
- Each provides text location and content
- Results are combined with OpenAI translations
- Creates interactive SVG output
- Features:
- Original screenshot as background
- Bounding boxes around detected text
- Curved connector lines
- Numbered annotations with translations
- Responsive scaling
- Caches API responses to reduce costs and latency
- Cache keys include:
- API endpoint
- Image content hash
- Schema/structure hash
- Target language
graph LR
U[User] --> |Image & Target Lang| CLI
CLI --> |Scale & Send| OpenAI
OpenAI --> |Check| Cache
Cache --> |Hit| OpenAI
OpenAI --> |Miss| Analysis
Analysis --> Cache
CLI --> OCR
OCR --> |Check| Cache
Cache --> |Hit| OCR
OCR --> |Miss| Detection
Detection --> Cache
OpenAI --> |Results| Merger
OCR --> |Results| Merger
Merger --> SVG[SVG Generator]
SVG --> |Preview| Browser
SVG --> |Save| File
- User provides screenshot and target language
- Image is preprocessed and scaled
- OpenAI Vision analyzes for text and translations
- Selected OCR backend processes for text locations
- Results are merged:
- OCR provides text locations
- OpenAI provides translations
- SVG generator creates annotated output
- Result is displayed or saved
- Max dimension of 2048px balances API costs with accuracy
- JPEG quality of 85% optimizes size vs. quality
- Lanczos resampling for best text clarity
- Comprehensive cache keys prevent redundant API calls
- Schema versioning handles API changes
- File-based cache for persistence
- OpenAI Vision provides initial analysis
- OCR backend provides precise text locations
- Results are merged prioritizing:
- OCR for text locations
- OpenAI for translations
- Source language detection
- Graceful fallbacks for missing dependencies
- Clear error messages for API issues
- Debug mode for troubleshooting
The SVG annotator uses intelligent label placement to improve readability and utilize space efficiently:
-
Margin Calculation
- Margins are calculated dynamically based on label content
- Left margin: width of longest left-side label + 60px padding
- Right margin: width of longest right-side label + 60px padding
- Image centered between margins
-
Label Distribution
- For elements with bounding boxes:
- Elements in left half → labels on left
- Elements in right half → labels on right
- For elements without bounding boxes:
- Alternates between left and right sides
- Placed at bottom of respective sides
- Uses bullet points (•) instead of numbers
- For elements with bounding boxes:
-
Vertical Positioning
- Labels are sorted by their corresponding element's y-position
- Minimum vertical spacing of 25px between labels
- Labels without bounding boxes placed at bottom
- Each side (left/right) tracks its last y-position to prevent overlap
-
Connector Lines
- White outline (4px) with colored center line (2px)
- Curved Bézier paths with rounded line caps
- For right-side labels:
- Connects from 20px left of label to right edge of element box
- For left-side labels:
- Connects from start of label to left edge of element box
-
Visual Elements
- Rounded rectangles (6px radius) around detected UI elements
- 4px padding around detected element bounds
- White background behind labels to ensure readability
- Text padding of 5px around labels
- Consistent font family (Helvetica Neue/Arial) with defined sizes:
- Numbers: 16px
- Text: 15px
- Title: 20px
This approach provides:
- Clean, consistent layout
- Clear visual connection between labels and elements
- Optimal space utilization
- Improved readability through careful spacing and typography
- Graceful handling of elements without detected bounds
-
Parallel Processing
- Concurrent API calls for faster processing
- Batch processing for multiple images
-
UI Improvements
- Interactive web interface
- Real-time preview
- Custom annotation styles
-
Enhanced Analysis
- Context-aware translations
- UI element type detection
- Alternative translation services
-
Output Formats
- PDF export
- HTML with interactive features
- Translation memory export
Core:
- click: CLI interface
- Pillow: Image processing
- anthropic: Claude API
- openai: OpenAI Vision API
- python-dotenv: Configuration
Optional (OCR):
- pytesseract
- easyocr
- paddleocr
- paddlepaddle
The system uses environment variables for API keys:
ANTHROPIC_API_KEY: For Claude Vision APIOPENAI_API_KEY: For OpenAI Vision API
These can be set in the environment or in a .env file.