BenjaminKobjolke/GPT-json-translator

 
 


GPT JSON Translator

A powerful tool for translating JSON and Android XML files to multiple languages using OpenAI's GPT models.

Overview

GPT JSON Translator is a Python script that automates the translation of JSON and Android XML string files to multiple languages. It uses OpenAI's GPT models to provide high-quality translations while preserving the structure of your files. The tool is particularly useful for localizing applications, websites, or any content stored in JSON or Android XML format.

Features

  • Translates JSON and Android XML files to multiple languages simultaneously
  • Preserves file structure (only translates values, not keys)
  • Supports 40+ languages out of the box
  • Dual-language mode - use two source languages for improved translation quality
  • Recursive batch translation across directory hierarchies
  • Override languages via CLI - use --languages to override settings.ini
  • Exclude specific languages from translation via command-line flag
  • Handles existing translations (only translates new or changed content)
  • Supports translation overrides for specific terms
  • Provides global and field-specific translation hints
  • Supports standard JSON, Flutter ARB, and Android XML file formats
  • Android XML: Automatically excludes translatable="false" elements
  • HTML/Twig extraction - extract translatable text from templates and generate JSON

Requirements

  • Python 3.6+
  • OpenAI API key

Installation

  1. Clone this repository:

    git clone https://github.com/BenjaminKobjolke/GPT-json-translator.git
    cd GPT-json-translator
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Create a settings.ini file with your configuration:

    [General]
    api_key = your-openai-api-key
    source_path = ./locales/en.json
    model = gpt-4o-mini
    
    [Languages]
    languages = it-IT, fr-FR, es-ES, de-DE

    A template file settings_example.ini is provided for reference.
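
    As a rough illustration of how a file like this can be read, here is a minimal sketch using Python's standard configparser module. The function name load_settings and the returned dictionary layout are assumptions made for this example, not the tool's actual code.

```python
import configparser

SAMPLE = """
[General]
api_key = your-openai-api-key
source_path = ./locales/en.json
model = gpt-4o-mini

[Languages]
languages = it-IT, fr-FR, es-ES, de-DE
"""


def load_settings(text):
    """Parse settings.ini content into a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["General"]
    # A missing or commented-out "languages" option yields an empty list.
    raw = parser.get("Languages", "languages", fallback="")
    languages = [code.strip() for code in raw.split(",") if code.strip()]
    return {
        "api_key": general.get("api_key"),
        "source_path": general.get("source_path"),
        "model": general.get("model", "gpt-4o-mini"),
        "languages": languages,
    }


settings = load_settings(SAMPLE)
print(settings["model"], settings["languages"])
```

    An empty languages list is how this sketch signals "not configured"; what the script does in that case is described under Advanced Configuration.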

Usage

Basic Usage

Run the script with a path to your source JSON file:

python json_translator.py path/to/your/source.json

If no path is provided, the script will use the default path specified in settings.ini or prompt you to enter a path.

Excluding Languages

You can exclude specific languages from translation using the --exclude-languages (or --exclude) flag. This is useful when you want to translate to most languages but skip a few:

python json_translator.py path/to/source.json --exclude-languages="he,ko"

Or use the shorter alias:

python json_translator.py path/to/source.json --exclude="he,ko,ar"

Features:

  • Accepts comma-separated language codes
  • Works with both short codes (he, ko) and full codes (he-IL, ko-KR)
  • Can exclude multiple languages at once
  • Useful when translating to all languages by default (when languages is commented out in settings.ini)

Example:

# Translate to all languages except Hebrew and Korean
python json_translator.py ./locales/en.json --exclude="he,ko"

Specifying Target Languages

You can override the settings.ini languages configuration using the --languages flag. This is useful when you want to translate to specific languages without modifying your configuration file:

# Translate to German only
python json_translator.py path/to/source.json --languages="de-DE"

# Translate to multiple specific languages
python json_translator.py path/to/source.json --languages="de-DE,fr-FR,es-ES"

Features:

  • Accepts comma-separated language codes
  • Completely overrides the settings.ini languages setting
  • Works with both short codes (de) and full codes (de-DE)
  • Can be combined with --exclude (exclusions are applied after the override)

Combining with --exclude:

# Start with 3 languages, then exclude French
python json_translator.py source.json --languages="de-DE,fr-FR,es-ES" --exclude="fr"
# Result: translates to de-DE and es-ES only
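
The order of operations described above (override first, then exclusions) can be sketched as follows. The helper names resolve_targets, parse_codes, and matches are hypothetical, and the short-code matching rule (comparing the part before the hyphen) is an assumption about how "fr" relates to "fr-FR".

```python
def parse_codes(value):
    """Split a comma-separated flag value into clean language codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def matches(code, pattern):
    """True if pattern equals code or is its short form ("fr" vs "fr-FR")."""
    code, pattern = code.lower(), pattern.lower()
    return code == pattern or code.split("-")[0] == pattern


def resolve_targets(configured, languages=None, exclude=None):
    """Apply the --languages override first, then the --exclude filter."""
    targets = parse_codes(languages) if languages else list(configured)
    excluded = parse_codes(exclude) if exclude else []
    return [t for t in targets if not any(matches(t, e) for e in excluded)]


print(resolve_targets(["it-IT", "fr-FR"], languages="de-DE,fr-FR,es-ES", exclude="fr"))
# ['de-DE', 'es-ES']
```

This sketch does not expand short codes passed to --languages (for example "de" to "de-DE"); how the tool does that is not shown here.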

Works with recursive mode:

python json_translator.py "D:\release-notes" --translate-recursive="en.json" --languages="de-DE,fr-FR"

Dual-Language Mode

The --second-input flag enables dual-language translation, where the AI receives both your primary source (e.g., English) and a second language (e.g., German) to produce better translations for other languages. This is particularly useful when you have a high-quality human translation in one language that can help inform translations to other languages.

python json_translator.py path/to/en.json --second-input="path/to/de.json"

How it works:

  • For each key, the AI receives both the original value and the second language translation
  • The AI uses both sources to produce more accurate and natural translations
  • The second language file is automatically excluded from translation targets (won't be overwritten)
  • If a key exists in the primary source but not in the second input, a warning is printed and the key is translated from the primary source only

Example use case:

You have English source and a professionally translated German file:

// en.json
{"greeting": "Move, copy, share or delete multiple files at once"}

// de.json (your high-quality German translation)
{"greeting": "Verschiebe, kopiere, teile oder lösche mehrere Dateien gleichzeitig"}

When translating to French, the AI sees both:

{"greeting": "Move, copy, share or delete multiple files at once", "greeting_de": "Verschiebe, kopiere, teile oder lösche mehrere Dateien gleichzeitig"}

This dual-source approach helps the AI understand nuance and context, producing better translations.
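
A minimal sketch of how such a combined payload could be assembled, assuming flat (non-nested) files. The function name build_dual_payload and the suffix parameter are hypothetical; only the "greeting_de" key shape comes from the example above.

```python
import warnings


def build_dual_payload(primary, secondary, suffix="_de"):
    """Pair each primary value with the second-language value when available."""
    payload = {}
    for key, value in primary.items():
        payload[key] = value
        if key in secondary:
            payload[key + suffix] = secondary[key]
        else:
            # Mirrors the documented behaviour: warn, then fall back to the primary text.
            warnings.warn(f"Key '{key}' missing in second input; using primary source only")
    return payload


en = {"greeting": "Move, copy, share or delete multiple files at once", "title": "Files"}
de = {"greeting": "Verschiebe, kopiere, teile oder lösche mehrere Dateien gleichzeitig"}
print(build_dual_payload(en, de))
```

Here "title" has no German counterpart, so it is sent with the English text only and a warning is emitted.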

Combining with other flags:

# Dual-language mode with language exclusions
python json_translator.py en.json --second-input="de.json" --exclude="he,ko"

# Works with recursive mode - same second input used for all directories
python json_translator.py "D:\release-notes" --translate-recursive="en.json" --second-input="D:\translations\de.json"

Works with ARB files too:

python json_translator.py app_en.arb --second-input="app_de.arb"

Recursive Translation

The --translate-recursive flag enables batch processing of multiple directories that contain the same source filename. This is particularly useful when you have a hierarchical directory structure where each subdirectory needs its own set of translations.

python json_translator.py "path/to/base/directory" --translate-recursive="en.json"

How it works:

  • Recursively searches all subdirectories of the specified base directory
  • Finds all directories containing the specified source file (e.g., en.json)
  • Filters to only directories that have no translation files (only the source file exists)
  • Translates each qualifying directory independently

Example use case:

If you have release notes organized by version:

release-notes/
├── 257/                  # Has translations - SKIPPED
│   ├── en.json
│   ├── de.json
│   └── fr.json
├── 258/                  # Has translations - SKIPPED
│   ├── en.json
│   ├── de.json
│   └── fr.json
└── 259/                  # No translations - TRANSLATED
    └── en.json

Run the recursive translation:

python json_translator.py "D:\project\release-notes" --translate-recursive="en.json"

This will:

  1. Search all subdirectories (257, 258, 259, etc.)
  2. Find those containing en.json
  3. Only translate folders where only en.json exists (no de.json, fr.json, etc.)
  4. Process each qualifying directory, creating all configured language translations

Features:

  • Automatically detects file type (JSON or ARB) from the source filename
  • Skips directories that already have translations (idempotent)
  • Processes each directory independently with full translation workflow
  • Supports all translation features (hints, overrides, language exclusions)
  • Shows progress for each directory being processed

Force mode:

By default, recursive mode skips directories that already have translation files. Use --force to process all directories containing the source file, even those with existing translations. This is useful for incrementally adding new keys across all directories:

# Process all directories, even those with existing translations
python json_translator.py "D:\project\release-notes" --translate-recursive="en.json" --force

Since the translator only sends missing keys to the API, this is safe and efficient — directories with complete translations will simply be skipped with no API calls.
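
The directory selection described in this section can be sketched roughly as below. The function name find_candidate_dirs is hypothetical, and treating "any other file with the same extension" as an existing translation is an assumption; the tool may use a stricter check.

```python
import tempfile
from pathlib import Path


def find_candidate_dirs(base_dir, source_name, force=False):
    """Return directories containing source_name that still need translating."""
    suffix = Path(source_name).suffix
    candidates = []
    for source in sorted(Path(base_dir).rglob(source_name)):
        directory = source.parent
        # Assumption: sibling files with the same extension are translations.
        siblings = [p for p in directory.glob(f"*{suffix}") if p.name != source_name]
        if force or not siblings:
            candidates.append(directory)
    return candidates


with tempfile.TemporaryDirectory() as tmp:
    for version, names in {"257": ["en.json", "de.json"], "259": ["en.json"]}.items():
        folder = Path(tmp, version)
        folder.mkdir()
        for name in names:
            (folder / name).write_text("{}", encoding="utf-8")
    print([d.name for d in find_candidate_dirs(tmp, "en.json")])              # ['259']
    print([d.name for d in find_candidate_dirs(tmp, "en.json", force=True)])  # ['257', '259']
```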

Combining with other flags:

You can combine recursive mode with language exclusions and force mode:

# Recursively translate all subdirectories, excluding Hebrew and Korean
python json_translator.py "D:\project\release-notes" --translate-recursive="en.json" --exclude="he,ko"

# Force mode with exclusions
python json_translator.py "D:\project\release-notes" --translate-recursive="en.json" --force --exclude="he,ko"

Works with ARB files too:

python json_translator.py "D:\flutter\lib\l10n" --translate-recursive="app_en.arb"

Android XML Translation

The tool supports Android strings.xml files with automatic language-specific directory output following Android conventions.

Basic Usage

python json_translator.py "path/to/res/values/strings.xml"

How it works

  • Input: Source file in res/values/strings.xml
  • Output: Translated files in language-specific directories:
    • res/values-de/strings.xml (German)
    • res/values-fr/strings.xml (French)
    • res/values-es/strings.xml (Spanish)
    • etc.

Supported Elements

The tool translates the following Android XML elements:

  • <string name="key">value</string> - Simple strings
  • <string-array name="key"> - String arrays with multiple <item> elements
  • <plurals name="key"> - Plural strings with quantity variations

Quote Handling

By default, quotes in translated strings are escaped with backslashes (Android convention):

<string name="welcome">Hello \"World\" and it\'s great</string>

If you prefer CDATA sections instead, use the --use-cdata flag:

python json_translator.py "path/to/res/values/strings.xml" --use-cdata

This produces:

<string name="welcome"><![CDATA[Hello "World" and it's great]]></string>

When to use each approach:

  • Default (escaped quotes): Standard Android convention, works everywhere
  • CDATA sections: Useful when strings contain complex HTML or many special characters
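
The two output styles can be illustrated with a short sketch. The helper names escape_android_quotes and wrap_cdata are hypothetical, and the sketch assumes the input contains no already-escaped quotes and no "]]>" sequence.

```python
def escape_android_quotes(text):
    """Backslash-escape double and single quotes (default style)."""
    return text.replace('"', '\\"').replace("'", "\\'")


def wrap_cdata(text):
    """Wrap the raw text in a CDATA section (--use-cdata style)."""
    return f"<![CDATA[{text}]]>"


value = 'Hello "World" and it\'s great'
print(escape_android_quotes(value))  # Hello \"World\" and it\'s great
print(wrap_cdata(value))             # <![CDATA[Hello "World" and it's great]]>
```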

Non-translatable Elements

Elements marked with translatable="false" are completely excluded from output files:

<!-- Source: values/strings.xml -->
<resources>
    <string name="app_name" translatable="false">MyApp</string>
    <string name="welcome">Welcome!</string>
</resources>

<!-- Output: values-de/strings.xml -->
<resources>
    <string name="welcome">Willkommen!</string>
</resources>

The app_name element is excluded because it has translatable="false".
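
A minimal sketch of this filtering step using Python's standard xml.etree.ElementTree module. The function name collect_translatable and the returned data shapes (lists for arrays, dicts keyed by quantity for plurals) are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SOURCE = """<resources>
    <string name="app_name" translatable="false">MyApp</string>
    <string name="welcome">Welcome!</string>
    <string-array name="colors">
        <item>Red</item>
        <item>Green</item>
    </string-array>
</resources>"""


def collect_translatable(xml_text):
    """Collect translatable entries, skipping translatable="false" elements."""
    strings = {}
    for element in ET.fromstring(xml_text):
        if element.get("translatable") == "false":
            continue
        name = element.get("name")
        if element.tag == "string":
            strings[name] = element.text or ""
        elif element.tag == "string-array":
            strings[name] = [item.text or "" for item in element.findall("item")]
        elif element.tag == "plurals":
            strings[name] = {
                item.get("quantity"): item.text or "" for item in element.findall("item")
            }
    return strings


print(collect_translatable(SOURCE))
# {'welcome': 'Welcome!', 'colors': ['Red', 'Green']}
```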

Directory Structure

android/app/src/main/res/
├── values/
│   ├── strings.xml           # Source file (default language)
│   └── _overrides/           # Optional overrides
│       └── values-de/
│           └── strings.xml   # German overrides
├── values-de/
│   └── strings.xml           # German translation (auto-generated)
├── values-fr/
│   └── strings.xml           # French translation (auto-generated)
└── values-es/
    └── strings.xml           # Spanish translation (auto-generated)

Incremental Translation

Like JSON files, Android XML translations are incremental:

  • Existing translations in values-{lang}/strings.xml are preserved
  • Only new or missing strings are translated
  • Override files take precedence over both existing and new translations
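
The precedence rules above can be sketched for flat key/value data as follows. The function name merge_incremental is hypothetical, the translate argument stands in for the API call, and the sketch only detects missing keys, not changed source text.

```python
def merge_incremental(source, existing, translate, overrides=None):
    """Keep existing translations, translate missing keys, then apply overrides."""
    pending = {key: value for key, value in source.items() if key not in existing}
    fresh = translate(pending) if pending else {}  # no call when nothing is missing
    merged = {**existing, **fresh}
    merged.update(overrides or {})  # overrides win over everything else
    return merged


def fake_translate(pending):
    """Stand-in for the real translation request."""
    return {key: f"[de] {value}" for key, value in pending.items()}


source = {"welcome": "Welcome!", "bye": "Goodbye", "app_name": "MyApp"}
existing = {"welcome": "Willkommen!"}
overrides = {"app_name": "MeineApp"}
print(merge_incremental(source, existing, fake_translate, overrides))
# {'welcome': 'Willkommen!', 'bye': '[de] Goodbye', 'app_name': 'MeineApp'}
```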

Example with Language Exclusions

# Translate to all languages except Hebrew, Korean, and Arabic
python json_translator.py "D:\project\android\app\src\main\res\values\strings.xml" --exclude="he,ko,ar"

Applying Overrides Only

You can apply override files to translation files without performing any translation using the --apply-overrides flag. This is useful when you want to bulk-update translation files with override values without triggering API calls or re-translation.

python json_translator.py path/to/source.json --apply-overrides

How it works:

  • Scans the _overrides/ directory for all available override files
  • Discovers override files automatically (works with both JSON and ARB formats)
  • Applies each override to its corresponding translation file
  • Creates new translation files if they don't exist yet
  • Merges overrides with existing translations (overrides take precedence)
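
As an illustration of the merge step for JSON files, here is a small sketch. The function names deep_merge and apply_overrides are hypothetical, and the sketch assumes override files sit in _overrides/ next to the translation files and share their filenames.

```python
import json
import tempfile
from pathlib import Path


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_overrides(locales_dir):
    """Merge every _overrides/*.json file into its translation file."""
    locales = Path(locales_dir)
    updated = []
    for override_file in sorted((locales / "_overrides").glob("*.json")):
        target = locales / override_file.name
        # A missing translation file is created from the override content.
        existing = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
        overrides = json.loads(override_file.read_text(encoding="utf-8"))
        merged = deep_merge(existing, overrides)
        target.write_text(json.dumps(merged, ensure_ascii=False, indent=4), encoding="utf-8")
        updated.append(target.name)
    return updated


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "_overrides").mkdir()
    Path(tmp, "de.json").write_text('{"app_name": "Meine App", "menu": {"open": "Öffnen"}}', encoding="utf-8")
    Path(tmp, "_overrides", "de.json").write_text('{"app_name": "MeineApp"}', encoding="utf-8")
    print(apply_overrides(tmp))  # ['de.json']
```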

Features:

  • No API calls or translation - pure file merging operation
  • Automatically detects file type (JSON or ARB) from the source file
  • Processes all override files found, regardless of config settings
  • Creates missing translation files from override content
  • Shows summary of applied overrides for each language

Example:

# Apply all overrides in _overrides/ to translation files
python json_translator.py ./locales/en.json --apply-overrides

# For ARB files
python json_translator.py ./lib/l10n/app_en.arb --apply-overrides

Use cases:

  • Bulk update specific terms across all translations
  • Initialize new translation files with predefined values
  • Apply terminology changes without re-translating
  • Sync override changes to all language files quickly

HTML/Twig Text Extraction

The --extract-html flag enables extraction of translatable text from HTML or Twig template files. This is useful when you have existing templates with hardcoded text that needs to be internationalized.

python json_translator.py --extract-html "path/to/template.twig" --output "path/to/en.json"

How it works:

  • Scans HTML/Twig files for translatable text content and attributes
  • Generates translation keys based on filename and element type (e.g., overview.h2_1, overview.p_1)
  • Creates or updates a JSON translation file with extracted strings
  • Replaces original text in template files with {{ t('key') }} function calls
  • Creates .bak backup files before modifying templates

Extracted elements:

  • Text content from: h1-h6, p, span, button, label, a, li, th, td, figcaption, legend, option
  • Attributes: alt, title, placeholder, aria-label, aria-description
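
To make the key naming scheme concrete, here is a simplified sketch built on Python's standard html.parser module. The class name KeyExtractor is hypothetical. Unlike the tool, this sketch returns flat dotted keys, drops inline tags instead of preserving them, and does not rewrite the template.

```python
from html.parser import HTMLParser

TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "span", "button", "label",
             "a", "li", "th", "td", "figcaption", "legend", "option"}
ATTRIBUTES = ("alt", "title", "placeholder", "aria-label", "aria-description")


class KeyExtractor(HTMLParser):
    """Collect translatable strings under keys such as overview.h2_1."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.counters = {}
        self.strings = {}
        self._open_tag = None
        self._buffer = []

    def _add(self, kind, text):
        text = text.strip()
        if not text:
            return
        self.counters[kind] = self.counters.get(kind, 0) + 1
        self.strings[f"{self.prefix}.{kind}_{self.counters[kind]}"] = text

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ATTRIBUTES and value:
                self._add(name, value)
        # Only the outermost text element is tracked; nested ones are folded in.
        if tag in TEXT_TAGS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._add(tag, "".join(self._buffer))
            self._open_tag = None


extractor = KeyExtractor("overview")
extractor.feed('<section><h2>Overview</h2><p>Welcome to our application.</p>'
               '<img src="image.png" alt="App screenshot"></section>')
print(extractor.strings)
```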

Command-line options:

Flag                          Description
--extract-html PATH           Path to HTML/Twig file(s). Supports glob patterns (e.g., "templates/*.twig")
--output PATH / -o            Output JSON file path (required)
--translation-function FUNC   Name of the Twig translation function (default: t)
--no-backup                   Skip creating .bak backup files
--dry-run                     Preview changes without modifying any files

Examples:

# Extract from a single file
python json_translator.py --extract-html "templates/overview.twig" --output "lang/en.json"

# Extract from multiple files using glob pattern
python json_translator.py --extract-html "templates/*.twig" --output "lang/en.json"

# Preview extraction without making changes
python json_translator.py --extract-html "templates/overview.twig" --output "lang/en.json" --dry-run

# Use a custom translation function name
python json_translator.py --extract-html "templates/overview.twig" --output "lang/en.json" --translation-function trans

# Skip backup file creation
python json_translator.py --extract-html "templates/overview.twig" --output "lang/en.json" --no-backup

Example transformation:

Before (overview.twig):

<section>
    <h2>Overview</h2>
    <p>Welcome to our application.</p>
    <p><b>Bold text</b> with emphasis.</p>
    <img src="image.png" alt="App screenshot">
</section>

After extraction (overview.twig):

<section>
    <h2>{{ t('overview.h2_1') }}</h2>
    <p>{{ t('overview.p_1') }}</p>
    <p>{{ t('overview.p_2')|raw }}</p>
    <img src="image.png" alt="{{ t('overview.alt_1') }}">
</section>

Generated JSON (en.json):

{
    "overview": {
        "h2_1": "Overview",
        "p_1": "Welcome to our application.",
        "p_2": "<b>Bold text</b> with emphasis.",
        "alt_1": "App screenshot"
    }
}

Features:

  • Inline HTML preserved: Content with inline tags (<b>, <i>, <br/>, etc.) is preserved and the |raw filter is automatically added
  • Already-translated content skipped: Text containing {{ t('...') }} patterns is not extracted
  • Incremental extraction: Merges with existing JSON files without overwriting existing keys
  • Backup files: Creates .bak files before modifying templates (disable with --no-backup)

JSON Attribute Remover Utility

The json_attribute_remover.py utility removes specified attributes from JSON translation files. It supports two modes: directory mode and file mode, automatically detecting which mode to use based on the input path.

Interactive Mode (New!)

When called without an attributes file, the tool enters interactive mode with arrow-key navigation:

# Interactive mode - select attribute to remove visually
python json_attribute_remover.py path/to/en.json

# Or with a directory (defaults to scanning en.json)
python json_attribute_remover.py path/to/locales/

Features:

  • Displays all attributes with their values from the source file
  • Navigate with ↑/↓ arrow keys
  • Press Enter to select, Esc/Ctrl+C to cancel
  • Shows confirmation prompt before removing
  • Nested keys displayed with dot notation (e.g., settings.theme)

Example output:

? Select attribute to remove:
> app_name: "My Application"
  description: "A great app for productivity"
  settings: "{...}"
  settings.theme: "dark"
  settings.language: "en"

Requirements:

pip install questionary

Directory Mode

Removes attributes from all JSON files in a directory, automatically excluding en.json (and optionally other source files).

# Basic usage - always excludes en.json
python json_attribute_remover.py path/to/directory path/to/attributes_to_remove.json

# Exclude additional source files (e.g., ARB files)
python json_attribute_remover.py path/to/directory attributes.json --exclude-source="app_en.arb"

Features:

  • Always excludes en.json by default (no configuration needed)
  • Optionally exclude additional source files via --exclude-source flag
  • Processes all other JSON files in the directory
  • Useful for bulk cleanup of translation files

Example:

# Removes attributes from de.json, fr.json, etc., but NOT from en.json
python json_attribute_remover.py ./locales attributes_to_remove.json

File Mode

Removes attributes from all JSON files in the directory EXCEPT the specified file. This mode is useful when you want to keep one specific translation file unchanged while updating all others.

# Specify a file to exclude - processes all other files including en.json
python json_attribute_remover.py path/to/directory/de.json path/to/attributes_to_remove.json

Features:

  • Only excludes the specified file from processing
  • Processes ALL other JSON files, including source files like en.json
  • Useful when you want to preserve one specific translation
  • Automatically detects the directory from the file path

Example:

# Removes attributes from en.json, fr.json, etc., but NOT from de.json
python json_attribute_remover.py ./locales/de.json attributes_to_remove.json

Attributes File Format

The tool supports two formats for specifying attributes to remove:

1. Simple List Format (top-level keys only):

[
  "obsolete_key",
  "deprecated_field",
  "old_translation"
]

2. Nested Object Format (supports nested structures):

{
  "topLevelKey": true,
  "viewSettings": {
    "imageViewer": true,
    "deprecatedOption": true
  },
  "removeAllNested": "*"
}

Nested format features:

  • true value: Removes the specific key
  • "*" value: Removes all keys under the parent (wildcard)
  • Nested objects: Mirror your JSON structure to target nested keys
  • Automatic cleanup: Empty parent objects are removed automatically
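
The rules above can be sketched as a small recursive function. The name remove_attributes and the returned count are hypothetical, and the handling of edge cases (for example a wildcard on a non-object value) is an assumption rather than the utility's documented behaviour.

```python
def remove_attributes(data, spec):
    """Remove keys from data in place according to spec; return how many were removed."""
    if isinstance(spec, list):
        # Simple list format: every entry is a top-level key to remove.
        spec = {key: True for key in spec}
    removed = 0
    for key, rule in spec.items():
        if key not in data:
            continue
        value = data[key]
        if rule is True:
            del data[key]
            removed += 1
        elif rule == "*":
            # Wildcard: drop every child; the emptied parent is removed as well.
            removed += len(value) if isinstance(value, dict) else 1
            del data[key]
        elif isinstance(rule, dict) and isinstance(value, dict):
            count = remove_attributes(value, rule)
            removed += count
            if count and not value:
                del data[key]  # automatic cleanup of an emptied parent
    return removed


translations = {
    "topLevelKey": "x",
    "viewSettings": {"imageViewer": "a", "deprecatedOption": "b", "keep": "c"},
    "removeAllNested": {"one": 1, "two": 2},
    "other": "y",
}
spec = {
    "topLevelKey": True,
    "viewSettings": {"imageViewer": True, "deprecatedOption": True},
    "removeAllNested": "*",
}
print(remove_attributes(translations, spec), translations)
# 5 {'viewSettings': {'keep': 'c'}, 'other': 'y'}
```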

Examples:

Remove a nested attribute:

{
  "viewSettings": {
    "imageViewer": true
  }
}

Remove all keys under a parent:

{
  "viewSettings": "*"
}

Mix top-level and nested removals:

{
  "topLevelKey": true,
  "nested": {
    "childKey": true
  }
}

Common Features

Both modes provide:

  • Only modifies files that contain the specified attributes
  • Preserves JSON formatting with proper indentation
  • Shows progress and summary statistics
  • Handles errors gracefully (one file failure doesn't stop others)

Getting Help

python json_attribute_remover.py --help

Windows Path Issue

IMPORTANT: On Windows, when using paths with spaces in batch files or cmd.exe, avoid trailing backslashes in quoted paths as they can escape the closing quote.

Problem example (causes error):

python json_attribute_remover.py "D:\path\to\directory\" "C:\path with spaces\file.json"

The trailing \ before the closing " escapes the quote, causing argument parsing to fail.

Solutions:

  1. Remove the trailing backslash (recommended):

    python json_attribute_remover.py "D:\path\to\directory" "C:\path with spaces\file.json"
  2. Double the trailing backslash:

    python json_attribute_remover.py "D:\path\to\directory\\" "C:\path with spaces\file.json"
  3. Use forward slashes (Windows accepts both):

    python json_attribute_remover.py "D:/path/to/directory" "C:/path with spaces/file.json"

Directory Structure

The script expects the following directory structure:

project/
└── locales/
    ├── en.json           # Source file (English)
    ├── de.json           # German translation
    ├── fr.json           # French translation
    ├── ...               # Other language files
    └── _overrides/       # Translation overrides
        ├── de.json       # German overrides
        └── ...           # Other language overrides

Translation Hints

You can provide translation hints to guide the AI translator. Hints are automatically excluded from the translated output files.

Global Hints

Global hints apply to all fields in your translation. Use keys that start with _hint_ and end with an underscore (e.g., _hint_). You can use multiple global hints by numbering them (_hint_2_, _hint_3_, and so on):

{
    "_hint_": "All texts are for a file explorer app. Translations should fit this context.",
    "_hint_2_": "Media File Explorer is the app name and should not be translated.",
    "_hint_3_": "If the language has a formal and informal way, use the informal way.",
    "title": "Welcome to Media File Explorer",
    "description": "Manage your files with ease"
}

Field-Specific Hints

Field-specific hints provide targeted guidance for individual fields. Use the pattern _hint_fieldname:

{
    "_hint_": "SUMMERA AI is a proper name and should not be translated",
    "short_description": "Explorer with editing, favorites & smart file management",
    "_hint_short_description": "Maximum length is 60 characters, shorten if too long by not adhering 100% to the original language",
    "app_name": "File Explorer Pro",
    "welcome_message": "Welcome to our application!"
}

When translating, the AI receives all hints:

Translation hints:
- All texts are for a file explorer app. Translations should fit this context.
- Media File Explorer is the app name and should not be translated.
- If the language has a formal and informal way, use the informal way.

Field-specific hints:
- short_description: Maximum length is 60 characters, shorten if too long by not adhering 100% to the original language

Supported global hint patterns:

  • _hint_ - Primary hint
  • _hint_2_, _hint_3_, etc. - Additional numbered hints
  • Any key matching _hint_*_ pattern (starts with _hint_ and ends with _)
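
A minimal sketch of how these key patterns could be separated from the translatable content, assuming flat files. The function names split_hints and format_hints are hypothetical; the output layout follows the hint listing shown earlier in this section. A field whose own name ends with an underscore would be ambiguous under this simple rule.

```python
PREFIX = "_hint_"


def split_hints(source):
    """Split a source dict into global hints, field hints, and translatable content."""
    global_hints, field_hints, content = [], {}, {}
    for key, value in source.items():
        if not key.startswith(PREFIX):
            content[key] = value
        elif key.endswith("_"):
            global_hints.append(value)  # _hint_, _hint_2_, ...
        else:
            field_hints[key[len(PREFIX):]] = value  # _hint_fieldname
    return global_hints, field_hints, content


def format_hints(global_hints, field_hints):
    """Render the hints as plain text for the translation prompt."""
    lines = []
    if global_hints:
        lines.append("Translation hints:")
        lines += [f"- {hint}" for hint in global_hints]
    if field_hints:
        if lines:
            lines.append("")
        lines.append("Field-specific hints:")
        lines += [f"- {field}: {hint}" for field, hint in field_hints.items()]
    return "\n".join(lines)


source = {
    "_hint_": "All texts are for a file explorer app.",
    "_hint_2_": "Use the informal way.",
    "short_description": "Explorer with editing",
    "_hint_short_description": "Maximum length is 60 characters",
    "title": "Welcome",
}
global_hints, field_hints, content = split_hints(source)
print(format_hints(global_hints, field_hints))
print(content)
```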

Use cases for hints:

  • Proper names that should remain untranslated
  • Brand names or product names
  • Technical terms with specific translations
  • Context information for ambiguous terms
  • Length constraints for specific fields
  • Tone or formality requirements (formal vs informal)

Translation Overrides

You can create override files for specific languages to ensure certain terms are always translated consistently. Place these files in the _overrides directory with the language code as the filename.

Example (_overrides/de.json):

{
    "app_name": "MeineApp",
    "special_term": "SpezialBegriff"
}

Project Structure

The project is organized into a modular structure:

GPT-json-translator/
├── json_translator.py        # Main entry point script
├── json_attribute_remover.py # Utility to remove attributes from JSON files
├── settings.ini              # Configuration file
├── settings_example.ini      # Example configuration template
├── src/                      # Source code directory
│   ├── __init__.py           # Package initialization
│   ├── main.py               # Main entry point
│   ├── config.py             # Configuration manager
│   ├── translator.py         # Translation service
│   ├── file_handler.py       # File I/O operations
│   ├── models/               # Data models
│   │   ├── __init__.py
│   │   └── translation_data.py
│   ├── services/             # Business logic services
│   │   ├── translation_orchestrator.py
│   │   ├── override_service.py
│   │   └── recursive_translator.py
│   └── utils/                # Utility functions
│       ├── __init__.py
│       ├── path_utils.py     # Path and filename utilities
│       ├── language_utils.py # Language code handling
│       ├── dict_utils.py     # Deep merge/diff utilities
│       └── xml_handler.py    # Android XML file handling
├── locales/                  # Translation files directory
│   ├── en.json               # Source file (English)
│   ├── de.json               # German translation
│   └── _overrides/           # Translation overrides
│       └── de.json           # German overrides
└── requirements.txt          # Dependencies

Supported Languages

The script supports translation to the following languages:

  • Italian (it-IT)
  • French (fr-FR)
  • Spanish (es-ES)
  • German (de-DE)
  • Portuguese (pt-PT, pt-BR)
  • Dutch (nl-NL)
  • Russian (ru-RU)
  • Polish (pl-PL)
  • Turkish (tr-TR)
  • Chinese (zh-CN)
  • Japanese (ja-JP)
  • Korean (ko-KR)
  • Arabic (ar-AR)
  • Hindi (hi-IN)
  • Swedish (sv-SE)
  • Norwegian (no-NO)
  • Finnish (fi-FI)
  • Danish (da-DK)
  • Czech (cs-CZ)
  • Slovak (sk-SK)
  • Hungarian (hu-HU)
  • Romanian (ro-RO)
  • Ukrainian (uk-UA)
  • Bulgarian (bg-BG)
  • Croatian (hr-HR)
  • Serbian (sr-SP)
  • Slovenian (sl-SI)
  • Estonian (et-EE)
  • Latvian (lv-LV)
  • Lithuanian (lt-LT)
  • Hebrew (he-IL)
  • Persian (fa-IR)
  • Urdu (ur-PK)
  • Bengali (bn-IN)
  • Tamil (ta-IN)
  • Telugu (te-IN)
  • Marathi (mr-IN)
  • Malayalam (ml-IN)
  • Thai (th-TH)
  • Vietnamese (vi-VN)

Advanced Configuration

You can customize the behavior by modifying the following settings in your settings.ini file:

Target Languages

In the [Languages] section, specify the languages you want to translate to:

[Languages]
languages = it-IT, fr-FR, es-ES, de-DE

If you comment out or omit the languages setting, the script will translate to all 40+ supported languages by default:

[Languages]
# Uncomment and modify the languages you want to translate to, otherwise all will be translated
#languages = de-DE

You can also use the --exclude-languages command-line flag to exclude specific languages from translation without modifying your configuration file (see the Excluding Languages section).

OpenAI Model

In the [General] section, specify which OpenAI model to use for translations:

[General]
model = gpt-4o-mini

Other options include "gpt-4o", "gpt-4", "gpt-3.5-turbo", etc. Different models offer different trade-offs between translation quality, speed, and cost. The default is gpt-4o-mini, which provides a good balance of quality and performance.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

Original code by Leonardo Rignanese (twitter.com/leorigna)
Refactored structure by [Your Name]
