New action: AI translation backfill for untranslated strings #737

Description

@jkmassel

Summary

Add an action that fills still-untranslated localization keys with machine (LLM) translations so apps never ship untranslated strings. The action validates that each output preserves the source's placeholders and writes the results back under a clear marker. Human translations replace them on the next sync; AI output is never uploaded to GlotPress.

Filed for the WordPress-iOS continuous-localization work ("Faster Releases" RFC). Reference implementation: wordpress-mobile/WordPress-iOS#25675 (fastlane/helpers/ai_translator.rb plus the backfill_missing_translations lane). Depends on #736 (placeholder primitive). WPiOS will adopt the toolkit version once it lands and delete the project-side copy.

Motivation

In the continuous-localization model, translations are downloaded daily and ship as-is. Between human-translation passes, locales have gaps — without a backstop, non-English locales ship English (or the raw key). Backfilling the gaps with an LLM as a clearly-marked, never-uploaded stopgap means the app never shows an untranslated string; human translations from GlotPress overwrite the AI entries on the next sync.

Behavior

  1. For each target locale, compute missing keys = base-language keys absent from the locale file.
  2. Batch missing (key → source value) and send to the LLM with strict rules:
    • Preserve every format specifier exactly — same count, order, and positional indices (%@, %1$d, %%).
    • Preserve leading/trailing whitespace and punctuation; concise and natural for a mobile UI.
    • Return only a JSON object key → translation (no prose/markdown/fences).
  3. Validate each output with the placeholder primitive from #736 (New action: localization placeholder-compatibility check): if the translation's placeholder signature ≠ the source's, drop it (leave the key untranslated) and log it. Never ship a placeholder-broken translation.
  4. Write validated translations into the locale file under a marker comment (e.g. /* Machine-translated — pending human translation */), with correct file-format escaping.
  5. Resilient: a failed batch/locale must not crash the job (best-effort); honor provider retry (429/5xx).
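
Steps 1 and 2 can be sketched in a few lines of Ruby. This is a minimal illustration, not the WPiOS reference implementation: the method names `missing_entries` and `batches` are hypothetical, and both strings files are assumed to be already parsed into `{ key => value }` hashes.

```ruby
# Hypothetical helpers for illustration; not the toolkit's or WPiOS's actual API.

# Step 1: keys present in the base language but absent from the locale.
# Empty or whitespace-only source values are skipped (see Edge cases).
def missing_entries(base_strings, locale_strings)
  base_strings.reject do |key, value|
    locale_strings.key?(key) || value.to_s.strip.empty?
  end
end

# Step 2: split the missing entries into hashes of at most `batch_size` pairs,
# so each LLM call stays under the per-call output ceiling.
def batches(entries, batch_size: 40)
  entries.each_slice(batch_size).map(&:to_h)
end

base = { 'greeting' => 'Hello, %@!', 'count' => '%1$d items', 'blank' => '  ' }
french = { 'greeting' => 'Bonjour, %@ !' }

missing = missing_entries(base, french)
batches(missing, batch_size: 40).each do |batch|
  puts "Would send #{batch.size} string(s): #{batch.keys.join(', ')}"
end
```

The default of 40 mirrors the WPiOS batch size listed under Proposed options; the right value depends on average string length and the `max_tokens` ceiling.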

Provider / model

WPiOS uses Claude claude-sonnet-4-6 (to match the model already used by its CI Claude integration). The toolkit currently only ships openai_ask (OpenAI, gpt-4.1). Design decision needed:

  • (a) Add an Anthropic-backed action (matches WPiOS), or
  • (b) a provider-agnostic action with pluggable backends (OpenAI via existing openai_ask conventions + Anthropic), or
  • (c) extend openai_ask.

Recommendation: provider-agnostic core with an Anthropic adapter first, mirroring openai_ask's conventions:

  • ANTHROPIC_API_TOKEN env, sensitive: true ConfigItem (cf. OPENAI_API_TOKEN).
  • Anthropic uses https://api.anthropic.com/v1/messages, headers x-api-key + anthropic-version: 2023-06-01 (not Authorization: Bearer), body { model, max_tokens, messages: [{ role: "user", content }] }, response content[].text where type == "text". Use exact model-id strings (no date suffix). Keep output per call under ~16K tokens (batch accordingly) to avoid non-streaming timeouts.
  • SDK vs raw HTTP: WPiOS used the official anthropic gem; openai_ask uses raw Net::HTTP (no gem dep). Recommend raw Net::HTTP for the toolkit to avoid a heavy dependency and match openai_ask. (Open question.)
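
For the raw Net::HTTP option, the request and response handling described above might look roughly like the following. It is a sketch under stated assumptions: `build_anthropic_request` and `extract_text` are hypothetical names, the defaults are the WPiOS values quoted in this issue, and retry handling for 429/5xx is omitted.

```ruby
require 'json'
require 'net/http'
require 'uri'

ANTHROPIC_MESSAGES_URI = URI('https://api.anthropic.com/v1/messages')

# Builds (but does not send) a Messages API request.
# Hypothetical helper; the defaults mirror the WPiOS values quoted in this issue.
def build_anthropic_request(prompt:, api_token:, model: 'claude-sonnet-4-6', max_tokens: 8192)
  request = Net::HTTP::Post.new(ANTHROPIC_MESSAGES_URI)
  request['x-api-key'] = api_token # not "Authorization: Bearer"
  request['anthropic-version'] = '2023-06-01'
  request['content-type'] = 'application/json'
  request.body = JSON.generate(
    model: model,
    max_tokens: max_tokens,
    messages: [{ role: 'user', content: prompt }]
  )
  request
end

# Joins the text of every content block whose type is "text".
def extract_text(response_body)
  JSON.parse(response_body).fetch('content', [])
      .select { |block| block['type'] == 'text' }
      .map { |block| block['text'] }
      .join
end

# Sending would look like this (not executed here):
#   uri = ANTHROPIC_MESSAGES_URI
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
#   text = extract_text(response.body)
```

Keeping request construction separate from the network call makes it straightforward to assert on headers and body in specs without hitting the API.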

Proposed options (ConfigItems)

| Option | Notes |
| --- | --- |
| base_strings / locale_strings | Hashes or file paths. |
| target_locales | Locale codes + a code → human-language-name map for the prompt. |
| model | Defaults to the project's choice (claude-sonnet-4-6). |
| api_token | env_name: ANTHROPIC_API_TOKEN, sensitive: true. |
| batch_size, max_tokens | Batching + per-call ceiling (WPiOS: 40 / 8192). |
| marker_comment | Text marking machine translations in the file. |
| dry_run | Return translations without writing (previews/tests). |

Return value: per-locale translations applied + count dropped for placeholder mismatch.

Language-name map

The prompt needs locale-code → human language name (e.g. pt-BR → "Brazilian Portuguese"). WPiOS hardcodes a LANGUAGE_NAMES map. The toolkit already plans a LocaleHelper (see #296 and the TODO in WPiOS localization.rb) — this action should consume that rather than re-hardcode.
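
To make the role of the language name concrete, here is one way the prompt could be assembled from a batch and a locale code. The wording of the prompt, the `build_prompt` name, and the three-entry map are all illustrative assumptions; the real map would come from LocaleHelper once it exists.

```ruby
require 'json'

# Illustrative subset only; a real implementation would consume LocaleHelper.
LANGUAGE_NAMES = {
  'pt-BR' => 'Brazilian Portuguese',
  'fr' => 'French',
  'en-GB' => 'British English'
}.freeze

# Builds the user prompt for one batch. Fails loudly for an unmapped locale
# rather than sending a bare locale code to the model.
def build_prompt(batch, locale, language_names: LANGUAGE_NAMES)
  language = language_names.fetch(locale) do
    raise ArgumentError, "No language name for #{locale}"
  end

  <<~PROMPT
    Translate the values of this JSON object from English into #{language} for a mobile app UI.
    Preserve every format specifier exactly (for example %@, %1$d, %%): same count, order, and positional indices.
    Preserve leading and trailing whitespace and punctuation. Keep translations concise and natural.
    Return only a JSON object mapping each key to its translation, with no prose, markdown, or code fences.

    #{JSON.generate(batch)}
  PROMPT
end

puts build_prompt({ 'greeting' => 'Hello, %@!' }, 'pt-BR')
```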

Dependency

Hard dependency on #736 — the per-string placeholders_compatible?(source, translation) check is what makes machine translation safe to ship.
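
Until #736 lands, the shape of that check can be approximated as below. This is not the #736 implementation: the regex is a deliberately simplified matcher for common printf-style specifiers, and `placeholder_signature` and `partition_by_placeholders` are hypothetical names. Only `placeholders_compatible?(source, translation)` is taken from this issue.

```ruby
# Simplified matcher (an assumption, not the #736 primitive): optional positional
# index, optional precision, optional length modifier, then a conversion character.
# Covers %@, %d, %1$d, %.2f, %ld, and the literal %%.
PLACEHOLDER = /%(?:\d+\$)?(?:\.\d+)?(?:ll|l|h)?[@dDiuUxXoOfFeEgGcCsSpaA%]/

# Ordered list of placeholders in a string.
def placeholder_signature(text)
  text.scan(PLACEHOLDER)
end

# Strict comparison: same count, same order, same positional indices.
def placeholders_compatible?(source, translation)
  placeholder_signature(source) == placeholder_signature(translation)
end

# Splits model output into translations safe to write and keys to drop and log.
def partition_by_placeholders(sources, translations)
  kept, dropped = translations.partition do |key, translation|
    sources.key?(key) && placeholders_compatible?(sources[key], translation)
  end
  [kept.to_h, dropped.map(&:first)]
end

sources = { 'greeting' => 'Hello, %@!', 'count' => '%1$d items' }
output = { 'greeting' => 'Bonjour, %@ !', 'count' => '%d éléments' }
kept, dropped = partition_by_placeholders(sources, output)
puts "kept: #{kept.keys.inspect}, dropped: #{dropped.inspect}"
```

Keys the model invents (present in the output but not in the batch) are dropped by the same partition, since they have no source to compare against.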

Cross-platform

The translate + validate core is platform-agnostic (operates on { key => value }). File read/write/escape is platform-specific (iOS .strings, Android strings.xml). Recommend common/ core + ios_*/android_* wrappers. Complements the existing android_download_translations / ios_download_strings_files_from_glotpress (which fetch human translations).
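
As an example of the platform-specific half, an iOS wrapper would need to escape values and emit the marked section. The sketch below assumes the section is appended to an existing .strings file as text; `escape_strings_value` and `render_machine_section` are hypothetical names, and the escaping covers only backslash, double quote, newline, and tab.

```ruby
# Escapes a string for use inside a double-quoted .strings literal.
# Backslashes are handled first so later escapes are not doubled.
def escape_strings_value(value)
  value.gsub('\\') { '\\\\' }
       .gsub('"') { '\\"' }
       .gsub("\n") { '\\n' }
       .gsub("\t") { '\\t' }
end

# Renders the machine-translated section: the marker comment once,
# followed by one `"key" = "value";` line per validated translation.
def render_machine_section(translations, marker:)
  lines = translations.map do |key, value|
    format('"%s" = "%s";', escape_strings_value(key), escape_strings_value(value))
  end
  ([marker] + lines).join("\n") + "\n"
end

section = render_machine_section(
  { 'greeting' => "Bonjour, %@ !\nBienvenue" },
  marker: '/* Machine-translated, pending human translation */'
)
puts section
```

An Android wrapper would follow the same shape with XML escaping and an XML comment as the marker.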

Edge cases

  • Empty/whitespace source values → skip.
  • "Overloaded" keys where the key is the English text → changing English yields a new key, so backfill only fills truly missing keys (no special-casing needed).
  • Large missing sets → chunk; mind cost (below).
  • Regional English locales (en-GB/en-AU/en-CA) → still "translate" (spelling/locale conventions).
  • Model returns prose/code-fenced JSON despite instructions → extract the JSON object defensively.
  • Marker idempotency: the next human-translation download overwrites the locale file and drops the AI section; the backfill then runs again, so the AI section is regenerated on every run (see Cost).
  • .stringsdict / plurals → open question (likely v2).
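
The defensive JSON extraction mentioned above can be as simple as slicing from the first `{` to the last `}` before parsing. This is one possible approach, with a hypothetical method name; it returns nil on failure so the caller can treat the batch as failed without crashing the job.

```ruby
require 'json'

# Pulls a JSON object out of model output that may be wrapped in prose or
# code fences. Returns a Hash, or nil if no parseable object is found.
def extract_json_object(raw)
  start = raw.index('{')
  finish = raw.rindex('}')
  return nil if start.nil? || finish.nil? || finish < start

  parsed = JSON.parse(raw[start..finish])
  parsed.is_a?(Hash) ? parsed : nil
rescue JSON::ParserError
  nil
end

puts extract_json_object('Sure! Here it is: {"greeting": "Bonjour"} Hope that helps.').inspect
puts extract_json_object('Sorry, I cannot help with that.').inspect
```

Slicing on the outermost braces fails if the surrounding prose itself contains braces; in that case the parse error is rescued and the batch is skipped.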

Cost / efficiency

The naive daily flow re-translates all still-missing strings every run (no caching). For ~30 locales this is real recurring cost. Consider: only translating keys new since last run, caching AI outputs keyed by (source, locale), or a budget guard. (WPiOS accepted the naive cost initially.) Document the tradeoff; make batching/throttling configurable.
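
One of the options above, caching AI outputs keyed by (source, locale), could be sketched as a small JSON-backed store. The class name, file layout, and hashing scheme are assumptions for illustration; where the cache file lives between CI runs is a separate question this sketch does not answer.

```ruby
require 'digest'
require 'json'

# Hypothetical on-disk cache of machine translations keyed by (source, locale).
# A changed source string hashes to a new key, so stale entries are never reused.
class TranslationCache
  def initialize(path)
    @path = path
    @entries = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns the cached translation, or nil on a miss.
  def fetch(source, locale)
    @entries[cache_key(source, locale)]
  end

  def store(source, locale, translation)
    @entries[cache_key(source, locale)] = translation
  end

  # Persists the cache so the next run can skip already-translated strings.
  def save
    File.write(@path, JSON.pretty_generate(@entries))
  end

  private

  # The NUL separator keeps ("a", "bc") and ("ab", "c") from colliding.
  def cache_key(source, locale)
    Digest::SHA256.hexdigest("#{locale}\0#{source}")
  end
end
```

With such a cache, a run would call `fetch` for each missing key first and send only the misses to the LLM, which bounds recurring cost to strings that are new or changed since the previous run.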

Testing

  • RSpec with a mocked LLM client (never hit the API in tests). Assert: missing-key detection; prompt construction; JSON extraction (incl. fenced/prose-wrapped); placeholder-drop behavior; file write/escape round-trip.

Acceptance criteria

Open questions

  • Provider strategy (Anthropic vs provider-agnostic vs extend openai_ask).
  • SDK vs raw Net::HTTP.
  • LocaleHelper integration (#296) for language names.
  • Cost controls / caching / "only new keys".
  • .stringsdict / Android plurals.
  • Android parity timing.
