New action: AI translation backfill for untranslated strings

## Summary

Add an action that **fills still-untranslated localization keys with machine (LLM) translations** so apps never ship untranslated strings, **validates that each output preserves the source's placeholders**, and writes the results back under a clear marker. Human translations replace them on the next sync; **AI output is never uploaded to GlotPress**.

> Filed for the WordPress-iOS continuous-localization work ("Faster Releases" RFC). Reference implementation: wordpress-mobile/WordPress-iOS#25675 — `fastlane/helpers/ai_translator.rb` + the `backfill_missing_translations` lane. **Depends on** wordpress-mobile/release-toolkit#736 (placeholder primitive). WPiOS will adopt the toolkit version once it lands and delete the project-side copy.

## Motivation

In the continuous-localization model, translations are downloaded daily and ship as-is. Between human-translation passes, locales have gaps — without a backstop, non-English locales ship **English** (or the raw key). Backfilling the gaps with an LLM as a **clearly-marked, never-uploaded stopgap** means the app never shows an untranslated string; human translations from GlotPress overwrite the AI entries on the next sync.

## Behavior

1. For each target locale, compute **missing keys** = base-language keys absent from the locale file.
2. **Batch** missing `(key → source value)` and send to the LLM with strict rules:
   * Preserve **every** format specifier exactly — same count, order, and positional indices (`%@`, `%1$d`, `%%`).
   * Preserve leading/trailing whitespace and punctuation; keep translations concise and natural for a mobile UI.
   * Return **only** a JSON object `key → translation` (no prose/markdown/fences).
3. **Validate** each output with the placeholder primitive from wordpress-mobile/release-toolkit#736: if the translation's placeholder signature ≠ the source's, **drop it** (leave the key untranslated) and log — never ship a placeholder-broken translation.
4. **Write** validated translations into the locale file under a marker comment (e.g. `/* Machine-translated — pending human translation */`), with correct file-format escaping.
5. **Resilient**: a failed batch or locale must not crash the job (best-effort); retry when the provider returns 429/5xx.
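
The steps above can be sketched in plain Ruby as follows. This is a minimal illustration, not the WPiOS implementation: `missing_keys`, `batches`, and `backfill` are hypothetical names, and the LLM call and the placeholder check are injected as callables so the flow can be read (and tested) without a provider.

```ruby
# Step 1: base-language keys absent from the locale, skipping blank sources.
def missing_keys(base, locale)
  base.reject { |key, value| locale.key?(key) || value.to_s.strip.empty? }
end

# Step 2: split the missing set into { key => source } hashes of at most `size` entries.
def batches(missing, size)
  missing.each_slice(size).map(&:to_h)
end

# Steps 2-5: translate each batch, keep only outputs that pass the placeholder
# check, and keep going when a batch fails.
# `translate` takes a batch hash and returns { key => translation }.
# `compatible` takes (source, translation) and returns true or false.
def backfill(base, locale, batch_size:, translate:, compatible:)
  applied = {}
  dropped = []
  batches(missing_keys(base, locale), batch_size).each do |batch|
    begin
      output = translate.call(batch)
    rescue StandardError => e
      warn "Batch failed, leaving #{batch.size} keys untranslated: #{e.message}"
      next
    end
    batch.each do |key, source|
      translation = output[key]
      next if translation.nil? # the model omitted this key
      if compatible.call(source, translation)
        applied[key] = translation
      else
        dropped << key # placeholder mismatch: the key stays untranslated
      end
    end
  end
  { applied: applied, dropped: dropped }
end
```

Returning both the applied translations and the dropped keys matches the return value proposed later in this issue, and lets the caller log mismatches in one place.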

## Provider / model

WPiOS uses **Claude** `claude-sonnet-4-6` (to match the model already used by its CI Claude integration). The toolkit currently only ships `openai_ask` (OpenAI, `gpt-4.1`). **Design decision needed:**

* (a) Add an **Anthropic-backed** action (matches WPiOS), or
* (b) a **provider-agnostic** action with pluggable backends (OpenAI via existing `openai_ask` conventions + Anthropic), or
* (c) extend `openai_ask`.

**Recommendation:** provider-agnostic core with an **Anthropic adapter first**, mirroring `openai_ask`'s conventions:

* `ANTHROPIC_API_TOKEN` env, `sensitive: true` ConfigItem (cf. `OPENAI_API_TOKEN`).
* Anthropic uses `https://api.anthropic.com/v1/messages`, headers `x-api-key` + `anthropic-version: 2023-06-01` (not `Authorization: Bearer`), body `{ model, max_tokens, messages: [{ role: "user", content }] }`, response `content[].text` where `type == "text"`. Use exact model-id strings (no date suffix). Keep output per call under \~16K tokens (batch accordingly) to avoid non-streaming timeouts.
* **SDK vs raw HTTP:** WPiOS used the official `anthropic` gem; `openai_ask` uses raw `Net::HTTP` (no gem dep). **Recommend raw** `Net::HTTP` for the toolkit to avoid a heavy dependency and match `openai_ask`. (Open question.)
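
To make the raw `Net::HTTP` option concrete, here is a rough sketch of the request and response handling described above. The helper names (`build_anthropic_request`, `extract_text`, `retryable?`, `send_request`) are made up for illustration; only the endpoint, headers, body shape, and response shape come from this issue, and a real adapter would add timeouts and backoff.

```ruby
require 'json'
require 'net/http'
require 'uri'

ANTHROPIC_URI = URI('https://api.anthropic.com/v1/messages')

# Builds the POST request without sending it, so it can be inspected in specs.
def build_anthropic_request(prompt:, api_token:, model:, max_tokens:)
  request = Net::HTTP::Post.new(ANTHROPIC_URI)
  request['x-api-key'] = api_token # not `Authorization: Bearer`
  request['anthropic-version'] = '2023-06-01'
  request['content-type'] = 'application/json'
  request.body = JSON.generate(
    model: model,
    max_tokens: max_tokens,
    messages: [{ role: 'user', content: prompt }]
  )
  request
end

# Concatenates the text of every `type == "text"` block in a response body.
def extract_text(response_body)
  blocks = JSON.parse(response_body).fetch('content', [])
  blocks.select { |block| block['type'] == 'text' }.map { |block| block['text'] }.join
end

# Statuses worth retrying according to the Behavior section (429 and 5xx).
def retryable?(status)
  status == 429 || (500..599).cover?(status)
end

# Performs the network call; kept separate so tests never reach the API.
def send_request(request)
  Net::HTTP.start(ANTHROPIC_URI.host, ANTHROPIC_URI.port, use_ssl: true) do |http|
    http.request(request)
  end
end
```

Keeping request construction and response parsing as pure functions means the mocked-provider specs only need to stub `send_request`.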

## Proposed options (ConfigItems)


| Option | Notes |
| -- | -- |
| `base_strings` / `locale_strings` | Hashes or file paths. |
| `target_locales` | Locale codes + a code → human-language-name map for the prompt. |
| `model` | Defaults to the project's choice (`claude-sonnet-4-6`). |
| `api_token` | `env_name: ANTHROPIC_API_TOKEN`, `sensitive: true`. |
| `batch_size`, `max_tokens` | Batching + per-call ceiling (WPiOS: 40 / 8192). |
| `marker_comment` | Text marking machine translations in the file. |
| `dry_run` | Return translations without writing (previews/tests). |

Return value: per-locale translations applied + count dropped for placeholder mismatch.

### Language-name map

The prompt needs locale-code → human language name (e.g. `pt-BR` → "Brazilian Portuguese"). WPiOS hardcodes a `LANGUAGE_NAMES` map. The toolkit already plans a `LocaleHelper` (see [#296](<https://github.com/wordpress-mobile/release-toolkit/issues/296>) and the TODO in WPiOS `localization.rb`) — this action should **consume that** rather than re-hardcode.
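
As an illustration of how the language name feeds the prompt, the sketch below builds a prompt from one batch using the rules listed under Behavior. `build_prompt` and the two-entry `LANGUAGE_NAMES` table are placeholders for this example only; the actual wording used by WPiOS and the eventual `LocaleHelper` API are not shown here.

```ruby
require 'json'

# Illustrative subset; a real action would get this from LocaleHelper instead.
LANGUAGE_NAMES = {
  'pt-BR' => 'Brazilian Portuguese',
  'de' => 'German'
}.freeze

# Builds the user prompt for one batch of { key => English source } pairs.
# Raises ArgumentError when the locale has no known human-readable name.
def build_prompt(batch, locale_code, language_names: LANGUAGE_NAMES)
  language = language_names.fetch(locale_code) do
    raise ArgumentError, "No language name for locale #{locale_code}"
  end

  <<~PROMPT
    Translate the values of the JSON object below from English into #{language}.
    Rules:
    - Preserve every format specifier exactly (for example %@, %1$d, %%): same count, order, and positional indices.
    - Preserve leading and trailing whitespace and punctuation.
    - Keep translations concise and natural for a mobile UI.
    - Return only a JSON object mapping each key to its translation, with no prose, markdown, or code fences.

    #{JSON.pretty_generate(batch)}
  PROMPT
end
```

Failing loudly on an unknown locale code is a deliberate choice in this sketch: sending a raw code such as `zh-Hant` to the model in place of a language name would probably still work, but silently.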

## Dependency

**Hard dependency on** wordpress-mobile/release-toolkit#736 — the per-string `placeholders_compatible?(source, translation)` check is what makes machine translation safe to ship.
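
For readers who have not looked at #736, the kind of check meant here can be approximated as below. This is only a sketch under assumptions: the regex, the `placeholder_signature` helper, and the body of `placeholders_compatible?` are guesses at the idea, not the primitive from that PR. It treats reordered positional specifiers as compatible and ignores flags and width.

```ruby
# Matches a literal `%%`, or a printf-style specifier with an optional position
# (`1$`), optional flags/width/precision, and an optional length modifier.
SPECIFIER = /%%|%(?:(\d+)\$)?[-+0#]*\d*(?:\.\d+)?((?:ll|l|h|z)?[@dDiuUxXoOfFeEgGcCsSp])/

# Returns a sorted list of [argument index, type] pairs.
# Non-positional specifiers get implicit indices in order of appearance;
# each literal `%%` is recorded as [0, '%%'] so its count is compared too.
def placeholder_signature(string)
  implicit = 0
  string.scan(SPECIFIER).map do |index, type|
    next [0, '%%'] if type.nil?

    [index ? index.to_i : (implicit += 1), type]
  end.sort
end

# True when both strings consume the same arguments with the same types.
def placeholders_compatible?(source, translation)
  placeholder_signature(source) == placeholder_signature(translation)
end
```

For example, `placeholders_compatible?('%1$@ liked %2$@', '%2$@ a été aimé par %1$@')` is true because the positional indices still line up, while swapping two non-positional specifiers (`'%d of %@'` versus `'%@ de %d'`) is rejected.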

## Cross-platform

The translate + validate **core is platform-agnostic** (operates on `{ key => value }`). File read/write/escape is platform-specific (iOS `.strings`, Android `strings.xml`). Recommend `common/` core + `ios_*`/`android_*` wrappers. Complements the existing `android_download_translations` / `ios_download_strings_files_from_glotpress` (which fetch *human* translations).
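
On the iOS side, the platform-specific part mostly comes down to escaping and emitting `.strings` entries under the marker. A small sketch, assuming hypothetical helper names (`escape_strings_text`, `machine_translated_section`) and covering only backslashes, double quotes, and newlines:

```ruby
# Escapes a key or value for use inside a double-quoted .strings literal.
# Backslashes are escaped first so later escapes are not doubled.
def escape_strings_text(text)
  text.gsub('\\') { '\\\\' }
      .gsub('"') { '\\"' }
      .gsub("\n") { '\\n' }
end

# Renders the machine-translated block: one marker comment followed by
# one `"key" = "value";` line per validated translation.
def machine_translated_section(translations, marker:)
  lines = ["/* #{marker} */"]
  translations.each do |key, value|
    lines << %("#{escape_strings_text(key)}" = "#{escape_strings_text(value)}";)
  end
  "#{lines.join("\n")}\n"
end
```

An Android wrapper would swap these two functions for XML escaping and `<string name="...">` elements while reusing the same translate and validate core.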

## Edge cases

* Empty/whitespace source values → skip.
* "Overloaded" keys where the key **is** the English text → changing English yields a new key, so backfill only fills *truly missing* keys (no special-casing needed).
* Large missing sets → chunk; mind **cost** (below).
* Regional English locales (`en-GB`/`en-AU`/`en-CA`) → still "translate" (spelling/locale conventions).
* Model returns prose/code-fenced JSON despite instructions → extract the JSON object defensively.
* Marker idempotency: the next human-translation download **overwrites** the locale file and drops the AI section; the backfill then runs again, so the AI section is regenerated on every run (see Cost).
* `.stringsdict` / plurals → **open question** (likely v2).
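
For the prose or code-fenced JSON case above, one defensive approach is to parse only the span between the first `{` and the last `}`. The helper name `extract_json_object` is hypothetical, and the approach is deliberately simple: it returns `nil` rather than guessing when the surrounding prose itself contains stray braces.

```ruby
require 'json'

# Returns the JSON object embedded in `text` as a Hash, or nil when no
# parseable object is found. Tolerates leading/trailing prose and code fences.
def extract_json_object(text)
  start = text.index('{')
  finish = text.rindex('}')
  return nil if start.nil? || finish.nil? || finish < start

  parsed = JSON.parse(text[start..finish])
  parsed.is_a?(Hash) ? parsed : nil
rescue JSON::ParserError
  nil
end
```

Treating an unparseable reply as `nil` lets the caller handle it like any other failed batch: log it and leave those keys untranslated.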

## Cost / efficiency

The naive daily flow **re-translates all still-missing strings every run** (no caching). For \~30 locales this is real recurring cost. Consider: only translating keys *new since last run*, caching AI outputs keyed by `(source, locale)`, or a budget guard. (WPiOS accepted the naive cost initially.) Document the tradeoff; make batching/throttling configurable.
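
One way the `(source, locale)` cache idea could look is sketched below. `TranslationCache` and `partition_cached` are hypothetical names, and where the cache is persisted (a file in the repo, a CI cache) is left open. Keying on the source text rather than the key means an edited English string is re-translated automatically.

```ruby
require 'digest'
require 'json'

# In-memory cache of AI translations keyed by a digest of (locale, source text).
class TranslationCache
  def initialize(entries = {})
    @entries = entries
  end

  def fetch(source, locale)
    @entries[digest(source, locale)]
  end

  def store(source, locale, translation)
    @entries[digest(source, locale)] = translation
  end

  # Serialized form, suitable for writing to whatever storage the project picks.
  def to_json(*args)
    @entries.to_json(*args)
  end

  private

  # The NUL separator keeps ('ab', 'c') and ('a', 'bc') from colliding.
  def digest(source, locale)
    Digest::SHA256.hexdigest("#{locale}\u0000#{source}")
  end
end

# Splits missing { key => source } pairs into cache hits (ready to write)
# and misses (still need an LLM call).
def partition_cached(missing, locale, cache)
  cached = {}
  uncached = {}
  missing.each do |key, source|
    hit = cache.fetch(source, locale)
    if hit
      cached[key] = hit
    else
      uncached[key] = source
    end
  end
  [cached, uncached]
end
```

With something like this in place, a daily run would only send the `uncached` hash to the provider, which is what bounds the recurring cost.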

## Testing

* RSpec with a **mocked LLM client** (never hit the API in tests). Assert: missing-key detection; prompt construction; JSON extraction (incl. fenced/prose-wrapped); placeholder-drop behavior; file write/escape round-trip.

## Acceptance criteria

- [ ] Action + helper + provider adapter (Anthropic first), mirroring `openai_ask` conventions.
- [ ] Per-output placeholder validation via wordpress-mobile/release-toolkit#736; drops + logs mismatches.
- [ ] Writes under a marker; correct file-format escaping; `dry_run`.
- [ ] RSpec (mocked provider), CHANGELOG entry, `actions/README.md` entry.
- [ ] Never uploads AI output to GlotPress.

## Open questions

* Provider strategy (Anthropic vs provider-agnostic vs extend `openai_ask`).
* SDK vs raw `Net::HTTP`.
* `LocaleHelper` integration ([#296](<https://github.com/wordpress-mobile/release-toolkit/issues/296>)) for language names.
* Cost controls / caching / "only new keys".
* `.stringsdict` / Android plurals.
* Android parity timing.
