Summary
Add glossary (용어집) support so that specific terms are consistently translated according to user-defined mappings. Currently, the translator relies entirely on the LLM's judgment for term translation, which can lead to inconsistent or incorrect translations for domain-specific terms, proper nouns, or preferred expressions.
Motivation
- Certain terms (e.g., brand names, technical jargon, community-specific expressions) should always be translated in a specific way
- LLMs may translate the same term differently across posts, leading to inconsistency
- Some terms should be kept untranslated (e.g., proper nouns)
Key Questions to Discuss
- Storage format: Should the glossary be defined in config.toml, a separate file (e.g., glossary.toml/glossary.json), or both?
- Scope: Should glossary entries be global or per target language?
  - Global: "블루스카이" → "Bluesky" (same for all languages)
  - Per-language: "블루스카이" → {"en": "Bluesky", "ja": "ブルースカイ"}
- Integration point: Should glossary terms be injected into the LLM system prompt (simplest), or applied through pre/post-processing text replacement (more deterministic)?
- Size limit: Should we cap the number of glossary entries to avoid bloating the prompt?
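To make the scope question concrete, here is a minimal Go sketch of one possible data model that supports both options at once: a global default per term, with optional per-language overrides. The names Entry, Glossary, and Lookup are hypothetical and do not exist in the codebase; the storage format is deliberately left open.

```go
package main

import "fmt"

// Entry is a hypothetical glossary entry (not existing project code).
// Default applies to every target language unless ByLang carries an
// override for that language.
type Entry struct {
	Default string            // global translation; may be empty
	ByLang  map[string]string // per-language overrides, keyed by language code
}

// Glossary maps a source term to its entry.
type Glossary map[string]Entry

// Lookup returns the translation of term for targetLang. A per-language
// override wins over the global default; ok is false when neither exists.
func (g Glossary) Lookup(term, targetLang string) (translation string, ok bool) {
	e, found := g[term]
	if !found {
		return "", false
	}
	if t, has := e.ByLang[targetLang]; has {
		return t, true
	}
	if e.Default != "" {
		return e.Default, true
	}
	return "", false
}

func main() {
	g := Glossary{
		"블루스카이": {Default: "Bluesky", ByLang: map[string]string{"ja": "ブルースカイ"}},
	}
	en, _ := g.Lookup("블루스카이", "en")
	ja, _ := g.Lookup("블루스카이", "ja")
	fmt.Println(en, ja)
}
```

With this shape, a purely global glossary is just the case where ByLang is left empty, so choosing per-language support later would not require a format change.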
Current Translation Flow
The translation prompt is built in internal/translator/prompt.go via buildTranslateSystemPrompt(). The glossary could be appended as an additional instruction section in the system prompt, e.g.:
Glossary (use these exact translations):
- 블루스카이 → Bluesky
- 따라봇 → DDaraBot
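A rough Go sketch of the prompt-injection option follows. glossarySection and its maxEntries parameter are hypothetical, and how the result would be wired into buildTranslateSystemPrompt() is not shown because that function's signature is not stated here. Terms are sorted so the generated prompt is stable across runs, and the cap illustrates one way to handle the size-limit question.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// glossarySection renders glossary entries as an extra instruction block for
// the system prompt. It is a hypothetical helper, not existing project code.
// Terms are sorted so the prompt is stable across runs, and at most
// maxEntries are emitted (maxEntries <= 0 means no cap).
func glossarySection(entries map[string]string, maxEntries int) string {
	if len(entries) == 0 {
		return ""
	}
	terms := make([]string, 0, len(entries))
	for term := range entries {
		terms = append(terms, term)
	}
	sort.Strings(terms)
	if maxEntries > 0 && len(terms) > maxEntries {
		terms = terms[:maxEntries]
	}

	var b strings.Builder
	b.WriteString("Glossary (use these exact translations):\n")
	for _, term := range terms {
		fmt.Fprintf(&b, "- %s → %s\n", term, entries[term])
	}
	return b.String()
}

func main() {
	// basePrompt is a placeholder; the real prompt text lives in prompt.go.
	basePrompt := "You are a translator.\n"
	section := glossarySection(map[string]string{
		"블루스카이": "Bluesky",
		"따라봇":   "DDaraBot",
	}, 50)
	fmt.Print(basePrompt + "\n" + section)
}
```

Note that truncating by sorted order is arbitrary. If a cap is adopted, rejecting an oversized glossary at config load time, or including only terms that actually occur in the post being translated, may be preferable.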
Labels
enhancement