feat: Add glossary support for translation #4

@huketo

Summary

Add glossary (용어집) support so that specific terms are consistently translated according to user-defined mappings. Currently, the translator relies entirely on the LLM's judgment for term translation, which can lead to inconsistent or incorrect translations for domain-specific terms, proper nouns, or preferred expressions.

Motivation

  • Certain terms (e.g., brand names, technical jargon, community-specific expressions) should always be translated in a specific way
  • LLMs may translate the same term differently across posts, leading to inconsistency
  • Some terms should be kept untranslated (e.g., proper nouns)

Key Questions to Discuss

  1. Storage format: Should the glossary be defined in config.toml, a separate file (e.g., glossary.toml/glossary.json), or both?
  2. Scope: Should glossary entries be global or per target language?
    • Global: "블루스카이" → "Bluesky" (same for all languages)
    • Per-language: "블루스카이" → {"en": "Bluesky", "ja": "ブルースカイ"}
  3. Integration point: Inject glossary terms into the LLM system prompt (simplest) vs. pre/post-process text replacement (more deterministic)?
  4. Size limit: Should we cap the number of glossary entries to avoid bloating the prompt?
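To make question 2 concrete, here is a minimal sketch of one possible shape that supports both options at once: a global default per term, with optional per-language overrides. The names `Entry`, `Glossary`, and `Lookup` are hypothetical and do not exist in the codebase; this is only meant to seed the discussion, not to propose a final design.

```go
package main

// Entry is a hypothetical glossary entry: a global default translation
// plus optional overrides keyed by target language code.
type Entry struct {
	Default string            // used when no per-language override exists
	ByLang  map[string]string // e.g. {"ja": "ブルースカイ"}
}

// Glossary maps a source term to its entry (hypothetical type).
type Glossary map[string]Entry

// Lookup returns the translation of term for targetLang.
// A per-language override wins; otherwise the global default is used.
// The second return value is false when the glossary has no usable entry.
func (g Glossary) Lookup(term, targetLang string) (string, bool) {
	e, ok := g[term]
	if !ok {
		return "", false
	}
	// Reading from a nil map is safe in Go, so ByLang may be left unset.
	if t, ok := e.ByLang[targetLang]; ok {
		return t, true
	}
	if e.Default != "" {
		return e.Default, true
	}
	return "", false
}
```

With this shape, "블루스카이" could carry `Default: "Bluesky"` and a `"ja"` override, so `Lookup("블루스카이", "en")` falls back to the default while `Lookup("블루스카이", "ja")` returns the override. A structure like this would map onto either TOML or JSON, so it does not settle question 1.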

Current Translation Flow

The translation prompt is built in internal/translator/prompt.go via buildTranslateSystemPrompt(). The glossary could be appended as an additional instruction section in the system prompt, e.g.:

Glossary (use these exact translations):
- 블루스카이 → Bluesky
- 따라봇 → DDaraBot
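A rough sketch of the prompt-injection option, including a cap on the number of entries (question 4). I have not checked how buildTranslateSystemPrompt() assembles its output, so this assumes it yields a plain string that a helper can extend. `Term` and `appendGlossarySection` are hypothetical names, not existing code.

```go
package main

import "strings"

// Term is a hypothetical source-to-target pair, already resolved
// for the current target language.
type Term struct {
	Source string
	Target string
}

// appendGlossarySection appends a glossary instruction block to an
// existing system prompt. At most maxEntries terms are included so the
// prompt does not grow without bound. With no terms (or a non-positive
// cap) the prompt is returned unchanged.
func appendGlossarySection(systemPrompt string, terms []Term, maxEntries int) string {
	if len(terms) == 0 || maxEntries <= 0 {
		return systemPrompt
	}
	if len(terms) > maxEntries {
		terms = terms[:maxEntries] // keep the first maxEntries terms
	}
	var b strings.Builder
	b.WriteString(systemPrompt)
	b.WriteString("\n\nGlossary (use these exact translations):\n")
	for _, t := range terms {
		b.WriteString("- " + t.Source + " → " + t.Target + "\n")
	}
	return b.String()
}
```

Taking a slice rather than a map keeps the output order stable, which makes the prompt reproducible and easier to test. Truncating to the first N entries is the simplest cap; selecting only the terms that actually occur in the post being translated might be a better use of the budget.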

Labels

enhancement
