**`README.md`** (59 additions, 21 deletions)

Available via **Settings β†’ Languages β†’ Download Language Packs**.

### Creating Custom Language Packs

You can create dictionaries for any language using the included Python scripts.

#### Quick Start (Recommended)

The easiest way to build a language pack is the two-step `wordfreq` pipeline. This automatically generates a word list with frequency data and packages it into a ready-to-install ZIP:

```bash
# Navigate to scripts directory
cd scripts/

# Install prerequisite
pip install wordfreq

# Step 1: Generate a word list from wordfreq (supports 50+ languages)
python get_wordlist.py --lang hu --output hu_words.txt --count 25000

# Step 2: Build the language pack from the word list
python build_langpack.py --lang hu --name "Hungarian" --input hu_words.txt --use-wordfreq --output langpack-hu.zip
```

The `--use-wordfreq` flag enriches word frequencies using the [wordfreq](https://github.com/rspeer/wordfreq) library, which produces better prediction results.

#### Input File Format

The `--input` file for `build_langpack.py` and `build_dictionary.py` is a plain text word list. Supported formats:

| Format | Example | Notes |
|--------|---------|-------|
| One word per line | `hello` | Frequencies are looked up via `wordfreq` whenever it is installed; `--use-wordfreq` additionally makes the build fail early if `wordfreq` is missing |
| Word + TAB + frequency | `hello\t50000` | Uses the provided integer frequency |
| Word + space + frequency | `hello 50000` | Uses the provided integer frequency |
Lines starting with `#` are treated as comments and skipped. Words longer than 50 characters are ignored during input parsing. In addition, `build_dictionary.py` applies `--min-length`/`--max-length` filtering after loading (the default `--max-length` is 30), so words of length 31–50 are also excluded unless you raise the limit, e.g. with `--max-length 50`.
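The parsing rules above can be sketched as a small helper (a hypothetical function, not the actual code in `build_dictionary.py`; the 50-character cap mirrors the note above):

```python
def parse_wordlist_line(line: str):
    """Parse one line of a word-list file.

    Returns (word, frequency) where frequency is None when the line
    carries no explicit count, or None for comment/blank/overlong lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # comment or blank line: skipped
    # Accepted shapes: "word", "word<TAB>freq", or "word freq".
    parts = line.replace("\t", " ").split()
    word = parts[0]
    if len(word) > 50:
        return None  # overlong tokens are ignored during parsing
    if len(parts) >= 2 and parts[1].isdigit():
        return (word, int(parts[1]))
    return (word, None)  # frequency left to a wordfreq lookup
```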
> **Tip:** Using `get_wordlist.py` to generate the input file is the recommended approach β€” it produces a clean one-word-per-line file from `wordfreq`'s curated data. If you provide your own file (e.g., scraped text), make sure it contains **one word per line** (not sentences or paragraphs), otherwise only a small fraction of entries will be recognized as valid dictionary words.

#### Alternative Build Methods

```bash
# From a pre-existing binary dictionary (.bin file)
python build_langpack.py --lang sv --name "Swedish" --dict ../src/main/assets/dictionaries/sv_enhanced.bin --output langpack-sv.zip

# From a custom word+frequency file (two-step: build dictionary, then package)
python build_dictionary.py --lang xx --input my_words.txt --output my_lang.bin
python build_langpack.py --lang xx --name "MyLang" --dict my_lang.bin --output langpack-xx.zip

# Batch build all supported languages (en, es, fr, de, it, pt, hu, nl, id, ms, tl, sw)
python build_all_languages.py
```

#### Script Reference

| Script | Purpose |
|--------|---------|
| `get_wordlist.py` | Extracts top N words from `wordfreq` for a given language code |
| `build_langpack.py` | Creates a complete `.zip` language pack (dictionary + unigrams + manifest) |
| `build_dictionary.py` | Builds a V2 binary dictionary (`.bin`) from a word list |
| `build_all_languages.py` | Batch builds all supported languages |
| `generate_unigrams.py` | Generates unigram frequency lists for language detection |

#### Language Pack Contents

Language packs are `.zip` files containing:

| File | Description |
|------|-------------|
| `manifest.json` | Metadata β€” language code, name, version, word count |
| `dictionary.bin` | V2 binary dictionary with accent normalization and frequency ranks |
| `unigrams.txt` | *(optional)* Top words ordered by frequency (used for language detection); omitted when building from `--dict` without `--unigrams` |
| `contractions.json` | *(optional)* Apostrophe word mappings for languages that use them |
| `prefix_boost.bin` | *(optional)* Aho-Corasick trie for prefix boosting (non-English) |
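For a concrete picture, a pack with this layout can be assembled by hand with the standard library. This is a sketch only: the manifest field names follow the table above and may differ from what `build_langpack.py` actually emits, and the dictionary payload is a placeholder:

```python
import json
import zipfile

# Assumed manifest field names, mirroring the contents table above.
manifest = {
    "code": "hu",
    "name": "Hungarian",
    "version": 1,
    "word_count": 25000,
}

with zipfile.ZipFile("langpack-hu.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    zf.writestr("dictionary.bin", b"")          # placeholder for the V2 binary dictionary
    zf.writestr("unigrams.txt", "a\naz\nes\n")  # top words, one per line
```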

#### Installing a Language Pack

Copy the `.zip` file to your device and import it in **CleverKeys Settings β†’ Multi-Language**.

**Pre-built Language Packs:**
Available in [`scripts/dictionaries/`](./scripts/dictionaries/) for immediate use, or download directly from within the app.

<div align="center">

**`scripts/build_all_languages.py`** (2 additions) β€” registers Hungarian:

```diff
 - Portuguese (pt)
 - Italian (it)
 - German (de)
+- Hungarian (hu)
 - Indonesian (id)
 - Swahili (sw)
 - Malay (ms)
```

```diff
 'pt': {'name': 'Portuguese', 'words': 25000, 'bundle': True},
 'it': {'name': 'Italian', 'words': 25000, 'bundle': True},
 'de': {'name': 'German', 'words': 25000, 'bundle': True},
+'hu': {'name': 'Hungarian', 'words': 25000, 'bundle': False},
 'nl': {'name': 'Dutch', 'words': 20000, 'bundle': False},
 'id': {'name': 'Indonesian', 'words': 20000, 'bundle': False},
 'ms': {'name': 'Malay', 'words': 20000, 'bundle': False},
```

**`scripts/dictionaries/langpack-hu.zip`** β€” binary file added (pre-built Hungarian language pack).