WeNet Hotword

Hotword-biased decoding for the WeNet C++ runtime.

Based On:
Model: wenet/u2pp_conformer-asr-cn-16k-online
Tune: AISHELL-1 hotword test (235 utts, 187 hotwords)

	Baseline	Ours (Ultra)
CER	5.14%	4.82%
Recall	81.08%	95.95%
Precision	95.24%	93.01%
F1	87.59%	94.46%

Test: AISHELL-2 iOS eval (1000 utts, 301 hotwords)

	Baseline	Ours (Ultra)
CER	5.14%	4.83%
Recall	42.03%	88.41%
Precision	100.00%	92.42%
F1	59.18%	90.37%

Test: AISHELL-2 iOS eval (1000 utts, 27 hard hotwords)

Highlights

Phoneme Corrector — fuzzy hotword matching via G2P phoneme edit-distance on the n-best
Confidence-Weighted Match Bonus — per-hotword reward scaled by acoustic confidence
Multi-Objective Autotuner — 2D/3D Pareto over decoder + hotword knobs, with early-exit stagnation detection

Install

# Python environment
uv venv .venv --python 3.12
source .venv/bin/activate
uv pip install torch torchaudio pyyaml dacite optuna soundfile pypinyin jieba modelscope

# C++ runtime (requires cmake >= 3.14)
cd runtime/libtorch
cmake -B build -DGRAPH_TOOLS=ON -DTORCH=ON
cmake --build build -j --target decoder_main
cd ../..

Quick Start

1. Download Model

modelscope download --model wenet/u2pp_conformer-asr-cn-16k-online \
  --local_dir ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online

2. Download Datasets

preparation (downloads AISHELL-1 + AISHELL-2, builds hotword lists):

bash tools/prepare_benchmark.sh ~/userspace/wenet

3. Learn Confusion Matrix (per-model, one-time)

python3 tools/learn_confusion.py \
  --model_dir ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online \
  --wav_scp ~/userspace/wenet/aishell_test/wav.scp \
  --text ~/userspace/wenet/aishell_test/text \
  --out_csv runtime/libtorch/configs/confusion.csv \
  --device cpu

4. Autotune — Four Modes

Mode	Config	Hotwords	Objective	When to use
Aggressive	`mode_aggressive.yaml`	187 original	recall↑ + CER↓	Hotword-dense domains
Balanced	`mode_balanced.yaml`	187 original	F1↑ + CER↓	General voice assistant, balanced R/P
Conservative	`mode_conservative.yaml`	349 (+distractors)	F1↑ + CER↓	Open-domain dialogue, precision matters
Ultra	`mode_ultra.yaml`	349 (+distractors)	F1↑ + CER↓ + Precision↑	Financial/legal — false positive cost is high

No free lunch: Aggressive maximizes recall at the cost of precision (64% on 301-hotword test). Ultra trades ~3% recall for +29 precision points. Choose based on your domain's tolerance for false positives.

Run one (or all) modes:

# Aggressive
python3 tools/autotune.py \
  --config runtime/libtorch/configs/mode_aggressive.yaml \
  --search-space runtime/libtorch/configs/search_space.yaml

# Balanced
python3 tools/autotune.py \
  --config runtime/libtorch/configs/mode_balanced.yaml \
  --search-space runtime/libtorch/configs/search_space.yaml

# Conservative
python3 tools/autotune.py \
  --config runtime/libtorch/configs/mode_conservative.yaml \
  --search-space runtime/libtorch/configs/search_space.yaml

# Ultra (3-objective Pareto)
python3 tools/autotune.py \
  --config runtime/libtorch/configs/mode_ultra.yaml \
  --search-space runtime/libtorch/configs/search_space.yaml

5. Copy Hotword Lists

Hotword lists are shipped in runtime/libtorch/configs/. Copy them to your test set directory before evaluation:

cp runtime/libtorch/configs/hotwords_all.txt \
   ~/userspace/wenet/aishell2_eval/test1000/
cp runtime/libtorch/configs/hotwords_hard.txt \
   ~/userspace/wenet/aishell2_eval/test1000/

6. Evaluate on Held-Out

# Evaluate on 301-hotword list (mixed easy + hard)
python3 tools/evaluate_modes.py \
  --test-dir ~/userspace/wenet/aishell2_eval/test1000 \
  --hotwords hotwords_all.txt

# Evaluate on 27-hard hotword subset (baseline recall < 90%)
python3 tools/evaluate_modes.py \
  --test-dir ~/userspace/wenet/aishell2_eval/test1000 \
  --hotwords hotwords_hard.txt

Results

Model: wenet/u2pp_conformer-asr-cn-16k-online
Tune: AISHELL-1 hotword test
Test: AISHELL-2 iOS eval subset

301-Hotword Test (mixed easy + hard)

Mode	CER%	Recall%	Precision%	F1%
Baseline (no hotword)	5.14	81.08	95.24	87.59
Aggressive	6.00	92.79	63.78	75.60
Balanced	5.27	93.69	76.47	84.21
Conservative	4.98	93.24	83.81	88.27
Ultra	4.82	95.95	93.01	94.46

27-Hard Hotword Test (baseline recall < 90%)

Mode	CER%	Recall%	Precision%	F1%
Baseline	5.14	42.03	100.00	59.18
Aggressive	5.10	98.55	64.76	78.16
Balanced	4.92	98.55	73.91	84.47
Conservative	4.68	94.20	86.67	90.28
Ultra	4.83	88.41	92.42	90.37

Key Findings

All hotword-enhanced modes improve or maintain CER over no-hotword baseline (5.14% → 4.68–6.00%), showing the pipeline does not harm general ASR.
On 27 hard-case hotwords (foreign names the baseline misses), our method achieves 88% recall vs baseline's 42% — the phoneme corrector closes the gap where character-level matching fails.
Ultra mode is the overall best: highest F1 (94.46% on 301-hot, 90.37% on hard-case) via 3-objective Pareto optimization — no hard-coded precision floor needed.
Conservative mode is the practical sweet spot: lowest CER on hard-case (4.68%) with strong F1 (90.28%), making it suitable for precision-sensitive domains.

Project Structure

runtime/core/decoder/
  corrector.{cc,h}        # PhonemeCorrector + fuzzy match + confusion matrix
  hotword_cache.{cc,h}    # LRU hotword cache
  asr_decoder.{cc,h}      # CalculateMatchBonus + n-best correction wiring
  params.h                # gflags (bonus_weight, confidence_floor, etc.)
runtime/core/bin/
  decoder_main.cc         # decoder binary (+ daemon mode for autotune)
runtime/libtorch/configs/
  mode_{aggressive,balanced,conservative,ultra}.yaml  # four mode configs
  default.yaml            # base config
  search_space.yaml       # Optuna search space
tools/
  autotune.py             # multi-objective Pareto tuner
  compute-hotword-metrics.py
  prepare_hotwords.py     # extract 500-hot / filter hard-case
  evaluate_modes.py       # batch evaluate all 4 tuned configs

Acknowledgements

WeNet — base ASR runtime
cpp-pinyin — runtime G2P
CapsWriter-Offline — inspired the corrector design

License

Apache License 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github		.github
docs		docs
examples		examples
runtime		runtime
test		test
tools		tools
wenet		wenet
.clang-format		.clang-format
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CPPLINT.cfg		CPPLINT.cfg
HOTWORD_MODULE_FILES.md		HOTWORD_MODULE_FILES.md
LICENSE		LICENSE
README.md		README.md
README_WENET.md		README_WENET.md
ROADMAP.md		ROADMAP.md
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WeNet Hotword

Highlights

Install

Quick Start

1. Download Model

2. Download Datasets

3. Learn Confusion Matrix (per-model, one-time)

4. Autotune — Four Modes

5. Copy Hotword Lists

6. Evaluate on Held-Out

Results

301-Hotword Test (mixed easy + hard)

27-Hard Hotword Test (baseline recall < 90%)

Key Findings

Project Structure

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WeNet Hotword

Highlights

Install

Quick Start

1. Download Model

2. Download Datasets

3. Learn Confusion Matrix (per-model, one-time)

4. Autotune — Four Modes

5. Copy Hotword Lists

6. Evaluate on Held-Out

Results

301-Hotword Test (mixed easy + hard)

27-Hard Hotword Test (baseline recall < 90%)

Key Findings

Project Structure

Acknowledgements

License

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages