[runtime] upgrade openfst to v1.8.5 and WeTextProcessing to v1.2.0 #2817

Open: pengzhendong wants to merge 1 commit into main from upgrade-openfst-wetextprocessing
pengzhendong (Member) commented Jun 15, 2026
Summary

  • Upgrade openfst from kkm000/openfst 1.7.2.1 to csukuangfj/openfst v1.8.5
  • Pin WeTextProcessing to v1.2.0 with SOURCE_SUBDIR runtime
  • Bump gflags to v2.3.0, glog to v0.7.1 (CMake compatibility)
  • Remove obsolete openfst patch files
  • Add compatibility layer for kaldi/decoder code

Fixes #2816

Details

Why upgrade

WeTextProcessing v1.2.0 changed StringCompiler/StringPrinter construction to rely on the default TokenType::BYTE parameter. This requires openfst >= 1.8.5 (csukuangfj fork). wenet previously used kkm000/openfst 1.7.2.1, which has no default for the first constructor argument, so the build failed with a compile error.
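A minimal sketch of why the missing default breaks the build. The `Sketch*` names below are hypothetical stand-ins, not the real openfst classes; they only mirror the shape of the constructor described above (token type as first argument, with or without a default).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the token type enum; not the real openfst type.
enum class SketchTokenType { SYMBOL = 1, BYTE = 2, UTF8 = 3 };

// Old style (assumed shape): the token type must always be passed explicitly.
class SketchCompilerOld {
 public:
  explicit SketchCompilerOld(SketchTokenType type) : type_(type) {}
  SketchTokenType type() const { return type_; }

 private:
  SketchTokenType type_;
};

// New style (assumed shape): the token type defaults to BYTE.
class SketchCompilerNew {
 public:
  explicit SketchCompilerNew(SketchTokenType type = SketchTokenType::BYTE)
      : type_(type) {}
  SketchTokenType type() const { return type_; }

 private:
  SketchTokenType type_;
};

// Code written as `Compiler c;` only compiles against the new style.
static_assert(!std::is_default_constructible<SketchCompilerOld>::value,
              "old style needs an explicit token type");
static_assert(std::is_default_constructible<SketchCompilerNew>::value,
              "new style can be constructed without arguments");

// Small self-check: the argument-free construction picks BYTE.
inline void SketchDefaultTokenTypeCheck() {
  SketchCompilerNew compiler;
  assert(compiler.type() == SketchTokenType::BYTE);
}
```

Caller code that constructs the object without arguments therefore only builds against a library version whose constructor carries the default.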

openfst upgrade (openfst.cmake)

Switch from kkm000/openfst 1.7.2.1 to csukuangfj/openfst v1.8.5-2026-04-11. The new fork has a proper root CMakeLists.txt, so FetchContent_MakeAvailable works directly. Removed the old PATCH_COMMAND since the patched CMakeLists.txt/log.h/flags.h files are no longer needed — the new openfst ships its own CMake build system and uses FST_FLAGS_ prefix for flag variables (avoiding gflags conflict).

WeTextProcessing v1.2.0 (wetextprocessing.cmake)

Added SOURCE_SUBDIR runtime so FetchContent_MakeAvailable finds CMakeLists.txt in the runtime/ subdirectory (the repo root has only Python code). Removed manual add_subdirectory calls — the runtime CMakeLists.txt handles them via add_subdirectory(utils) and add_subdirectory(processor).

gflags/glog version bumps (gflags.cmake, glog.cmake)

gflags v2.2.2 uses cmake_minimum_required(VERSION 2.8.12), which is rejected by CMake >= 3.31. Bumped gflags to v2.3.0 and glog to v0.7.1, matching WeTextProcessing v1.2.0's dependency versions.

Removed old openfst patches (6 files, -619 lines)

The patch files (log.h, flags.h, flags.cc, CMakeLists.txt) were designed for kkm000/openfst. They commented out openfst's LOG/CHECK macros and included glog/gflags to avoid naming conflicts. No longer needed with the new openfst.

Compatibility layer

  • fst_types.h (new): Type aliases int32, int64, uint64 etc. The old openfst provided these in fst/types.h which no longer exists.
  • kaldi-types.h: Include fst_types.h instead of <fst/types.h>.
  • kaldi-error.h: Include glog for google::LogMessage (was previously obtained transitively through the patched openfst). Restore DECLARE_* macros after glog's logging.h #undefs them at end of file. These macros must use FST_FLAGS_ prefix for openfst's own internal flags.
  • utils/flags.h: Include gflags/glog. Restore DECLARE_* macros same as above. Note: DEFINE_* comes from gflags with FLAGS_ prefix, completely separate — wenet's own flag passing is unaffected.
  • utils/CMakeLists.txt: Link gflags_nothreads_static and glog — previously obtained transitively through fst.
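For illustration, a header like fst_types.h could look roughly like the sketch below. The actual file contents are not shown in this description, so the exact set of aliases here is an assumption; `SketchPack` is a hypothetical helper added only to show the aliases in use.

```cpp
#include <cassert>
#include <cstdint>

// Assumed content: fixed-width aliases that the old <fst/types.h> used to
// provide and that kaldi-style code refers to by these short names.
using int8 = std::int8_t;
using int16 = std::int16_t;
using int32 = std::int32_t;
using int64 = std::int64_t;
using uint8 = std::uint8_t;
using uint16 = std::uint16_t;
using uint32 = std::uint32_t;
using uint64 = std::uint64_t;

static_assert(sizeof(int32) == 4, "int32 must be 32 bits wide");
static_assert(sizeof(uint64) == 8, "uint64 must be 64 bits wide");

// Hypothetical helper: packs two 32-bit halves into one 64-bit value.
inline uint64 SketchPack(uint32 hi, uint32 lo) {
  return (static_cast<uint64>(hi) << 32) | lo;
}

// Small self-check of the helper above.
inline void SketchTypesCheck() {
  assert(SketchPack(1u, 2u) == 0x100000002ULL);
}
```

Keeping the aliases in one small header means existing code that uses the short names does not need to be rewritten.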

kaldi API updates

New openfst renamed several types and uses FST_FLAGS_ prefix for flag variables:

  • MapFst -> ArcMapFst, MapFstOptions -> ArcMapFstOptions, Map() -> ArcMap()
  • FLAGS_fst_field_separator -> FST_FLAGS_fst_field_separator, FLAGS_fst_weight_separator -> FST_FLAGS_fst_weight_separator
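The effect of the prefix change can be sketched as follows. The two `SKETCH_*` macros are hypothetical stand-ins, not the real gflags or openfst macros; they only imitate the naming scheme (FLAGS_ versus FST_FLAGS_) to show that a flag with the same name can exist in both families without a clash.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a gflags-style definition: FLAGS_ prefix.
#define SKETCH_GFLAGS_DEFINE_STRING(name, value) \
  std::string FLAGS_##name = (value)

// Hypothetical stand-in for the new openfst style: FST_FLAGS_ prefix.
#define SKETCH_FST_DEFINE_STRING(name, value) \
  std::string FST_FLAGS_##name = (value)

// Same flag name in both families; the generated variables differ.
SKETCH_GFLAGS_DEFINE_STRING(fst_field_separator, ",");
SKETCH_FST_DEFINE_STRING(fst_field_separator, "\t ");

// Small self-check: the two variables are independent of each other.
inline void SketchFlagPrefixCheck() {
  assert(FLAGS_fst_field_separator != FST_FLAGS_fst_field_separator);
}
```

Code that reads openfst's own flags has to use the FST_FLAGS_ name, while flags defined through gflags keep the FLAGS_ name.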

Build system

  • kaldi/CMakeLists.txt: Explicit STATIC for kaldi-util and kaldi-decoder. The new openfst sets BUILD_SHARED_LIBS=ON on non-Windows, causing kaldi-decoder to build as .dylib. But kaldi-decoder depends on ContextGraph::GetNextState from the decoder library (which links back to kaldi-decoder), creating a circular dependency on macOS.

Small fixes

  • post_processor.cc: string::npos -> std::string::npos (old openfst's using std::string is gone).
  • context_graph.cc: Remove streaming CHECK(...) << "msg" syntax — new openfst's CHECK returns void, not a stream.
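Both fixes can be sketched together. `SKETCH_CHECK` is a hypothetical macro, not openfst's CHECK; it only reproduces the property that the expression has type void, so nothing can be streamed into it. It throws instead of aborting purely to keep the sketch testable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Reports a failed check. A real CHECK would abort; throwing is an
// assumption made here so the failure path can be exercised.
[[noreturn]] inline void SketchCheckFail(const char* expr) {
  throw std::runtime_error(std::string("Check failed: ") + expr);
}

// Hypothetical CHECK-like macro whose result is void, so
// `SKETCH_CHECK(x) << "msg"` does not compile.
#define SKETCH_CHECK(cond) \
  ((cond) ? static_cast<void>(0) : SketchCheckFail(#cond))

// Returns the text after the first separator.
inline std::string SketchAfterSeparator(const std::string& text, char sep) {
  const std::size_t pos = text.find(sep);
  // Before: CHECK(pos != string::npos) << "separator missing";
  // After: fully qualified npos and no streamed message.
  SKETCH_CHECK(pos != std::string::npos);
  return text.substr(pos + 1);
}

// Small self-check of the success path.
inline void SketchCheckDemo() {
  assert(SketchAfterSeparator("key:value", ':') == "value");
}
```

If the diagnostic text matters, it can be logged separately before the check rather than streamed into it.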

pengzhendong force-pushed the upgrade-openfst-wetextprocessing branch 4 times, most recently from 1562bb6 to 62bcc1b (June 15, 2026 05:46)
- openfst: switch from kkm000/openfst 1.7.2.1 to csukuangfj/openfst v1.8.5, remove obsolete patch files
- WeTextProcessing: pin to v1.2.0 with SOURCE_SUBDIR runtime
- gflags/glog: bump to v2.3.0/v0.7.1 for CMake >=3.31 compatibility
- Add fst_types.h for missing int32/uint64 type aliases
- kaldi-error.h, utils/flags.h: restore DECLARE_* macros after glog's #undef (FST_FLAGS_ prefix for openfst)
- kaldi API: MapFst->ArcMapFst, Map->ArcMap, FLAGS_->FST_FLAGS_
- kaldi/CMakeLists.txt: explicit STATIC to avoid .dylib circular dependency on macOS
- post_processor.cc: string::npos to std::string::npos
- context_graph.cc: remove streaming CHECK incompatible with new openfst

Fixes #2816
pengzhendong force-pushed the upgrade-openfst-wetextprocessing branch from 62bcc1b to ef00759 (June 15, 2026 05:56)
pengzhendong requested a review from robin1001 (June 15, 2026 05:58)

Development

Successfully merging this pull request may close these issues.

wetext build problem
