Commit 8e4fe08
Bump tokenizers submodule to fix sentencepiece GCC 15 build (#20135)
### Summary
Updates extension/llm/tokenizers to include
meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to
pick up a missing `#include <cstdint>` (google/sentencepiece#1109).
Without this, `pytorch_tokenizers` fails to compile inside the
`executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V
baremetal CI (#19917).
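Some background on this class of failure: a C++ file that uses fixed-width integer types such as `uint32_t` without including `<cstdint>` itself only builds while some other standard header happens to pull that header in. Newer GCC releases have been trimming such transitive includes from libstdc++. Assuming the sentencepiece breakage follows that pattern, a rough text heuristic can flag candidate files. The helper `needs_cstdint` below is hypothetical. It is not part of sentencepiece, `pytorch_tokenizers`, or ExecuTorch, and it is a regex check, not a substitute for building with the new toolchain.

```python
import re

# Hypothetical helper, not part of sentencepiece or pytorch_tokenizers.
# Matches common fixed-width integer names, with or without the std:: prefix.
# Not exhaustive: int_least*/int_fast* variants are ignored.
_FIXED_WIDTH = re.compile(r"\b(?:std::)?u?int(?:8|16|32|64|ptr|max)_t\b")

# A direct include of either <cstdint> or its C counterpart <stdint.h>.
_CSTDINT_INCLUDE = re.compile(
    r'^\s*#\s*include\s*[<"](?:cstdint|stdint\.h)[>"]', re.MULTILINE
)


def needs_cstdint(source: str) -> bool:
    """Return True if `source` uses fixed-width integer types but never
    includes <cstdint> (or <stdint.h>) directly.

    Text heuristic only: comments and string literals are not stripped,
    and headers included via other project headers are not followed.
    """
    uses_types = _FIXED_WIDTH.search(source) is not None
    includes_header = _CSTDINT_INCLUDE.search(source) is not None
    return uses_types and not includes_header


if __name__ == "__main__":
    broken = "#include <string>\nstd::uint32_t hash(const std::string& s);\n"
    fixed = "#include <cstdint>\n" + broken
    print(needs_cstdint(broken))  # True
    print(needs_cstdint(fixed))   # False
```

The fix itself is just the explicit `#include <cstdint>`. Including the header where the types are used removes the dependence on whatever other standard headers happen to include internally.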
### Test plan
CI
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parent: 0d8f437
2 files changed
Lines changed: 6 additions & 2 deletions
Diff hunks for one of the changed files (file name and line contents not recoverable):
- Hunk 1 (original lines 8-13, new lines 8-17): four lines added after original line 10.
- Hunk 2 (original lines 36-42, new lines 40-46): original line 39 replaced by new line 43.
Submodule `tokenizers` updated (28 files changed):
- `.ci/scripts/wheel/pre_build_script.sh`: +30
- `.ci/scripts/wheel/test_wheel.py`: +85
- `.ci/scripts/wheel/vc_env_helper.bat`: +39
- `.github/workflows/build-wheels-linux-aarch64.yml`: +51
- `.github/workflows/build-wheels-linux.yml`: +50
- `.github/workflows/build-wheels-macos-arm64.yml`: +52
- `.github/workflows/build-wheels-windows.yml`: +51
- `.github/workflows/claude-code.yml`: +22
- `CMakeLists.txt`: +1, -1
- `include/pytorch/tokenizers/pcre2_regex.h`: -1
- `include/pytorch/tokenizers/string_integer_map.h`: +41, -44
- `setup.py`: +1, -1
- `src/bpe_tokenizer_base.cpp`: +64, -20
- `src/hf_tokenizer.cpp`: +19, -13
- `src/llama2c_tokenizer.cpp`: +42, -11
- `src/normalizer.cpp`: +2, -1
- `src/pcre2_regex.cpp`: +15, -15
- `src/post_processor.cpp`: +13, -10
- `src/regex_lookahead.cpp`: -1
- `src/tekken.cpp`: -2
- `src/tiktoken.cpp`: +2, -6
- `test/CMakeLists.txt`: +1, -1
- `test/targets.bzl`: +15
- `test/test_concurrent_encode.cpp`: +152
- `test/test_llama2c_tokenizer.cpp`: +124
- `test/test_regex.cpp`: -1
- `test/test_tiktoken.cpp`: +13
- `third-party/sentencepiece`: +1, -1
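The `third-party/sentencepiece` entry (+1, -1) is the nested pointer bump that carries the fix. A submodule is stored as a gitlink, a commit ID recorded in the parent tree, so moving it changes exactly one line of diff. The sketch below extracts the old and new pinned commits from such a diff. `submodule_bump` is a hypothetical name, and the sketch assumes the default `git diff` rendering of gitlinks rather than `--submodule=log` or `--submodule=diff`.

```python
import re
from typing import Optional, Tuple

# In default `git diff` output, a submodule (gitlink) pointer change is
# rendered as one removed and one added "Subproject commit <sha>" line,
# which is why a submodule bump is reported as +1 -1.
_SUBPROJECT = re.compile(r"^([+-])Subproject commit ([0-9a-f]{7,64})", re.MULTILINE)


def submodule_bump(diff_text: str) -> Optional[Tuple[str, str]]:
    """Return (old_sha, new_sha) for a single-submodule diff, else None.

    Hypothetical helper: assumes `diff_text` covers exactly one submodule
    path; with several, the last removed and last added SHAs are used.
    """
    old = new = None
    for sign, sha in _SUBPROJECT.findall(diff_text):
        if sign == "-":
            old = sha
        else:
            new = sha
    if old is None or new is None:
        return None
    return old, new


if __name__ == "__main__":
    # Placeholder SHAs, not the real commits from this change.
    sample = (
        "diff --git a/third-party/sentencepiece b/third-party/sentencepiece\n"
        "--- a/third-party/sentencepiece\n"
        "+++ b/third-party/sentencepiece\n"
        "@@ -1 +1 @@\n"
        "-Subproject commit aaaaaaa\n"
        "+Subproject commit bbbbbbb\n"
    )
    print(submodule_bump(sample))  # ('aaaaaaa', 'bbbbbbb')
```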