v1.18.1 — TTS: strip JA full-stops before Kokoro misaki phonemizer #59
Merged
Misaki's pyopenjtalk-backed Japanese G2P maps the JA full-stop 「。」 (and its fullwidth ASCII twin 「.」) to an audible phoneme rather than silence, so Kokoro rendered chunk endings as a short "ye"-like artifact. Confirmed on hardware with `あ。` (one artifact) and `あ。。。。。` (five artifacts).

Fix: new `tts_worker.kokoro_text.strip_japanese_silent_punctuation` runs in `KokoroEngine._to_ja_phonemes` immediately before misaki sees the chunk text. Chunk boundaries already separate sentences, so dropping these characters removes the artifact without losing meaningful prosody. Other JA punctuation (「、」「!」「?」「・」「…」) is left in place because it either drives prosodic pausing or has not been observed to produce an artifact. If a chunk becomes empty after stripping, the engine now skips it instead of feeding empty text to misaki (which raised "misaki ja g2p returned empty phoneme output").

Tests: 8 new unit tests in `tts-worker/tests/test_kokoro_text.py` cover trailing, repeated, internal, fullwidth, ASCII passthrough, empty input, and preservation of other JA punctuation.

Verified end-to-end on hardware after a fresh Kokoro restart. Version bumped across all six sites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
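The helper and the skip-empty behaviour described in the commit message can be sketched roughly as follows. This is not the merged code: the function name comes from this PR, but the constant name, `phonemize_chunks`, and the `g2p` callable are hypothetical stand-ins, and the character class assumes that only U+3002 and U+FF0E are stripped while ASCII `.` is left alone.

```python
import re
from typing import Callable, Iterable, List

# Assumption: only the ideographic full stop (U+3002) and the fullwidth
# full stop (U+FF0E) are removed; ASCII "." is not part of the class.
_JA_SILENT_PUNCT = re.compile("[\u3002\uFF0E]+")


def strip_japanese_silent_punctuation(text: str) -> str:
    """Remove runs of JA full stops and leave every other character alone."""
    return _JA_SILENT_PUNCT.sub("", text)


def phonemize_chunks(chunks: Iterable[str], g2p: Callable[[str], str]) -> List[str]:
    """Hypothetical driver: strip each chunk and skip it if nothing is left."""
    phonemes: List[str] = []
    for chunk in chunks:
        cleaned = strip_japanese_silent_punctuation(chunk)
        if not cleaned:
            # Punctuation-only chunk: never handed to the G2P callable.
            continue
        phonemes.append(g2p(cleaned))
    return phonemes
```

Keeping the strip in a small pure function makes it testable without loading Kokoro or misaki, while the empty-chunk guard lives in the caller that owns the G2P call.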
Summary
Fixes an audible artifact at the end of Japanese sentences spoken through Kokoro: misaki's pyopenjtalk-backed G2P maps the JA full-stop 「。」 (and its fullwidth ASCII twin 「.」) to an actual phoneme rather than silence, so Kokoro rendered chunk endings as a short "ye"-like sound.
Diagnosis
Confirmed on hardware:
- `あ。` → user heard あ plus an extra sound
- `あ。。。。。` → user heard あ followed by five repeated イヤ-like sounds (one per period), demonstrating that each 。 reaches misaki and produces a phoneme
Fix
- `tts-worker/src/tts_worker/kokoro_text.py` exposes `strip_japanese_silent_punctuation` (regex `[。.]+` → empty).
- `KokoroEngine._to_ja_phonemes` strips the input before calling misaki, and returns `''` if the chunk becomes empty after stripping.
- `KokoroEngine.synthesize_chunks` now skips chunks that come back empty from `_to_ja_phonemes` (avoids `misaki ja g2p returned empty phoneme output`).
- `tts_worker.shared_text` is untouched, so the existing "preserve JA punctuation" contract there still holds; the strip is engine-local to Kokoro, parallel to how `qwen3_text.py` houses Qwen3-specific text prep.
Tests
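The eight cases listed in this section could be written as one table-driven test. This is a sketch, not the contents of `test_kokoro_text.py`: the inputs are illustrative, and the helper is redefined locally as a stand-in (the real tests would presumably import it from `tts_worker.kokoro_text`). The stand-in assumes the character class is U+3002 plus U+FF0E.

```python
import re


def strip_japanese_silent_punctuation(text: str) -> str:
    # Local stand-in for the helper under test (assumed behaviour).
    return re.sub("[\u3002\uFF0E]+", "", text)


# (input, expected) pairs, one per case named in the PR description.
CASES = [
    ("あ。", "あ"),                                   # trailing full stop
    ("あ。。。。。", "あ"),                           # repeated run
    ("あ。い。", "あい"),                             # internal + trailing
    ("あ\uFF0E", "あ"),                               # fullwidth full stop
    ("あ、い!う?え・お…", "あ、い!う?え・お…"),   # other JA punctuation kept
    ("。。\uFF0E", ""),                               # punctuation-only -> empty
    ("", ""),                                         # empty input passthrough
    ("Hello, world", "Hello, world"),                 # ASCII passthrough
]


def test_strip_cases() -> None:
    for raw, expected in CASES:
        assert strip_japanese_silent_punctuation(raw) == expected, raw
```

pytest would collect `test_strip_cases` as written. Splitting the table with `pytest.mark.parametrize` would give one reported result per case, which is closer to the "8 tests" count.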
- `tts-worker/tests/test_kokoro_text.py` (8 tests): trailing 「。」, repeated 「。」 runs, internal + trailing, fullwidth 「.」, preservation of other JA punctuation, punctuation-only input → empty, empty input passthrough, ASCII passthrough.
Test plan
- `あ。` now sounds like あ only, with no trailing artifact.
- `synthesize_chunks` still finalizes correctly when every chunk happens to be punctuation-only (rare; the chunk loop simply produces no audio and returns the zero-sample fallback).
🤖 Generated with Claude Code