v1.18.1 — TTS: strip JA full-stops before Kokoro misaki phonemizer#59

Merged: amariichi merged 1 commit into main from fix/kokoro-ja-period-artifact on May 24, 2026

@amariichi
Owner

Summary

Fixes an audible artifact at the end of Japanese sentences spoken through Kokoro: misaki's pyopenjtalk-backed G2P maps the JA full-stop 「。」 (and its fullwidth ASCII twin 「.」) to an actual phoneme rather than silence, so Kokoro rendered chunk endings as a short "ye"-like sound.

Diagnosis

Confirmed on hardware:

  • あ。 → user heard あ plus an extra sound
  • あ。。。。。 → user heard あ followed by five repeated イヤ-like sounds (one per period), demonstrating that each 「。」 reaches misaki and produces a phoneme

Fix

  • New module tts-worker/src/tts_worker/kokoro_text.py exposes strip_japanese_silent_punctuation (regex [。.]+ → empty).
  • KokoroEngine._to_ja_phonemes strips the input before calling misaki, and returns '' if the chunk becomes empty after stripping.
  • KokoroEngine.synthesize_chunks now skips chunks that come back empty from _to_ja_phonemes (avoids the "misaki ja g2p returned empty phoneme output" error).
  • Other JA punctuation (「、」「!」「?」「・」「…」) is intentionally left in place — those either drive prosodic pausing or have not been observed to produce artifacts.
  • Shared text normalization (tts_worker.shared_text) is untouched, so the existing "preserve JA punctuation" contract there still holds; the strip is engine-local to Kokoro, parallel to how qwen3_text.py houses Qwen3-specific text prep.
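
A minimal sketch of what the helper described above could look like. The module's actual contents are not shown in this description, so this is an illustration only: the pattern name `_SILENT_PUNCTUATION` is hypothetical, and the character class is assumed to cover just the ideographic full stop (U+3002) and the fullwidth full stop (U+FF0E), leaving the ASCII period alone.

```python
import re

# Assumption: only U+3002 (ideographic full stop) and U+FF0E (fullwidth
# full stop) are stripped. The ASCII "." and all other punctuation pass through.
_SILENT_PUNCTUATION = re.compile("[\u3002\uff0e]+")


def strip_japanese_silent_punctuation(text: str) -> str:
    """Remove runs of JA full-stops so they never reach the phonemizer."""
    return _SILENT_PUNCTUATION.sub("", text)
```

Under these assumptions, `strip_japanese_silent_punctuation("あ。。。。。")` returns `"あ"`, a punctuation-only input such as `"。。"` returns `""`, and strings like `"こんにちは、世界!"` or `"Hello."` come back unchanged.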

Tests

tts-worker/tests/test_kokoro_text.py (8 tests): trailing 「。」, repeated 「。」 runs, internal+trailing, fullwidth 「.」, preservation of other JA punctuation, punctuation-only input → empty, empty input passthrough, ASCII passthrough.

Test plan

  • Ran new + existing tts-worker tests locally — 25/25 pass.
  • Live hardware test on AtomS3R after a fresh Kokoro restart: あ。 now sounds like あ only, no trailing artifact.
  • Reviewer: confirm synthesize_chunks still finalizes correctly when every chunk happens to be punctuation-only (rare; the chunk loop simply produces no audio and returns the zero-sample fallback).
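
For the reviewer item above, the skip-empty-chunk behavior can be sketched roughly as follows. This is not the engine's real code: the free functions `to_ja_phonemes` and `synthesize_chunks`, the `g2p` and `render` callables, and the use of an empty list to stand in for the zero-sample fallback are all hypothetical stand-ins for the `KokoroEngine` methods and the misaki and Kokoro calls.

```python
import re
from typing import Callable, Iterable, List

# Assumed character class: U+3002 and U+FF0E only.
_FULL_STOPS = re.compile("[\u3002\uff0e]+")


def to_ja_phonemes(text: str, g2p: Callable[[str], str]) -> str:
    """Strip full-stops first; return '' without calling g2p if nothing is left."""
    stripped = _FULL_STOPS.sub("", text)
    if not stripped:
        return ""
    return g2p(stripped)


def synthesize_chunks(
    chunks: Iterable[str],
    g2p: Callable[[str], str],
    render: Callable[[str], List[float]],
) -> List[float]:
    """Render each chunk in order, skipping chunks that yield no phonemes."""
    samples: List[float] = []
    for chunk in chunks:
        phonemes = to_ja_phonemes(chunk, g2p)
        if not phonemes:
            continue  # punctuation-only chunk: nothing to render
        samples.extend(render(phonemes))
    # An empty list here stands in for the zero-sample fallback.
    return samples
```

In this sketch, a request made up entirely of punctuation-only chunks never reaches `g2p` or `render` and returns no samples, which is the case the reviewer is asked to confirm against the real finalization path.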

🤖 Generated with Claude Code

Misaki's pyopenjtalk-backed Japanese G2P maps the JA full-stop 「。」
(and its fullwidth ASCII twin 「.」) to an audible phoneme rather
than silence, so Kokoro rendered chunk endings as a short "ye"-like
artifact. Confirmed on hardware with `あ。` (one artifact) and
`あ。。。。。` (five artifacts).

Fix: new `tts_worker.kokoro_text.strip_japanese_silent_punctuation`
runs in `KokoroEngine._to_ja_phonemes` immediately before misaki
sees the chunk text. Chunk boundaries already separate sentences,
so dropping these characters removes the artifact without losing
meaningful prosody. Other JA punctuation (「、」「!」「?」「・」「…」)
is left in place because it either drives prosodic pausing or has
not been observed to produce an artifact.

If a chunk becomes empty after stripping, the engine now skips it
instead of feeding empty text to misaki (which raised
"misaki ja g2p returned empty phoneme output").

Tests: 8 new unit tests in `tts-worker/tests/test_kokoro_text.py`
cover trailing, repeated, internal, fullwidth, ASCII passthrough,
empty input, and preservation of other JA punctuation. Verified
end-to-end on hardware after a fresh Kokoro restart.

Version bumped across all six sites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@amariichi amariichi merged commit 0ea6511 into main May 24, 2026
1 check passed
@amariichi amariichi deleted the fix/kokoro-ja-period-artifact branch May 24, 2026 11:13