
EnglishG2P.retokenize: Index out of range crash on currency symbol + number ($/£/€) #17

@Fe2-O3

EnglishG2P.retokenize crashes with "Index out of range" on any currency symbol followed by a number

Summary

EnglishG2P.retokenize(_:) traps with Fatal error: Index out of range (Swift/ContiguousArrayBuffer.swift, EXC_BREAKPOINT/SIGTRAP) when phonemizing any text that contains a $, £, or € immediately followed by a number, e.g. "$100", "£19.99", "€5". The crash is deterministic and reproducible. Because phonemization happens on the live render path, it crashes the host app: we hit it via KokoroSwift in an iOS TTS app, where any paragraph mentioning a dollar amount kills the app.

Root cause

In retokenize, the outer loop is for (i, token) in tokens.enumerated() over the parameter array, but inside the loop body a local var tokens: [MToken] shadows the parameter. In the currency branch:

} else if currency != nil {
    if token.tag != .number {
        currency = nil
    } else if j + 1 == tokens.count && (i + 1 == tokens.count || tokens[i + 1].tag != .number) {
        //                                                       ^^^^^^^^^^^^
        // `i` is the OUTER index, but `tokens` here is the inner (shadowed)
        // array → tokens[i + 1] is out of range whenever the outer index
        // exceeds the inner array's count (the common case).
        token._.currency = currency
    }
}

tokens[i + 1] subscripts the inner shadowed array using the outer loop index i. The short-circuit i + 1 == tokens.count almost never fires (it compares an outer index against the inner count), so tokens[i + 1] is evaluated and traps.
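
To make the index mismatch concrete, here is a minimal, self-contained sketch of the pattern. It is not MisakiSwift's actual code: Token, Tag, both helper functions, and the five-token split of "It costs $100." are simplified, hypothetical stand-ins. The first helper checks, without trapping, whether the shadowed subscript would be in bounds; the second shows the un-shadowed form, where the bounds check and the subscript use the same array.

```swift
// Hypothetical stand-ins for the library's token types; only the
// indexing pattern is meant to match the bug described above.
enum Tag { case number, other }

struct Token {
    let text: String
    let tag: Tag
}

/// Mirrors the buggy condition without trapping: returns whether
/// `tokens[i + 1]` would be a valid subscript when `tokens` is the
/// inner (shadowing) array and `i` is the OUTER loop index.
func shadowedSubscriptIsValid(outerIndex i: Int, inner: [Token]) -> Bool {
    let tokens = inner  // plays the role of the shadowing local
    // The original short-circuit: if this holds, no subscript happens.
    if i + 1 == tokens.count { return true }
    return tokens.indices.contains(i + 1)
}

/// Un-shadowed form: `i` and `tokens` refer to the same (outer) array,
/// so the bounds check actually guards the subscript.
func nextOuterTokenIsNumber(_ tokens: [Token], after i: Int) -> Bool {
    return i + 1 < tokens.count && tokens[i + 1].tag == .number
}

// Assumed tokenization of "It costs $100." (an illustration only).
let outer = [
    Token(text: "It", tag: .other),
    Token(text: "costs", tag: .other),
    Token(text: "$", tag: .other),
    Token(text: "100", tag: .number),
    Token(text: ".", tag: .other),
]
// Assumed inner array built while handling the "100" token.
let subtokens = [Token(text: "100", tag: .number)]
let i = 3  // outer index of "100"

// Shadowed lookup: i + 1 == 4, but the inner array has one element.
let wouldStayInBounds = shadowedSubscriptIsValid(outerIndex: i, inner: subtokens)
// Un-shadowed lookup: outer[4] is ".", which is not a number.
let nextIsNumber = nextOuterTokenIsNumber(outer, after: i)
print(wouldStayInBounds, nextIsNumber)  // prints "false false"
```

In this sketch the shadowed lookup only stays in bounds when the outer index happens to be small relative to the inner array's count, which matches the observation that the crash is the common case rather than an edge case.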

Reproduction

let g2p = EnglishG2P(british: false)
_ = g2p.phonemize(text: "It costs $100.")   // crashes

(The trigger requires a currency symbol token next to a number, so that Lexicon.currencies ($/£/€) sets currency != nil.)

Stack trace (from a device crash, MisakiSwift 1.0.3)

Array._checkSubscript → Array.subscript.getter
MisakiSwift  EnglishG2P.retokenize(_:)
MisakiSwift  EnglishG2P.phonemize(text:performPreprocess:)
KokoroSwift  MisakiG2PProcessor.process(input:)
KokoroSwift  KokoroTTS.phonemizeText → generateAudio

Good news / the ask

This appears to be already fixed on the default branch: the inner array was renamed tokens → subtokens, un-shadowing the parameter so tokens[i + 1] correctly indexes the outer array. However, there is no tagged release containing the fix (latest tag is 1.0.6; KokoroSwift/kokoro-ios currently resolves MisakiSwift 1.0.3, which still crashes).

Could you cut a release that includes the retokenize un-shadowing fix? That would let downstream consumers (KokoroSwift, and apps built on it) pin a non-crashing version without vendoring.

Possibly related: #4 (closed, "index out of range").

Thanks for MisakiSwift — it's great to have a native Swift G2P.
