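kfst tokenizes text by iterating over a (dubiously) sorted list of symbols. An eager strategy (take the first symbol in the list that matches at the current position) probably can't handle cases like:

    string:   abac
    alphabet: [aba, ac, a, b]

If the list is sorted longest-first, the tokenizer consumes "aba" and is left with "c", which is not in the alphabet. Yet "abac" splits cleanly as a + b + ac. Some kind of dynamic programming would be needed to do this right.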
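A rough sketch of the difference, assuming the eager strategy is longest-match-first. Neither function is kfst API. `tokenize_greedy` and `tokenize_dp` are hypothetical names, and preferring the segmentation with the fewest tokens is an arbitrary tie-break chosen for illustration. The DP records, for each prefix of the string, a way to segment it (or None if there isn't one). It then extends every reachable position by every symbol that matches there, so it never commits too early to "aba".

```python
from typing import Optional


def tokenize_greedy(text: str, symbols: list[str]) -> Optional[list[str]]:
    """Eager longest-match (assumed strategy): take the longest matching symbol at each position."""
    ordered = sorted(symbols, key=len, reverse=True)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for sym in ordered:
            if sym and text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            return None  # nothing matches here; the eager tokenizer is stuck
    return tokens


def tokenize_dp(text: str, symbols: list[str]) -> Optional[list[str]]:
    """DP segmentation: returns a fewest-token split of text, or None if none exists."""
    n = len(text)
    # best[i] is a fewest-token segmentation of text[:i], or None if text[:i] can't be segmented
    best: list[Optional[list[str]]] = [None] * (n + 1)
    best[0] = []
    for i in range(n):
        prefix = best[i]
        if prefix is None:
            continue  # position i is unreachable
        for sym in symbols:
            if sym and text.startswith(sym, i):
                j = i + len(sym)
                candidate = prefix + [sym]
                # keep the shorter segmentation (hypothetical tie-break)
                if best[j] is None or len(candidate) < len(best[j]):
                    best[j] = candidate
    return best[n]


alphabet = ["aba", "ac", "a", "b"]
print(tokenize_greedy("abac", alphabet))  # None
print(tokenize_dp("abac", alphabet))      # ['a', 'b', 'ac']
```

This is O(len(text) × len(symbols)) `startswith` checks. A trie over the symbols would cut down the per-position work, but the table of reachable positions is what fixes the correctness problem.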