Keep certain characters separate: don't merge them, even if a pair occurs with sufficient frequency. Candidates:
1. digits
2. punctuation
3. dates, months, and years
4. ... anything else?
Watch out: be language agnostic. Use the Unicode table (character categories) to determine which characters count as digits or punctuation.
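A minimal sketch of how items 1 and 2 might be enforced in a frequency-based (BPE-style) merge loop. It assumes the corpus is a list of words, each split into single characters. A character counts as protected if `unicodedata.category` returns `Nd` (decimal digit) or any `P*` category (punctuation). These categories are defined for every script, which keeps the rule language agnostic. All helper names (`is_protected`, `can_merge`, `best_merge`, `apply_merge`, `train`) are hypothetical and do not come from any existing tokenizer. Item 3 is not covered, because month names and date formats are language-specific and the Unicode tables do not mark them.

```python
import unicodedata
from collections import Counter
from typing import List, Optional, Tuple


def is_protected(ch: str) -> bool:
    """True if ch is a decimal digit (category Nd) or punctuation (any P* category)."""
    cat = unicodedata.category(ch)
    return cat == "Nd" or cat.startswith("P")


def can_merge(left: str, right: str) -> bool:
    """A pair may merge only if neither token contains a protected character."""
    return not any(is_protected(ch) for ch in left + right)


def best_merge(corpus: List[List[str]]) -> Optional[Tuple[str, str]]:
    """Most frequent adjacent pair that is allowed to merge, or None if there is none."""
    counts = Counter()
    for word in corpus:
        for left, right in zip(word, word[1:]):
            if can_merge(left, right):  # protected pairs are never counted
                counts[(left, right)] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def apply_merge(corpus: List[List[str]], pair: Tuple[str, str]) -> List[List[str]]:
    """Replace every left-to-right occurrence of pair with the merged token."""
    merged = pair[0] + pair[1]
    out = []
    for word in corpus:
        new_word, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                new_word.append(merged)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        out.append(new_word)
    return out


def train(words: List[str], num_merges: int) -> Tuple[List[Tuple[str, str]], List[List[str]]]:
    """Learn up to num_merges merges; stop early when no allowed pair remains."""
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair = best_merge(corpus)
        if pair is None:
            break
        merges.append(pair)
        corpus = apply_merge(corpus, pair)
    return merges, corpus
```

For example, `train(["2024", "2024", "ab"], 10)` learns only the `("a", "b")` merge. The digits stay single-character tokens no matter how often they co-occur, because a pair is rejected whenever either side contains a protected character. Whether other number categories such as `No` (e.g. superscript digits) should also be protected is left open here.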