Given that:
- most keyboard layouts have no support for fancy letters or punctuation marks such as æ, ’, “”, …, etc.
- many corpus texts don’t use these fancy characters either
- the kalamine analyzer can default to ASCII when these characters are not supported by a keyboard layout: ae instead of æ, ' instead of ’, ... instead of …, "" instead of “”, etc.
our corpus should be “fancified” before it is transformed into a JSON dictionary, so as not to penalize keyboard layouts that properly support these special characters. That’s what the fancify.sh script (or the make fancy target) does. This is still a work in progress: several substitutions are missing, e.g.:
- straight quote pairs into “”, « », „“ depending on the language
- fine no-break space before ?:;! in French
- ¿ sign in Spanish
- dashes rather than --
- etc.
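
To make the idea concrete, here is a minimal Python sketch of what such substitutions could look like. It is not the content of fancify.sh: the `fancify` function, the `QUOTE_PAIRS` table, the regular expressions and the choice of an em dash for `--` are all assumptions made for illustration, and the rules are deliberately naive (no handling of nested quotes, code snippets, URLs, etc.).

```python
import re

# Hypothetical per-language quote pairs, following the list above.
QUOTE_PAIRS = {
    "en": ("\u201c", "\u201d"),  # English: left/right double quotation marks
    "fr": ("\u00ab", "\u00bb"),  # French: guillemets
    "de": ("\u201e", "\u201c"),  # German: low-9 opening, left double closing
}


def fancify(text: str, lang: str = "en") -> str:
    """Replace ASCII fallbacks with typographic characters (illustrative only)."""
    # Three dots -> horizontal ellipsis (U+2026).
    text = text.replace("...", "\u2026")
    # Double hyphen -> em dash (U+2014); which dash to use is an assumption.
    text = text.replace("--", "\u2014")
    # Straight quote pairs on a single line -> language-specific quote pairs.
    open_q, close_q = QUOTE_PAIRS[lang]
    text = re.sub(r'"([^"\n]*)"', lambda m: open_q + m.group(1) + close_q, text)
    # Straight apostrophe between two word characters -> U+2019.
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    if lang == "fr":
        # An existing plain space before ? : ; ! -> narrow no-break space (U+202F).
        text = re.sub(r"(?<=\S) (?=[?:;!])", "\u202f", text)
    return text


if __name__ == "__main__":
    print(fancify('He said "wait"... don\'t go -- please'))
    print(fancify('Ça va ? "Oui" !', lang="fr"))
```

The French rule only converts spaces that are already present, so strings such as `http://` are left alone. A real implementation would also need to decide how to treat unbalanced quotes and text that is not prose.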