Better fancify.sh #3

@fabi1cazenave

Given that:

  • most keyboard layouts have no support for fancy letters or punctuation marks such as æ, ’, “”, …, etc.
  • many corpus texts don’t use these fancy characters either
  • the kalamine analyzer can default to ASCII when these characters are not supported by a keyboard layout: ae instead of æ, ' instead of ’, ... instead of …, "" instead of “”, etc.

our corpus should be “fancified” before being transformed into a JSON dictionary, so as not to penalize keyboard layouts that properly support these special characters. That’s what the fancify.sh script (or the make fancy target) does. But this is still a work in progress: several substitutions are still missing, e.g.:

  • straight quote pairs into “”, « », „“ depending on the language
  • narrow no-break space (U+202F) before ?:;! in French
  • ¿ sign in Spanish
  • dashes rather than --
  • etc.
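
The missing substitutions listed above could be sketched as follows. This is a minimal illustration in Python, not the actual fancify.sh logic: the `fancify` function, the `QUOTES` table, the choice of an em dash for `--`, the narrow no-break spaces inside « », and the sentence-level heuristic for ¿ are all assumptions made for the example.

```python
import re

NNBSP = "\u202f"  # narrow no-break space

# Hypothetical per-language quote pairs (opening, closing); not taken from fancify.sh.
QUOTES = {
    "en": ("\u201c", "\u201d"),                  # “ ”
    "fr": ("\u00ab" + NNBSP, NNBSP + "\u00bb"),  # « » with inner spacing (assumption)
    "de": ("\u201e", "\u201c"),                  # „ “
}


def fancify(text: str, lang: str) -> str:
    """Replace a few ASCII fallbacks with typographic characters."""
    # Straight quote pairs become the language's quote pair (English as fallback).
    opening, closing = QUOTES.get(lang, QUOTES["en"])
    text = re.sub(r'"([^"\n]*)"', lambda m: opening + m.group(1) + closing, text)

    # Double hyphen becomes an em dash (assumption: an en dash may suit some corpora better).
    text = text.replace("--", "\u2014")

    if lang == "fr":
        # Narrow no-break space before ? : ; ! when the mark ends a word
        # and is followed by whitespace or the end of the text.
        text = re.sub(r"(?<=\w) ?([?:;!])(?=\s|$)", NNBSP + r"\1", text)

    if lang == "es":
        # Heuristic: put ¿ at the start of any sentence that ends with "?".
        # Sentences that already contain ¿ are left alone.
        text = re.sub(r"(^|[.!?]\s+)([^.!?¿\n]*\?)", r"\1¿\2", text, flags=re.M)

    return text


if __name__ == "__main__":
    print(fancify('He said "hi" -- twice', "en"))
    print(fancify("Hola. Como estas?", "es"))
```

The Spanish rule treats the whole sentence as the question, which is wrong for questions that start mid-sentence; a real implementation would likely need a smarter boundary rule or manual review of the corpus.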
