Words without accents are misidentified #10

@LinguaCelta

It's fairly common for accents on vowels to be omitted, or placed where they don't belong, especially in informal texts. The tagger is currently strict about this, because the lexicon only includes words spelled with their standard accents.

For example, "swn y mor" is almost certain to mean "the sound of the sea", but the standard spelling would be "sŵn y môr". Because the accents are missing, the tagger takes "swn" to be a form of the verb "to be", and "mor" to be an adverb meaning "so":

59 swn 6,2 bod B Bdibdyf1u

60 y 6,3 y YFB YFB

61 mor 6,4 mor Adf Adf

62 . 6,5 . Atd Atdt

The optimal tagging would be:

59 swn 6,2 sŵn E Egu

60 y 6,3 y YFB YFB

61 mor 6,4 môr E Egu

62 . 6,5 . Atd Atdt
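
One possible way to get closer to this is to look tokens up by their accent-stripped form, so that "swn" also retrieves "sŵn" as a candidate. The following is only a minimal sketch of that idea, not the tagger's actual code: the function names and the toy lexicon are hypothetical, and choosing between the candidates would still be left to the tagger's existing disambiguation.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Return the word with combining marks (e.g. the circumflex) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accent_index(lexicon):
    """Map each accent-stripped form to every lexicon entry that shares it."""
    index = defaultdict(set)
    for entry in lexicon:
        entry = unicodedata.normalize("NFC", entry)
        index[strip_accents(entry)].add(entry)
    return index


def candidates(token, index):
    """Return all lexicon entries matching the token once accents are ignored.

    Falls back to the token itself when nothing in the lexicon matches.
    """
    return sorted(index.get(strip_accents(token), {token}))


# Hypothetical toy lexicon; the real lexicon is far larger and carries tags.
toy_lexicon = ["sŵn", "swn", "môr", "mor", "y"]
index = build_accent_index(toy_lexicon)

for token in ["swn", "y", "mor"]:
    print(token, candidates(token, index))
```

With this toy lexicon, "swn" and "mor" each return both the unaccented and the accented entry, so the noun readings "sŵn" and "môr" at least become available for the tagger to choose from.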
