Open
Description
Hi @rafaelvalle
In this line:
Line 102 in d5362cc
return s in _symbol_to_id and s is not '_' and s is not '~'
you are filtering out the "_" and "~" symbols. Also, one of the main pieces of advice for improving alignment map convergence is to add special symbols to the start and end of every sentence. Usually these symbols are exactly "_" and "~" (e.g. _What is your name?~). But you filter out exactly these symbols and do not add them anywhere in the code.
Interestingly, you've included the "_" symbol here:
Line 11 in d5362cc
_special = '_@©°½—₩€$'
but you filter it out anyway in the subsequent sentence preprocessing.
So the questions are:
- What is the reason for filtering out the "_" and "~" symbols?
- Why don't you use them as start and end symbols in sentences?