I'm trying your code for the analysis of some Dutch sentences, and I get Unicode errors quite often. I'll see if I can solve it in my code.
An example error is below:
Masu
Traceback (most recent call last):
File "C:/data/workspace/common-voice-analysis-master/00_get_wikipedia_words.py", line 62, in <module>
words += getWordsInRandomArticle()
File "C:/data/workspace/common-voice-analysis-master/00_get_wikipedia_words.py", line 49, in getWordsInRandomArticle
return splitIntoWords(getPageText(getPage()))
File "C:/data/workspace/common-voice-analysis-master/00_get_wikipedia_words.py", line 32, in getPageText
cache.write(text)
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u67a1' in position 10: character maps to <undefined>
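
Judging from the traceback, the text is being encoded with cp1252 when it is written to the cache file, and cp1252 has no mapping for '\u67a1'. My guess is that the cache file is opened without an explicit encoding, so Python falls back to the Windows default. Below is a minimal sketch of a possible fix, assuming the file is opened with the built-in `open()`. The helper names `write_cache` and `read_cache` are hypothetical and not taken from the script.

```python
def write_cache(path, text):
    # Hypothetical helper: passing encoding="utf-8" overrides the platform
    # default (cp1252 in the traceback above), so any Unicode text can be written.
    with open(path, "w", encoding="utf-8") as cache:
        cache.write(text)


def read_cache(path):
    # Read back with the same encoding that was used for writing.
    with open(path, "r", encoding="utf-8") as cache:
        return cache.read()
```

If this guess is right, adding `encoding="utf-8"` to the existing `open()` calls around line 32 of `00_get_wikipedia_words.py` may be enough. Any cache files already written with cp1252 would probably need to be deleted or re-read with the old encoding.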