"Special letters" are being converted to regular ones #92

@jkreuz

Hello

Is there a way to specify the language of a news article so that its text is fetched correctly?
I used the library on a news article in Portuguese, and it converted the accented ("special") letters to their unaccented equivalents.
This seriously compromises NLP procedures that deal with syntax, context, etc.

Example: "àáéóíúâôêãõç" is converted to "aaeoiuaoeaoc"
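
For reference, the conversion described here matches what plain accent stripping produces. The snippet below is only a minimal sketch of that effect using the standard library; `strip_accents` is a hypothetical helper written for illustration, and it is an assumption that the library does something similar internally (that has not been confirmed).

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove diacritics via Unicode decomposition."""
    # NFKD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping the combining marks leaves only the unaccented base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sample = "àáéóíúâôêãõç"
stripped = strip_accents(sample)
print(stripped)  # prints: aaeoiuaoeaoc
```

Text that has gone through this kind of step can no longer distinguish, for example, "ação" from "acao", which is why it hurts downstream NLP.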

from newsfetch.news import newspaper
news = newspaper('https://g1.globo.com/sc/santa-catarina/noticia/2021/01/20/greve-na-comcap-coleta-feita-por-empresa-privada-em-florianopolis-vai-abranger-35percent-do-roteiro-diz-prefeitura.ghtml')

I saw that the class uses the Newspaper3k scraper internally, and if I set the correct language explicitly when using Newspaper3k directly, it returns the correct text.

from newspaper import Article
article = Article(url, language='pt')

Thank you
