usage_change_ITA

Code, data, and trained models for the r/italy COVID-19 Usage Change Corpus. The corpus was created by scraping the text of submissions posted to the Italian subreddit between Jan 30 and Nov 30 in each of the two years, 2019 and 2020. Scraping was done with praw and psaw.
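As a rough illustration of the two scraping windows, the sketch below computes Unix timestamps for Jan 30 and Nov 30 of each year. Epoch bounds of this kind are what psaw's `search_submissions` accepts through its `after` and `before` parameters. The function name `scrape_window`, the use of UTC, and the midnight boundaries are assumptions made for this example; the original scraping scripts may have set the bounds differently.

```python
from datetime import datetime, timezone


def scrape_window(year):
    """Return (start, end) Unix timestamps for Jan 30 and Nov 30 of `year`.

    Both bounds are taken at 00:00 UTC, which is an assumption of this sketch.
    """
    start = datetime(year, 1, 30, tzinfo=timezone.utc)
    end = datetime(year, 11, 30, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())


# One window per year covered by the corpus.
windows = {year: scrape_window(year) for year in (2019, 2020)}

for year, (start, end) in windows.items():
    print(year, start, end)
```

The resulting pairs could then be passed as the time bounds of a submission search restricted to the subreddit.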

The data was lemmatized and preprocessed with Stanza and analyzed with the method from Gonen et al. 2020 to detect short-term usage change in Italian between 2019 and 2020.
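The core idea of the Gonen et al. 2020 method is to train embeddings separately on each corpus and compare a word's nearest neighbors in the two spaces: the smaller the overlap, the stronger the evidence of usage change. The following is a minimal sketch of that idea, not the code used in this repository. The toy two-dimensional vectors, the word list, and the value of `k` are made up for illustration, and neighbors are restricted to the shared vocabulary as a simplification.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(word, space, vocab, k):
    """Return the k words in `vocab` closest to `word` in `space`."""
    others = [w for w in vocab if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])


def rank_by_usage_change(space_a, space_b, k):
    """Rank shared words by neighbor overlap, smallest overlap first."""
    vocab = sorted(set(space_a) & set(space_b))
    scores = []
    for word in vocab:
        nn_a = nearest_neighbors(word, space_a, vocab, k)
        nn_b = nearest_neighbors(word, space_b, vocab, k)
        scores.append((word, len(nn_a & nn_b)))
    # Fewer shared neighbors means a stronger usage-change candidate.
    return sorted(scores, key=lambda pair: (pair[1], pair[0]))


# Hypothetical toy embeddings, one space per year.
space_2019 = {
    "virus": [1.0, 0.0], "computer": [0.9, 0.1], "malware": [0.8, 0.2],
    "influenza": [0.0, 1.0], "febbre": [0.1, 0.9],
}
space_2020 = {
    "virus": [0.0, 1.0], "computer": [1.0, 0.0], "malware": [0.9, 0.1],
    "influenza": [0.1, 0.9], "febbre": [0.2, 0.8],
}

ranking = rank_by_usage_change(space_2019, space_2020, k=2)
print(ranking[0])
```

In this toy setup "virus" shares no neighbors across the two spaces, so it is ranked first. With real embeddings, `k` would be far larger and the vocabulary would typically be filtered by frequency.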

Download data

Raw and preprocessed data can be downloaded here

Algorithm output

The output of the usage change detection algorithm by Gonen et al. 2020 is saved in the file "detect_2019_2020_.txt". This is the outcome for the lemmatized corpora.

Visualization

The data were visualized with the Embedding Projector. The files are available in the models directory. To visualize the data, load tensors_[year].tsv and tensors_[year]_meta.tsv in the projector. You can then run one of the dimensionality reduction algorithms provided by the tool, or load tensors_[year]_bookmark.txt to restore the already labeled projection (t-SNE, 10000 iterations).
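The same files can also be inspected outside the projector. The sketch below reads a tensor file and its metadata file into Python lists, assuming the usual Embedding Projector layout: one tab-separated vector per line, and a single-column metadata file with one label per line and no header row. The function name `load_projector_files` is hypothetical, and the single-column assumption should be checked against the actual files.

```python
import csv


def load_projector_files(tensor_path, meta_path):
    """Load vectors and labels from Embedding Projector style TSV files.

    Assumes one tab-separated vector per line and one label per line,
    with no header row in the metadata file.
    """
    with open(tensor_path, newline="", encoding="utf-8") as f:
        vectors = [
            [float(value) for value in row]
            for row in csv.reader(f, delimiter="\t")
            if row
        ]
    with open(meta_path, encoding="utf-8") as f:
        labels = f.read().splitlines()
    if len(vectors) != len(labels):
        raise ValueError(
            f"{len(vectors)} vectors but {len(labels)} labels; "
            "the metadata file may contain a header row"
        )
    return dict(zip(labels, vectors))


# Example call, with paths following the naming pattern described above:
# embeddings = load_projector_files(
#     "models/tensors_2019.tsv", "models/tensors_2019_meta.tsv"
# )
```

The length check is there because a metadata file with several columns carries a header row, which would shift every label by one position.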

