Slang InfoViz Final Project

Observable notebook: Slang visualization prototype

This project is ongoing, and both the analysis pipeline and visualization work are still being refined.

It compares slang usage across two datasets:

a 2020-2025 Gen Z slang dataset
a 2010 Twitter slang dataset

The repository combines data cleaning, annotation, filtering, sentiment analysis, and exploratory summary generation for an information visualization project.

Repository Layout

scripts: Python pipeline scripts for cleaning, annotation, filtering, export, sentiment, and EDA summary generation
original: raw source data snapshots
tweets data prep: working files and process notes for the 2010 Twitter pipeline
usage_context_batches_by_term_category: category-specific usage-context annotation batches
eda_outputs: derived summary tables used for exploration and visualization
md_rtf: project writeups, notes, and exportable presentation text
slang_eda_exploration.ipynb: notebook for EDA and comparison work

Key Data Files

genz_slang_usage_2020_2025.csv: main Gen Z dataset
original/2010_tweets_slang.csv: raw 2010 Twitter source file
2010_tweets_slang_analysis_ready.csv: final 2010 analysis-ready dataset
2010_tweets_slang_with_sentiment.csv: analysis-ready 2010 dataset with sentiment columns added
tweets data prep/2010_terms_annotation_table.csv: term-level annotation table for 2010 slang

2010 Twitter Workflow

The 2010 Twitter dataset was built in stages:

clean the raw source rows
create a term-level annotation table
merge term annotations back onto tweet rows
annotate tweet-level usage context and irony
filter out rows where the word is not actually being used as slang
export an analysis-ready dataset
optionally add sentiment scores
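
As a rough illustration of the merge and filter stages above, here is a minimal sketch using only the Python standard library. It is not the code in the scripts folder. The column names (`tweet_id`, `term`, `text`, `is_slang_use`, `category`) and the sample rows are assumptions made up for the example; the real CSVs may use different columns.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the cleaned tweet rows and the
# term-level annotation table. Column names are assumed, not taken from the repo.
TWEETS_CSV = """tweet_id,term,text,is_slang_use
1,sick,that show was sick,yes
2,sick,i am home sick today,no
3,salty,he is so salty about the loss,yes
"""

TERMS_CSV = """term,category
sick,approval
salty,emotion
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_term_annotations(tweets, terms):
    """Copy each term's category onto every tweet row that uses that term."""
    lookup = {row["term"]: row for row in terms}
    merged = []
    for tweet in tweets:
        annotation = lookup.get(tweet["term"], {})
        merged.append({**tweet, "category": annotation.get("category", "")})
    return merged


def keep_slang_rows(rows):
    """Drop rows where the word is not being used as slang."""
    return [row for row in rows if row["is_slang_use"] == "yes"]


tweets = read_rows(TWEETS_CSV)
terms = read_rows(TERMS_CSV)
analysis_ready = keep_slang_rows(merge_term_annotations(tweets, terms))
```

In this sketch the literal sense of "sick" in the second row is filtered out, while the slang uses keep their term-level category for later analysis.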

Important note: several intermediate CSVs remain in the repository because the scripts expect those filenames and use them as pipeline checkpoints. Some are logically redundant with later outputs, but they let you rerun a single section of the workflow instead of rebuilding everything from scratch.
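
One way to picture the checkpoint idea is a small helper that reuses an intermediate file when it already exists. This is only a sketch under assumptions: `run_stage` and its `force` flag are hypothetical names, and the repository's scripts may handle checkpoints differently. The example writes to a temporary directory so it does not touch real project files.

```python
import tempfile
from pathlib import Path


def run_stage(output_path, build, force=False):
    """Run build() only if the checkpoint file is missing or force is set.

    Hypothetical helper: build is a zero-argument callable returning CSV text.
    """
    output_path = Path(output_path)
    if output_path.exists() and not force:
        return "reused"
    output_path.write_text(build(), encoding="utf-8")
    return "rebuilt"


def build_annotated():
    # Placeholder content standing in for a real annotation stage.
    return "tweet_id,term\n1,sick\n"


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "2010_tweets_slang_annotated.csv"
    first = run_stage(checkpoint, build_annotated)    # no file yet, so it is built
    second = run_stage(checkpoint, build_annotated)   # file exists, so it is reused
    forced = run_stage(checkpoint, build_annotated, force=True)  # rebuilt on request
```

Keeping each stage's output on disk is what makes it possible to rerun only the later stages, at the cost of some redundant files.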

