Observable notebook: Slang visualization prototype
This project is ongoing, and both the analysis pipeline and visualization work are still being refined.
This project compares slang usage across two datasets:
- a 2020-2025 Gen Z slang dataset
- a 2010 Twitter slang dataset
The repository combines data cleaning, annotation, filtering, sentiment analysis, and exploratory summary generation for an information visualization project.
Repository layout:
- scripts: Python pipeline scripts for cleaning, annotation, filtering, export, sentiment, and EDA summary generation
- original: raw source data snapshots
- tweets data prep: working files and process notes for the 2010 Twitter pipeline
- usage_context_batches_by_term_category: category-specific usage-context annotation batches
- eda_outputs: derived summary tables used for exploration and visualization
- md_rtf: project writeups, notes, and exportable presentation text
- slang_eda_exploration.ipynb: notebook for EDA and comparison work
Key data files:
- genz_slang_usage_2020_2025.csv: main Gen Z dataset
- original/2010_tweets_slang.csv: raw 2010 Twitter source file
- 2010_tweets_slang_analysis_ready.csv: final 2010 analysis-ready dataset
- 2010_tweets_slang_with_sentiment.csv: analysis-ready 2010 dataset with sentiment columns added
- tweets data prep/2010_terms_annotation_table.csv: term-level annotation table for 2010 slang
The 2010 Twitter dataset was built in stages:
- clean the raw source rows
- create a term-level annotation table
- merge term annotations back onto tweet rows
- annotate tweet-level usage context and irony
- filter out rows where the word is not actually being used as slang
- export an analysis-ready dataset
- optionally add sentiment scores
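The merge and filter stages above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in the repository's scripts. The column names (`tweet_id`, `term`, `term_category`, `is_slang_usage`) and the sample rows are assumptions made up for the example.

```python
# Minimal sketch of the merge and filter stages.
# All column names and sample rows are hypothetical.


def merge_term_annotations(tweets, term_table):
    """Attach term-level annotations to each tweet row, joined on 'term'."""
    by_term = {row["term"]: row for row in term_table}
    merged = []
    for tweet in tweets:
        annotation = by_term.get(tweet["term"], {})
        # Tweet fields win on key collisions so tweet data is never overwritten.
        merged.append({**annotation, **tweet})
    return merged


def keep_slang_usage(rows):
    """Drop rows where the matched word is not actually used as slang."""
    return [row for row in rows if row.get("is_slang_usage") == "yes"]


tweets = [
    {"tweet_id": "1", "term": "salty", "text": "he is so salty about losing",
     "is_slang_usage": "yes"},
    {"tweet_id": "2", "term": "salty", "text": "these fries are too salty",
     "is_slang_usage": "no"},
]
term_table = [{"term": "salty", "term_category": "emotion"}]

analysis_ready = keep_slang_usage(merge_term_annotations(tweets, term_table))
print(analysis_ready)
```

Keeping the filter as a separate step from the merge mirrors the staged design: the merged rows can be saved and re-annotated without repeating the join.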
Important note: several intermediate CSVs are still present because the scripts expect those filenames and use them as pipeline checkpoints. Some of them are logically redundant with later outputs, but they are useful if you want to rerun only one section of the workflow instead of rebuilding everything from scratch.
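One way to picture the checkpoint idea is a helper that skips a stage when its output file already exists. This is a sketch under assumptions, not how the repository's scripts are wired: `run_stage`, `write_placeholder`, and the `checkpoint.csv` filename are hypothetical names introduced for illustration.

```python
import tempfile
from pathlib import Path


def run_stage(output_path, build, force=False):
    """Run build(output_path) unless the checkpoint file already exists."""
    output_path = Path(output_path)
    if output_path.exists() and not force:
        return "skipped"
    build(output_path)
    return "built"


def write_placeholder(path):
    # Stand-in for a real pipeline script; writes a one-line CSV header.
    path.write_text("tweet_id,term\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = Path(workdir) / "checkpoint.csv"  # hypothetical filename
    first = run_stage(checkpoint, write_placeholder)
    second = run_stage(checkpoint, write_placeholder)
    forced = run_stage(checkpoint, write_placeholder, force=True)

print(first, second, forced)
```

Passing `force=True` rebuilds a single stage, which corresponds to rerunning one section of the workflow while leaving the other checkpoints in place.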
Pipeline scripts:
- scripts/clean_2010_tweets.py
- scripts/create_2010_term_table.py
- scripts/annotate_2010_tweets_with_terms.py
- scripts/create_2010_usage_context_working_file.py
- scripts/create_2010_filtering_working_file.py
- scripts/export_filtered_2010_analysis_dataset.py
- scripts/export_filtered_2010_analysis_ready_dataset.py
- scripts/add_2010_tweet_sentiment.py
- scripts/generate_eda_summaries.py