This code is used in the paper:
Vranić, A., Tomašević, A., Alorić, A. et al. Sustainability of Stack Exchange Q&A communities: the role of trust. EPJ Data Sci. 12, 4 (2023). https://doi.org/10.1140/epjds/s13688-023-00381-x
Communities are categorized as:
- closed or "Area 51"
- active or beta communities
Area 51 community names carry a prefix denoting the date of the StackExchange archive file that contains the data. For example, the 050112astronomy folder holds the Area 51 version of the astronomy community, while astronomy refers to the beta astronomy community.
Beta Stack Exchange communities are available here.
Area 51 Stack Exchange communities can be downloaded from Area51.
data/raw_data/...
From the raw XML data we select questions, answers, comments, accepted_answers, users and votes for the first 180 days of each community.
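The 180-day selection can be sketched roughly as follows. This is a minimal illustration, not the code from src/data_preparation: it assumes the usual Stack Exchange dump layout of `<row .../>` elements with a `CreationDate` attribute, and it assumes the community's start is the earliest timestamp in the file. The function name `rows_in_first_days` is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def rows_in_first_days(xml_text, days=180, date_attr="CreationDate"):
    """Return attribute dicts of <row> elements from the first `days` days.

    Assumption: day 0 is the earliest timestamp found in the file; the
    repository's own scripts may define the start of a community differently.
    """
    root = ET.fromstring(xml_text)
    rows = [row.attrib for row in root.iter("row") if date_attr in row.attrib]
    if not rows:
        return []

    def stamp(row):
        # Dump timestamps look like 2012-01-01T00:00:00.000
        return datetime.fromisoformat(row[date_attr])

    cutoff = min(stamp(r) for r in rows) + timedelta(days=days)
    return [r for r in rows if stamp(r) < cutoff]
```

For a real dump the file contents would be read first, for example `rows_in_first_days(open(path, encoding="utf-8").read())`, where `path` points to one of the XML files under data/raw_data.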
data/interactions/...
For each community we have several .csv files containing all recorded interactions of a given type. These CSV files are obtained by transforming raw XML data using code provided in src/data_preparation.
- ...interactions_post_questions.csv: Posted questions
- ...interactions_questions_answers.csv: Questions and answers
- ...interactions_comments.csv: All posted comments
- ...interactions_comments_questions.csv: Comments posted directly on a question
- ...interactions_comments_answers.csv: Comments posted on answers
- ...interactions_acc_answers.csv: Accepted answers
- ...interactions_votes.csv: Votes cast on questions, answers and comments

A detailed explanation of the columns of these .csv files is given here.
data/reputations/...
Values of dynamic reputation for each user on each of the 180 days in a given community are stored as CSV files. eng refers to engagement reputation and pop refers to popularity reputation.
Each row of a CSV file is a unique user in a given community, and each column is a day, starting from 0 (the first day).
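A reputation file with this layout could be read as sketched below. The sketch assumes that the first column holds the user identifier and that the file starts with a header row of day indices; neither detail is stated above, so check an actual file before relying on it. The helper names `load_reputation` and `reputation_on_day` are hypothetical and are not part of src/dynamical_reputation.py.

```python
import csv
import io


def load_reputation(fileobj, has_header=True):
    """Map each user to a list of daily reputation values (index = day).

    Assumptions: the first column is the user identifier and, when
    has_header is True, the first row is a header that can be skipped.
    """
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        user_id, *values = row
        table[user_id] = [float(v) for v in values]
    return table


def reputation_on_day(table, day):
    """Return {user: reputation} for a single day, with day 0 the first day."""
    return {user: series[day] for user, series in table.items() if day < len(series)}


# Made-up values, only to illustrate the row/column layout.
example = io.StringIO("user,0,1,2\nu1,1.0,0.5,0.25\nu2,0.0,1.0,2.0\n")
print(reputation_on_day(load_reputation(example), day=1))
```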
- src/data_preparation holds several scripts needed to transform the original StackExchange raw XML data into time-stamped records of interactions of a given type. Data Preparation Pipeline explains the run order and the output of the scripts.
- src/dynamical_reputation.py is the main module for estimating dynamical reputation in StackExchange communities. src/calculate_dynamical_reputation.ipynb shows how to calculate dynamical reputation.
- src/calculate_core_periphery.ipynb is a notebook for calculating core-periphery structure (we use Bayesian Core-Periphery Stochastic Block Models), while src/core_periphery_functions.py contains functions for transforming data into the appropriate input and saving results in HDF5 format.
- src/data_processing.ipynb is a notebook for calculating the evolution of dynamical reputation, network and core-periphery properties. The results are stored in data/processed data, so they can be used directly for plotting figures.
- Figures.ipynb is a notebook for plotting results. The script src/drawing_functions.py holds the drawing functions.