
Consensus and Fragmentation in Academic Publication Preferences

Ian Van Buskirk, Marilena Hohmann, Ekaterina Landgren, Johan Ugander, Aaron Clauset, Daniel B. Larremore

https://arxiv.org/abs/2603.00807

This repository includes de-identified data and code to reproduce the figures and analyses of the manuscript. See installation instructions below.


Overview

Academic publishing requires solving a collective coordination problem: among thousands of possible venues, which deserve a community's attention? This repository contains the data and code to reproduce the figures in our paper, which surveys 3,510 US tenure-track faculty to understand how publication preferences vary across fields, institutions, and individuals.

Key findings:

  • Fields occupy a wide spectrum of consensus β€” Economics and Chemistry show strong agreement on elite venues; Computer Science and Engineering show fragmented preferences across hundreds of outlets.
  • Within fields, faculty at more prestigious institutions and men prefer higher-ranked venues.
  • Researchers publish more successfully in their personal favorite venues than in their field's consensus favorites.
  • Journal Impact Factors explain only ~64% of pairwise preference choices, systematically undervaluing what fields actually prefer.

AI Disclosure Note

This public-facing repository was improved for reproducibility, readability, and clarity with the help of AI tools. All original analyses and writing from which this repository was derived, including the original research notebooks and code, were written by humans.


Repository Structure

.
├── public_data/                   ← Shareable dataset (start here)
│   ├── respondents.csv            ← One row per survey respondent
│   ├── comparisons.csv            ← One row per pairwise comparison
│   ├── venue_selections.csv       ← Venues shown per respondent in initial pass
│   ├── venues.csv                 ← Venue metadata (OpenAlex + JIF)
│   ├── publication_counts.csv     ← Publication counts by venue per respondent
│   └── field_jif_rankings.csv     ← Field-level consensus + JIF ordinal ranks
│
├── reproducibility/               ← Figure scripts
│   ├── run_all.py                 ← Run everything (start here)
│   ├── compute_rankings.py        ← Produces derived/ from public_data/
│   ├── fig1.py  …  fig5.py        ← Main paper figures
│   ├── si_fig1.py  …  si_fig4.py  ← Supplementary figures
│   └── derived/                   ← Intermediate outputs (generated by compute_rankings.py)
│       ├── individual_rankings.csv
│       ├── field_rankings_per_user.csv
│       ├── field_consensus_rankings.csv
│       └── global_rankings_per_user.csv  ← Global LOO SpringRank (all fields)
│
├── networks/                      ← Pre-built network datasets (GraphML + CSV)
│   ├── README.md                  ← Schema, attribute tables, and Python examples
│   ├── graphml/
│   │   ├── comparison/            ← Directed venue comparison multigraphs (all + 13 fields)
│   │   └── bipartite/             ← Respondent × venue bipartite graphs (all + 13 fields)
│   └── csv/
│       ├── comparison/            ← Edge list + node attributes (all + 13 fields)
│       └── bipartite/             ← Edge list + node attributes (all + 13 fields)
│
├── utils/                         ← Shared plotting utilities
│   ├── field_colors.py            ← Per-field color palette (tab20)
│   └── short_venue_names.py       ← Abbreviated venue name lookup
│
└── README.md                      ← This file

Data Description

All data files are in public_data/. Survey responses are de-identified: names, institution names, email addresses, and timestamps have been removed.

respondents.csv (3,510 rows × 17 columns)

One row per survey respondent (US tenure-track faculty, fields with ≥ 100 participants).

| Column | Description |
|---|---|
| user_db_id | Anonymous respondent ID |
| field | Academic field (13 fields) |
| subfield | Subfield within field |
| career_stage | Assistant / Associate / Full Professor |
| top_init_venue_db_id | Top-ranked aspiration venue ID |
| mid_init_venue_db_id | Mid-ranked aspiration venue ID |
| low_init_venue_db_id | Low-ranked aspiration venue ID |
| venue_bug | True if Science was not displayed correctly (exclude from analyses) |
| gender | gm (men) / gf (women) / NaN (not reported) |
| aa_area | Academic area from affiliation database |
| clusters_20 | Field cluster assignment (k=20) |
| clusters_40 | Field cluster assignment (k=40) |
| institution_bin_3 | Institution prestige tercile (1 = most prestigious) |
| institution_bin_10 | Institution prestige decile (1 = most prestigious) |
| academia_prestige_bin_3 | Academia-wide prestige tercile |
| academia_prestige_bin_10 | Academia-wide prestige decile (used in regressions) |

comparisons.csv (153,654 rows × 5 columns)

One row per pairwise comparison made during the survey. Rows provided only for fields with ≥ 100 respondents.

| Column | Description |
|---|---|
| user_db_id | Respondent ID |
| is_tie | True if respondent indicated no preference |
| pref_venue_db_id | ID of the preferred venue (if not a tie) |
| other_venue_db_id | ID of the non-preferred venue (if not a tie) |
| field | Field in which the comparison was made |
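
These two tables join on user_db_id. As a quick orientation (a minimal sketch, assuming pandas and that commands run from the project root), the files can be loaded and the comparisons tallied per field like this:

import pandas as pd

# Load respondent metadata and their pairwise comparisons
respondents = pd.read_csv("public_data/respondents.csv")
comparisons = pd.read_csv("public_data/comparisons.csv")

# Exclude respondents affected by the Science display bug (venue_bug column)
clean = respondents[respondents["venue_bug"] == False]

# Attach respondent metadata to each comparison; both tables carry a
# "field" column, so keep the comparison-side one unsuffixed
merged = comparisons.merge(clean, on="user_db_id", suffixes=("", "_resp"))
print(merged.groupby("field").size().sort_values(ascending=False))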

venue_selections.csv (84,246 rows × 4 columns)

All venues presented to each respondent during their initial ranking pass. Venues provided only for fields with ≥ 100 respondents.

| Column | Description |
|---|---|
| user_db_id | Respondent ID |
| would_publish | True if respondent rated this venue positively |
| venue_db_id | Venue ID |
| field | Field context |

venues.csv (6,363 rows × 13 columns)

Metadata for all venues that appeared in the survey, for fields with ≥ 100 respondents.

| Column | Description |
|---|---|
| venue_db_id | Venue ID (links to other tables) |
| venue_oa_id | OpenAlex venue ID |
| name | Full venue name |
| users_selected | Number of respondents who selected this venue |
| display_name | Display name used in survey |
| 2yr_mean_citedness | 2-year mean citedness (OpenAlex) |
| works_count | Number of works indexed (OpenAlex) |
| cited_by_count | Total citation count (OpenAlex) |
| issn_l | Linking ISSN |
| first_topic | Primary topic (OpenAlex) |
| db_concept | Concept/field label |
| jcr_jif | Journal Citation Reports Impact Factor (2023) |
| jcr_name | JCR journal name |

publication_counts.csv (2,275 rows × 13 columns)

Number of publications per respondent in each of their nominated venues, drawn from OpenAlex. Used in Figure 5 (preference realization).

field_jif_rankings.csv (8,794 rows × 13 columns)

Field-level consensus SpringRank scores (alpha=20) alongside JIF ordinal ranks and selection fractions. One row per (field, venue) pair. All statistics are aggregates; no individual-level data.

Key columns: field, venue_db_id, ordinal_porc (consensus rank), ordinal_jif (JIF rank), subset_ordinal_porc, subset_ordinal_jif, frac_selected.
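
For example (a minimal sketch, assuming the column names listed above), the per-field rank agreement between the consensus and JIF orderings can be computed with scipy:

import pandas as pd
from scipy.stats import spearmanr

ranks = pd.read_csv("public_data/field_jif_rankings.csv")

# Rank agreement between consensus and JIF orderings within each field,
# restricted to venues that have both ordinal ranks
both = ranks.dropna(subset=["ordinal_porc", "ordinal_jif"])
for field, grp in both.groupby("field"):
    rho, _ = spearmanr(grp["ordinal_porc"], grp["ordinal_jif"])
    print(f"{field}: Spearman rho = {rho:.2f}")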

derived/global_rankings_per_user.csv (52,965 rows × 3 columns)

Leave-one-out global SpringRank(alpha=20) scores: for each respondent, scores are computed from all comparisons (across all fields) except that respondent's own, restricted to venues they actually compared. Generated by compute_rankings.py. Used in SI Figure 3.


Network Datasets

Pre-built network files are in networks/. Two network types are provided, each in GraphML and CSV formats, for the global dataset and separately for each of the 13 fields.

| Network type | Nodes | Edges | Global size |
|---|---|---|---|
| Venue comparison multigraph (directed) | Academic venues | One directed edge per pairwise comparison; ties produce two edges (A→B and B→A) at weight 0.5 | 5,686 nodes · 166,136 edges |
| Respondent × venue bipartite (undirected) | Respondents + venues | Edge if venue was in respondent's consideration set | 9,196 nodes · 55,455 edges |

See networks/README.md for full attribute schemas and Python read-in examples (NetworkX, graph-tool, pandas).
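
As a starting point, a GraphML file can be read with NetworkX. This is a sketch: the file name below is a placeholder (check networks/README.md for the actual names), and NetworkX is not in the package list under Requirements, so install it separately.

import networkx as nx

# Load one directed venue-comparison multigraph
# (file name is illustrative; see networks/README.md for the real paths)
G = nx.read_graphml("networks/graphml/comparison/physics.graphml",
                    force_multigraph=True)
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")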


Requirements

  • Python 3.9 or later
  • uv (recommended) or pip

Python packages

matplotlib
seaborn
scipy
numpy
pandas
statsmodels
springrank
pyprojroot

Setup

1. Clone the repository

git clone https://github.com/LarremoreLab/academic-publication-preferences.git
cd academic-publication-preferences

2. Create the virtual environment

Using uv (recommended):

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv if needed
~/.local/bin/uv venv .venv
~/.local/bin/uv pip install matplotlib seaborn scipy numpy pandas statsmodels springrank pyprojroot

Or using standard pip:

python3 -m venv .venv
source .venv/bin/activate
pip install matplotlib seaborn scipy numpy pandas statsmodels springrank pyprojroot

3. Verify the setup

.venv/bin/python -c "import pandas, matplotlib, seaborn, springrank; print('OK')"

Reproducing the Figures

Quick start: run everything

.venv/bin/python reproducibility/run_all.py

Expected total runtime: ~15–25 minutes on a modern laptop.

| Stage | Script | Time |
|---|---|---|
| Compute rankings | compute_rankings.py | ~10–15 min |
| Figures 1–5, SI 1–3 | fig1.py … si_fig3.py | < 1 min each |
| SI Figure 4 (permutation test) | si_fig4.py | ~3 min |

If reproducibility/derived/ already exists (e.g. on a second run), skip the ranking step:

.venv/bin/python reproducibility/run_all.py --skip-rankings

Step-by-step

Step 1. Generate intermediate ranking files (run once; takes ~10–15 min):

.venv/bin/python reproducibility/compute_rankings.py

This reads public_data/comparisons.csv and produces four files in reproducibility/derived/:

  • individual_rankings.csv β€” per-user SpringRank(alpha=0) scores
  • field_rankings_per_user.csv β€” leave-one-out field SpringRank(alpha=20) scores
  • field_consensus_rankings.csv β€” field-level consensus SpringRank(alpha=20) scores
  • global_rankings_per_user.csv β€” leave-one-out global SpringRank(alpha=20) scores (all fields pooled)

Step 2. Produce each figure:

.venv/bin/python reproducibility/fig1.py      # Fig 1: field structure + heatmap
.venv/bin/python reproducibility/fig2.py      # Fig 2: preference agreement scatter
.venv/bin/python reproducibility/fig3.py      # Fig 3: network visualizations (HTML)
.venv/bin/python reproducibility/fig4.py      # Fig 4: aspiration/preference regressions
.venv/bin/python reproducibility/fig5.py      # Fig 5: preference realization
.venv/bin/python reproducibility/si_fig1.py   # SI Fig 1: full regression forest plot
.venv/bin/python reproducibility/si_fig2.py   # SI Fig 2: JIF vs consensus rankings
.venv/bin/python reproducibility/si_fig3.py   # SI Fig 3: comparison accuracy
.venv/bin/python reproducibility/si_fig4.py   # SI Fig 4: permutation robustness check

Run all scripts from the project root; outputs are written to reproducibility/ (PDFs, plus HTML files for Figure 3; see below).


Note on Figure 3: fig3.py produces 13 HTML files (one per field) containing D3.js force-directed network graphs. These were imported into Keynote as images to compose the final Figure 3 panel.

Note on SI Figure 4: The permutation test is stochastic. Each run produces a statistically equivalent figure, but it will not be pixel-identical to the published version.


Notes on Rankings

Rankings are computed using SpringRank, a physics-inspired hierarchical ranking algorithm; a generic sketch of the computation follows the list below. The key parameters:

  • alpha=0: Individual user rankings (no regularization); used in Figure 2.
  • alpha=20: Field-level consensus rankings (regularization toward a neutral prior); used in Figures 4–5 and SI figures.
  • Leave-one-out (LOO): For fairness analyses, each user's ranking is computed from the field's comparisons excluding their own, preventing circularity.

Tie handling

Indifferent responses (is_tie=True) constitute 8.1% of all comparisons (N=12,482). In the SpringRank adjacency matrix, each tie counts as a half-win for each venue: A[i,j] += 0.5 and A[j,i] += 0.5.
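
A minimal sketch of that adjacency construction follows. It assumes (the schema above does not state this explicitly) that tie rows still record the two venues in pref_venue_db_id and other_venue_db_id, in arbitrary order. The resulting matrix can be fed to a solver such as the sketch in the previous section.

import numpy as np
import pandas as pd

comparisons = pd.read_csv("public_data/comparisons.csv").dropna(
    subset=["pref_venue_db_id", "other_venue_db_id"])

# Map venue IDs to matrix indices
ids = pd.unique(comparisons[["pref_venue_db_id", "other_venue_db_id"]].values.ravel())
index = {v: i for i, v in enumerate(ids)}

A = np.zeros((len(ids), len(ids)))
for row in comparisons.itertuples(index=False):
    i, j = index[row.pref_venue_db_id], index[row.other_venue_db_id]
    if row.is_tie:
        A[i, j] += 0.5   # a tie is a half-win for each venue
        A[j, i] += 0.5
    else:
        A[i, j] += 1.0   # preferred venue beats the other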


Citation

If you use this data or code, please cite:

@article{vanbuskirk2026consensus,
  title   = {Consensus and fragmentation in academic publication preferences},
  author  = {Van Buskirk, Ian and Hohmann, Marilena and Landgren, Ekaterina and
             Ugander, Johan and Clauset, Aaron and Larremore, Daniel B.},
  journal = {arXiv:2603.00807},
  year    = {2026},
}

License

Data: CC BY 4.0
Code: MIT License


Contact

Daniel B. Larremore (daniel.larremore@colorado.edu)
