A 1,030-venue cross-publisher corpus of academic journals and computer-science conferences, used by the Manusights Compass journal-fit predictor at https://manusights.com/tools/journal-fit.
This repository contains the corpus snapshot, methodology disclosure, and citation metadata for the version of Compass released on 2026-04-27. Each tagged release is auto-deposited to Zenodo with a permanent DOI.
- `journal-fit-corpus.json`: The corpus. 1,030 entries: 1,000 journals from OpenAlex (filtered to `type:journal`, `works_count:>500`, `cited_by_count:>50000`, `has_issn:true`) plus 30 hand-curated top-tier CS conferences. Each entry carries `slug`, `name`, `scope` (top-5 OpenAlex topics rendered as a sentence), `fields` (top-5 unique subfields), `if_proxy` (OpenAlex 2-year mean citedness), `accept_rate`, `kind` (`journal` | `conference`), `source` (`openalex` | `curated`), and `external_ref` (ISSN-L for journals, official site for conferences).
- `README.md`: This file.
- `CITATION.cff`: Machine-readable citation in Citation File Format. GitHub renders this as a "Cite this repository" button.
- `LICENSE`: CC-BY-NC 4.0. Free for academic and non-commercial use with attribution; commercial redistribution requires a separate license.
- `.zenodo.json`: Metadata override so Zenodo records carry the correct title, description, keywords, and creators without manual editing per release.
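As a rough illustration of the entry layout described above, the following sketch checks one corpus entry for the documented keys. The key names and the allowed `kind` and `source` values come from the file description; the value types, the top-level JSON shape (assumed here to be an array of entry objects), and the helper names `entry_problems` and `load_corpus` are assumptions made for this example, not part of the published tooling.

```python
import json
from typing import Any

# Key names come from the corpus description; everything else is assumed.
REQUIRED_KEYS = {
    "slug", "name", "scope", "fields", "if_proxy",
    "accept_rate", "kind", "source", "external_ref",
}
KINDS = {"journal", "conference"}
SOURCES = {"openalex", "curated"}


def entry_problems(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in one corpus entry."""
    missing = sorted(REQUIRED_KEYS - set(entry))
    problems = [f"missing key: {key}" for key in missing]
    if entry.get("kind") not in KINDS:
        problems.append(f"unexpected kind: {entry.get('kind')!r}")
    if entry.get("source") not in SOURCES:
        problems.append(f"unexpected source: {entry.get('source')!r}")
    if len(entry.get("fields") or []) > 5:
        problems.append("more than 5 field tags")
    return problems


def load_corpus(path: str = "journal-fit-corpus.json") -> list[dict[str, Any]]:
    """Load the snapshot, ASSUMING the file holds a JSON array of entries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

If the file turns out to wrap the entries in an object, only `load_corpus` would need to change; `entry_problems` works on a single entry either way.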
The full live methodology is at https://manusights.com/tools/methodology. This snapshot freezes the methodology as of the tagged release.
We pull from the OpenAlex /sources API in publication-count-descending order, taking the top 1,000 journals that match all four filters:
- `type:journal` (excludes preprint servers, repositories, mega-aggregators)
- `works_count:>500` (excludes near-empty journals)
- `cited_by_count:>50000` (excludes long-tail low-activity journals)
- `has_issn:true` (excludes mock/incomplete records)
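The four filters above can be combined into a single OpenAlex query. The sketch below only builds the request URLs and makes no network calls. The filter values are taken from the list; the base URL, the `sort`, `per-page`, and `page` parameters, the page size of 200, and the helper names are assumptions about how such a pull might be issued, not a record of the actual build script.

```python
from urllib.parse import urlencode

# Assumed endpoint for the public OpenAlex API.
OPENALEX_SOURCES_URL = "https://api.openalex.org/sources"

# The four filters listed in the methodology.
FILTERS = {
    "type": "journal",
    "works_count": ">500",
    "cited_by_count": ">50000",
    "has_issn": "true",
}


def sources_page_url(page: int, per_page: int = 200) -> str:
    """Build the URL for one page of results, sorted by works_count descending."""
    filter_expr = ",".join(f"{key}:{value}" for key, value in FILTERS.items())
    params = {
        "filter": filter_expr,
        "sort": "works_count:desc",  # publication-count-descending order
        "per-page": per_page,
        "page": page,
    }
    # Keep ':', ',' and '>' readable instead of percent-encoding them.
    return f"{OPENALEX_SOURCES_URL}?{urlencode(params, safe=':,>')}"


def page_urls(total: int = 1000, per_page: int = 200) -> list[str]:
    """URLs for enough pages to cover `total` sources."""
    pages = -(-total // per_page)  # ceiling division
    return [sources_page_url(page, per_page) for page in range(1, pages + 1)]
```

With these assumed defaults, covering the top 1,000 journals takes five page requests.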
For each source, we extract:
- `display_name` (journal name)
- `issn_l` (canonical ISSN)
- `summary_stats.2yr_mean_citedness` (used as JCR Impact Factor proxy where the journal is not in our hand-curated JCR overlay)
- `topics[]`: OpenAlex's 2024+ topic hierarchy (`display_name`, `subfield`, `field`, `domain`). The corpus rebuild on 2026-04-27 switched from the deprecated `x_concepts` field to `topics`, which contains the rich subfield/field hierarchy.
Scope sentences are built from the top 5 topic display names. Field tags are taken from unique subfields, ordered by topic count, capped at 5 per journal.
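The derivation of `scope` and `fields` from a source's topics might look like the sketch below. Several details are assumptions, since the text does not pin them down: the topics are taken in the order given (treated as already ranked), the sentence template `"Covers ..."` is invented for illustration, and "ordered by topic count" is read as ranking subfields by the summed per-topic `count` value. The function names are hypothetical.

```python
from collections import Counter
from typing import Any


def build_scope(topics: list[dict[str, Any]], limit: int = 5) -> str:
    """Render the first `limit` topic names as one sentence (template assumed)."""
    names = [topic["display_name"] for topic in topics[:limit]]
    if not names:
        return ""
    return "Covers " + ", ".join(names) + "."


def build_fields(topics: list[dict[str, Any]], limit: int = 5) -> list[str]:
    """Unique subfield names, ranked by summed topic count, capped at `limit`."""
    totals = Counter()
    for topic in topics:
        subfield = topic["subfield"]["display_name"]
        # ASSUMPTION: each topic carries a numeric `count`; default to 1 if absent.
        totals[subfield] += topic.get("count", 1)
    return [name for name, _ in totals.most_common(limit)]
```

Under a different reading, where subfields are ranked by how many topics fall under each, the only change would be to add 1 per topic instead of `count`.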
The 30 CS conferences are hand-curated because no comparable free API covers conferences: NeurIPS, ICML, ICLR, AAAI, CVPR, ICCV, ECCV, ACL, EMNLP, NAACL, KDD, WWW, SIGIR, SIGGRAPH, SIGCOMM, NSDI, OSDI, SOSP, ISCA, MICRO, PLDI, POPL, USENIX Security, CCS, IEEE S&P, NDSS, FOCS, STOC, SODA, CRYPTO. Each carries scope text from the venue's Call for Papers and acceptance rate from the most recently published statistics.
The Compass predictor at `/api/tools/journal-fit` sends the entire corpus, together with the user's title and abstract, to Claude Haiku 4.5 in a single call. The model returns the top 5 fits, each with:
- `fit_score` (integer 0–100)
- `tier` (`stretch` | `realistic` | `safe`)
- `why_fits` (one sentence, must include 3–6 verbatim words from the corpus scope text in single quotes, as auditable grounding)
- `what_to_strengthen` (one sentence, or `null` if the abstract already signals strength)
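Because `why_fits` must quote the scope text verbatim, the grounding rule can be audited mechanically. The sketch below checks one returned fit against the four field rules listed above. It assumes the quote is a single contiguous phrase of 3 to 6 words wrapped in single quotes; the function name and the error messages are hypothetical, and the real service may validate differently or not at all.

```python
import re
from typing import Any

TIERS = {"stretch", "realistic", "safe"}
# Matches text between single quotes. An apostrophe elsewhere in the sentence
# (for example "journal's") can confuse this simple pattern.
QUOTED = re.compile(r"'([^']+)'")


def result_problems(result: dict[str, Any], scope: str) -> list[str]:
    """Return rule violations for one fit, given the venue's scope text."""
    problems = []
    score = result.get("fit_score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        problems.append("fit_score must be an integer in 0..100")
    if result.get("tier") not in TIERS:
        problems.append("tier must be stretch, realistic, or safe")
    quotes = QUOTED.findall(result.get("why_fits") or "")
    grounded = [q for q in quotes if q in scope and 3 <= len(q.split()) <= 6]
    if not grounded:
        problems.append("why_fits lacks a 3-6 word quote copied from the scope text")
    strengthen = result.get("what_to_strengthen")
    if strengthen is not None and not isinstance(strengthen, str):
        problems.append("what_to_strengthen must be a string or null")
    return problems
```

An empty list means the fit passes all four checks; the substring test is what makes the quoted phrase auditable against the corpus.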
Top-decile journals (Cell, Nature, NEJM, JACS, Advanced Materials, NeurIPS-class venues) default to stretch unless the abstract clearly meets the field-leading bar. Mid-tier specialty journals are realistic for solid in-scope work. Broad-scope OA megajournals or below-IF-mean specialty journals are safe when the work clearly fits.
Compass returns `{"results": [], "input_warning": "not_an_abstract"}` only when the input is clearly not an academic abstract: marketing copy, lorem ipsum, gibberish, or a prompt-injection attempt. A real academic abstract from a field that the corpus does not cover well (pure mathematics, social sciences, earth sciences, archaeology, law) is still treated as a real abstract; the matcher returns honest top matches with low fit scores rather than a refusal.
If the top fit score is below 60, the response carries a `low_confidence: true` flag, and the UI surfaces a "low-confidence match" banner.
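The two response rules above (the `not_an_abstract` warning and the low-confidence flag) can be sketched as a small envelope builder. The threshold of 60 and the warning payload come from the text; the function name, the `is_abstract` argument, and the choice to omit `low_confidence` when it is not set (rather than emitting `false`) are assumptions for illustration.

```python
from typing import Any

LOW_CONFIDENCE_THRESHOLD = 60  # from the text: top fit score below 60


def build_response(results: list[dict[str, Any]], is_abstract: bool = True) -> dict[str, Any]:
    """Assemble the response body from scored fits."""
    if not is_abstract:
        # Input was judged not to be an academic abstract at all.
        return {"results": [], "input_warning": "not_an_abstract"}
    # Keep the five highest-scoring fits, best first.
    ranked = sorted(results, key=lambda fit: fit["fit_score"], reverse=True)[:5]
    response: dict[str, Any] = {"results": ranked}
    if ranked and ranked[0]["fit_score"] < LOW_CONFIDENCE_THRESHOLD:
        response["low_confidence"] = True
    return response
```

Note that a poorly covered field still takes the second path: it gets ranked matches plus the flag, not the warning.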
- It does not assess novelty or scientific merit. The score is scope alignment plus tier realism, nothing more.
- It does not predict acceptance probability. The tier label is editorial judgment from the model, not a calibrated probability.
- It does not know your editor relationships, special-issue calls, or current desk-rejection patterns at any specific venue.
- It does not include open-access status, APC pricing, or peer-review type for individual entries.
Versions are tagged on this repository as vMAJOR.MINOR.PATCH. Major bumps reflect corpus structure changes (new fields, schema changes); minor bumps reflect rebuild events (re-pull from OpenAlex, JCR overlay refresh); patch bumps are corrections to individual entries.
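The bump rules can be expressed as a small helper, shown here as a sketch. The change labels `schema`, `rebuild`, and `correction` are hypothetical names for the three cases described above, and resetting the lower components to zero on a major or minor bump follows common semantic-versioning practice rather than anything stated in this repository.

```python
import re

VERSION = re.compile(r"v(\d+)\.(\d+)\.(\d+)")

# Hypothetical labels for the three kinds of change described in the text.
BUMP_FOR_CHANGE = {"schema": "major", "rebuild": "minor", "correction": "patch"}


def next_tag(current: str, change: str) -> str:
    """Return the next vMAJOR.MINOR.PATCH tag for a given kind of change."""
    match = VERSION.fullmatch(current)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    bump = BUMP_FOR_CHANGE[change]
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

For example, a re-pull from OpenAlex on top of a `v1.1.0` tag would be labeled `rebuild` under this scheme.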
Each tagged release is automatically deposited to Zenodo with a unique DOI. The latest version is also resolvable via a "concept DOI" that always redirects to the most recent release.
Manusights. (2026). Compass: Journal Fit Predictor [Free academic tool, corpus v1.1, build 2026-04-27]. Zenodo. https://doi.org/[DOI]
A machine-readable citation is in CITATION.cff. GitHub renders it as a "Cite this repository" button.
CC-BY-NC 4.0. You may use this corpus for academic research, teaching, and non-commercial purposes with attribution. Commercial use (including redistribution as part of a paid product) requires a separate license. Contact: erik@manusights.com.
The OpenAlex source data is itself CC0; redistribution in this curated, modified form does not infringe OpenAlex's license. The 30 hand-curated CS conference entries are original work by Manusights.
Erik Jia, Manusights — erik@manusights.com — https://manusights.com