bet-lab/reference-validator

Reference Validator

A tool that validates and enriches BibTeX entries using metadata from Crossref, arXiv, Google Scholar, and other academic sources. It helps keep your bibliography accurate, complete, and up to date.

Features

  • 🔍 DOI Validation: Automatically verifies DOIs against the Crossref API.
  • 📄 arXiv Integration: Detects arXiv preprints and fetches updated metadata or official publication DOIs.
  • ✨ Smart Enrichment: Fills in missing fields using metadata from multiple reliable academic sources:
    • Crossref - For official DOI metadata.
    • arXiv - For preprint information and updates.
    • Google Scholar - For citation and missing metadata (via scholarly).
    • DBLP - For computer science bibliography.
    • Semantic Scholar - For AI-powered research paper data.
    • PubMed - For biomedical literature.
    • Zenodo - For general repositories and datasets.
    • DataCite - For data DOI registry.
    • OpenAlex - For comprehensive academic metadata.
  • ⚖️ Dual Validation: Compares your local BibTeX data with partial API data to highlight conflicts.
  • 🖥️ Interactive GUI: A modern web-based interface to review, accept, or reject changes visually with color-coded badges and intuitive controls.
  • 📊 Report Generation: Produces detailed validation reports.
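
As a rough illustration of the DOI validation and arXiv detection features listed above, the sketch below normalizes a DOI, builds the lookup URL for the public Crossref REST API (`https://api.crossref.org/works/<doi>`), and spots new-style arXiv identifiers with a regular expression. The function names, the regex, and the list of fields searched are assumptions made for illustration; they are not taken from the tool's source.

```python
import re
from typing import Optional
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"

# New-style arXiv identifiers such as 1706.03762 or 2301.12345v2 (assumed pattern).
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(v\d+)?(?!\d)")


def normalize_doi(raw: str) -> str:
    """Strip common prefixes (doi:, https://doi.org/) and lowercase the DOI."""
    doi = raw.strip()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi, flags=re.IGNORECASE)
    return doi.lower()


def crossref_url(raw_doi: str) -> str:
    """Return the Crossref works endpoint URL for a DOI."""
    return CROSSREF_WORKS + quote(normalize_doi(raw_doi), safe="/")


def find_arxiv_id(entry: dict) -> Optional[str]:
    """Look for an arXiv identifier in fields where it commonly appears (hypothetical field list)."""
    for field in ("eprint", "journal", "url", "note"):
        match = ARXIV_ID.search(entry.get(field, ""))
        if match:
            return match.group(1)  # identifier without the version suffix
    return None
```

A request to the URL returned by `crossref_url` would then confirm whether the DOI is registered. The network call is left out here so the sketch stays self-contained.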

Usage Scenarios

This project is managed with uv. Ensure you have uv installed on your system.

Scenario 1: Integration into a LaTeX Writing Environment

If you are writing a paper and simply want to use this tool to validate your references.bib file without modifying the tool's code:

Option A: Install as a Standalone Tool (Recommended)

This installs the tool in an isolated environment, so the bibtex-validator command is available system-wide without adding anything to your project's dependencies.

# Install directly from the repository
uv tool install git+https://github.com/bet-lab/reference-validator.git

Run validation:

bibtex-validator references.bib --gui

Option B: Add to Your Project Dependencies

If you are already managing your paper's environment (e.g., for processing scripts) using uv or a pyproject.toml:

# Add to your existing project
uv add git+https://github.com/bet-lab/reference-validator.git

Run validation:

uv run bibtex-validator references.bib

Scenario 2: Development & Contribution

If you want to modify the source code, fix bugs, or add new features:

  1. Clone the repository:

    git clone https://github.com/bet-lab/reference-validator.git
    cd reference-validator
  2. Sync dependencies: Run uv sync to create a virtual environment and install all dependencies (including dev dependencies).

    uv sync
  3. Run from source: You can run the script using uv run.

    uv run bibtex-validator references_test.bib --gui

Detailed Usage

Command Line Interface (CLI)

Basic Validation (Dry Run): Checks the file and prints a report without making changes.

# If installed via Scenario 1 (Option A)
bibtex-validator references.bib

# If installed via Scenario 1 (Option B) or Scenario 2
uv run bibtex-validator references.bib

Auto-Update BibTeX File: Validates and applies enriched metadata directly to your file.

uv run bibtex-validator references.bib --update

Save Update to New File: Keeps the original file intact and saves the updated version to a new file.

uv run bibtex-validator references.bib --update --output references_enriched.bib
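
The interaction between --update and --output can be summarized in a few lines. The helper below is a hypothetical sketch based only on the option descriptions in this README (the output defaults to the input file when --update is given). The function name is invented, and treating --output without --update as a dry run is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(bib_file: str, update: bool, output: Optional[str]) -> Optional[Path]:
    """Return the file that would be written, or None for a dry run."""
    if not update:
        # Assumption: without --update nothing is written, even if --output is set.
        return None
    if output:
        return Path(output)  # the original file stays intact
    return Path(bib_file)  # in-place update of the input file
```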

Save Report to File

uv run bibtex-validator references.bib --report validation_report.txt

Interactive GUI 🖥️

Launch the modern web-based review interface to manually inspect and approve changes.

uv run bibtex-validator references.bib --gui

Once the server is running, the web interface opens automatically in your default browser (default address: http://127.0.0.1:8010).

Key Features

📊 Validation Summary Dashboard

  • Attention pie chart showing percentage of entries needing review
  • Real-time statistics: Reviews, Conflicts, Differences, Identical
  • Global Action: ✅ Accept All Entries

🔍 Field-by-Field Comparison

  • Side-by-side comparison of BibTeX vs API values
  • Color-coded status badges for each field:
    • Review - New data available
    • Conflict - Significant mismatch
    • Different - Minor formatting difference
    • Identical - Verified match
  • Source selection dropdown for fields with multiple data sources
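
One way to picture how the four badges could be assigned is the sketch below. It is an assumption-laden illustration, not the tool's actual comparison logic: `classify_field` and `_canonical` are hypothetical names, and "minor formatting difference" is approximated here as a difference in case, braces, punctuation, or whitespace only.

```python
import re
from typing import Optional


def _canonical(value: str) -> str:
    """Lowercase and drop braces, punctuation, and extra whitespace."""
    value = re.sub(r"[{}]", "", value).lower()
    value = re.sub(r"[^\w\s]", " ", value)
    return " ".join(value.split())


def classify_field(local: Optional[str], remote: str) -> str:
    """Map a (BibTeX value, API value) pair to one of the four badge names."""
    if not local or not local.strip():
        return "review"  # field missing locally: new data available
    if local.strip() == remote.strip():
        return "identical"  # verified match
    if _canonical(local) == _canonical(remote):
        return "different"  # only formatting differs
    return "conflict"  # significant mismatch
```

For example, `{Attention Is All You Need}` and `Attention is all you need` would be tagged as different under this sketch, while a year of 2016 against 2017 would be a conflict.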

⌨️ Keyboard Navigation

  • Arrow keys to navigate between entries
  • Home/End to jump to first/last entry
  • PageUp/PageDown to jump by 10 entries
  • Esc to clear selection
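
The navigation rules above amount to moving an index through the entry list and clamping it at both ends. The following sketch shows that logic in isolation. The key names `prev` and `next` stand in for the arrow keys, and using -1 to mean "no selection" is an assumption of this example rather than something the GUI is known to do.

```python
PAGE_STEP = 10  # PageUp/PageDown jump size stated above


def next_index(key: str, current: int, total: int) -> int:
    """Return the entry index selected after a key press, clamped to the list."""
    if total <= 0 or key == "Escape":
        return -1  # assumption: -1 represents "nothing selected"
    moves = {"prev": -1, "next": 1, "PageUp": -PAGE_STEP, "PageDown": PAGE_STEP}
    if key == "Home":
        target = 0
    elif key == "End":
        target = total - 1
    elif key in moves:
        target = current + moves[key]
    else:
        return current  # unrelated key: selection unchanged
    return max(0, min(total - 1, target))
```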

🎯 Flexible Actions

  • Accept/Reject individual fields
  • Bulk actions: Reject All / Accept All per entry
  • Global batch approval for all entries
  • Real-time save with visual feedback
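
Accepting or rejecting individual fields can be thought of as merging only the approved suggestions into an entry. The sketch below illustrates that idea with plain dictionaries; `apply_decisions` is a hypothetical helper, not a function exposed by the tool.

```python
from typing import Dict, Set


def apply_decisions(
    entry: Dict[str, str],
    suggestions: Dict[str, str],
    accepted: Set[str],
) -> Dict[str, str]:
    """Return a copy of entry with only the accepted suggested fields applied."""
    updated = dict(entry)  # leave the original entry untouched
    for field, value in suggestions.items():
        if field in accepted:
            updated[field] = value
    return updated
```

Under this model, "Accept All" for an entry corresponds to passing every suggested field name as accepted, and "Reject All" to passing an empty set.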

Options

usage: validate_bibtex.py [-h] [-o OUTPUT] [-r REPORT] [-u] [-d DELAY] [--no-progress] [--gui] [--workers WORKERS] [--port PORT] bib_file

Validate and enrich BibTeX entries using DOI, arXiv, and Google Scholar

positional arguments:
  bib_file              Input BibTeX file

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output BibTeX file (default: same as input if --update)
  -r REPORT, --report REPORT
                        Output report file (default: print to stdout)
  -u, --update          Update BibTeX file with enriched data
  -d DELAY, --delay DELAY
                        Delay between API requests in seconds (default: 1.0)
  --no-progress         Hide progress indicators
  --gui                 Launch web-based GUI interface 🖥️
  --workers WORKERS     Number of threads for parallel validation (default: 10)
  --port PORT           Port for GUI web server (default: 8010)
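
For readers who want to script around these options, the following argparse sketch reproduces the flags and defaults shown in the help text above. It is a reconstruction from that help output only; the real parser in the repository may be organized differently, and `build_parser` is a name chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="validate_bibtex.py",
        description="Validate and enrich BibTeX entries using DOI, arXiv, and Google Scholar",
    )
    parser.add_argument("bib_file", help="Input BibTeX file")
    parser.add_argument("-o", "--output", help="Output BibTeX file (default: same as input if --update)")
    parser.add_argument("-r", "--report", help="Output report file (default: print to stdout)")
    parser.add_argument("-u", "--update", action="store_true", help="Update BibTeX file with enriched data")
    parser.add_argument("-d", "--delay", type=float, default=1.0,
                        help="Delay between API requests in seconds (default: 1.0)")
    parser.add_argument("--no-progress", action="store_true", help="Hide progress indicators")
    parser.add_argument("--gui", action="store_true", help="Launch web-based GUI interface")
    parser.add_argument("--workers", type=int, default=10,
                        help="Number of threads for parallel validation (default: 10)")
    parser.add_argument("--port", type=int, default=8010, help="Port for GUI web server (default: 8010)")
    return parser
```

Calling `build_parser().parse_args(["references.bib", "--gui", "--port", "9000"])` would yield a namespace with `gui=True` and `port=9000`, with the remaining options at their defaults.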

Keyboard Shortcuts (GUI Mode)

Key                Action             Description
Arrow (previous)   Previous Entry     Navigate to previous entry
Arrow (next)       Next Entry         Navigate to next entry
Home               First Entry        Jump to first entry
End                Last Entry         Jump to last entry
PageUp             Jump Back 10       Move backward by 10 entries
PageDown           Jump Forward 10    Move forward by 10 entries
Esc                Clear Selection    Deselect current entry

Dependencies

  • 📚 Core Libraries

    • bibtexparser (>=1.4.0): For reading and writing BibTeX files.
    • requests (>=2.31.0): For making API calls to academic databases.
  • 🔍 Data Sources

    • scholarly (>=1.7.0): For accessing Google Scholar data (optional).
  • 🖥️ GUI Components

    • fastapi (>=0.104.0): Modern web framework for the GUI.
    • uvicorn[standard] (>=0.24.0): ASGI server for running the web interface.

These are automatically installed when using uv run or uv sync.

Data Sources

The validator integrates with multiple academic databases and metadata providers:

Crossref, arXiv, Semantic Scholar, DBLP, PubMed, Zenodo, DataCite, OpenAlex
