bib-update: extract DOI/CrossRef/OpenAlex cascade into scripts/fetch_bibtex.py by nsmiller2501 · Pull Request #16 · scunning1975/MixtapeTools

nsmiller2501 · 2026-05-23T21:18:45Z

Summary

Follow-on to #7. Hoists the deterministic surface of /bib-update's fetch cascade out of SKILL.md prose and into a standalone Python script at .claude/skills/bib-update/scripts/fetch_bibtex.py.

What changed

New .claude/skills/bib-update/scripts/fetch_bibtex.py: encodes Sources 0–2 (DOI direct → CrossRef title+author → OpenAlex), the 3-signal match test (title fuzzy ≥85%, year ±1, first-author match), three-way agreement against the parsed filename stem, and citation-key rewriting. Emits a single JSON object with tier, source, bibtex, match_signals, and rejections.
SKILL.md now describes how to invoke the script and how to handle each tier of result, replacing ~50 lines of inline curl / matching prose.
allowed-tools augmented with Bash(python3:*) and Bash(~/.claude/skills/bib-update/scripts/fetch_bibtex.py:*).
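For readers who want a concrete picture of the 3-signal match test, here is a minimal sketch. It is not the code in fetch_bibtex.py: the function names, the dictionary field names, the use of difflib for the fuzzy title ratio, and the rule that all three signals must pass are assumptions made for illustration. Only the thresholds (title ≥85%, year ±1, first-author match) come from this PR.

```python
import re
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_signals(expected: dict, candidate: dict, title_threshold: float = 0.85) -> dict:
    """Compare a candidate record against the expected metadata.

    Both dicts use hypothetical keys: "title", "year", "first_author".
    """
    title_ratio = SequenceMatcher(
        None, _normalize(expected["title"]), _normalize(candidate["title"])
    ).ratio()
    return {
        "title": title_ratio >= title_threshold,                      # fuzzy title >= 85%
        "year": abs(expected["year"] - candidate["year"]) <= 1,      # year within +/- 1
        "first_author": _normalize(expected["first_author"])
        == _normalize(candidate["first_author"]),                    # first-author match
    }


def is_match(signals: dict) -> bool:
    # Assumption: a candidate is accepted only when every signal passes.
    return all(signals.values())
```

Under these assumptions, a candidate whose year is off by three, or whose first author differs, fails `is_match` even when the title is identical, which is the near-miss behavior the Testing section describes.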

Why

The fetch cascade is deterministic: HTTP requests, JSON parsing, fuzzy-match comparisons. Running it through an LLM each invocation wastes tokens and exposes the logic to attention drift.
The LLM-fallback path (Tier 3, when all network sources fail) is intentionally not in the script — fetch_bibtex.py returns "fallback-needed" so the model still constructs the unverified entry from the metadata block and blocks for user approval. This preserves the judgment-required step in the LLM layer.
The venue-precedence override (preprint vs published) also remains in SKILL.md prose because it requires the model's judgment on filename venue tokens.
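The split between the script and the LLM layer can be sketched as a small dispatcher over the script's JSON output. This is illustrative only: the PR names the output fields (tier, source, bibtex, match_signals, rejections) and the "fallback-needed" value, but it does not say which field carries that value or how tiers are encoded. The sketch assumes the sentinel appears in `tier` and that `bibtex` is null in that case. `next_action` and its return strings are hypothetical.

```python
import json


def next_action(raw_json: str) -> str:
    """Decide what the caller should do with one fetch_bibtex.py result."""
    result = json.loads(raw_json)
    # Assumption: "fallback-needed" is reported in the "tier" field and
    # is accompanied by a null "bibtex".
    if result.get("tier") == "fallback-needed" or result.get("bibtex") is None:
        # Tier 3: the model builds an unverified entry and waits for approval.
        return "llm-fallback"
    # Tier 1 or 2: a network source produced a verified entry.
    return "accept-verified"
```

Keeping this decision outside the script means the script never fabricates an entry. It either returns a fetched record or says explicitly that judgment is required.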

Testing

Ran fetch_bibtex.py against several DOIs (Tier 1), CrossRef-discoverable papers (Tier 2), OpenAlex-only papers (Tier 2), and intentionally unresolvable working papers (Tier 3) — confirmed JSON output schema and the correct tier/source classification in each case.
Spot-checked the 3-signal matcher rejects near-misses (e.g. wrong-year candidates and wrong-author candidates).
No change in user-facing /bib-update behavior expected — same cascade, same output, same idempotency.
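The citation-key rewriting listed under "What changed" can be illustrated with a short sketch. It assumes the step replaces the key in the entry header of a fetched BibTeX record with a caller-supplied key. The function name, the regex, and the source of the new key are assumptions, not details taken from fetch_bibtex.py.

```python
import re

# Matches the entry header, e.g. "@article{Smith_2020," and captures the key.
_ENTRY_HEAD = re.compile(r"(@\w+\s*\{\s*)([^,\s]+)(\s*,)")


def rewrite_citation_key(bibtex: str, new_key: str) -> str:
    """Replace the citation key of the first entry in a BibTeX string."""
    # A function replacement avoids backslash-escape surprises in new_key.
    return _ENTRY_HEAD.sub(
        lambda m: m.group(1) + new_key + m.group(3), bibtex, count=1
    )
```

Applying the function twice with the same key gives the same result as applying it once, which fits the idempotency noted above.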

Moves the deterministic surface of the DOI → CrossRef → OpenAlex cascade (HTTP fetches via curl, JSON parsing, 3-signal match test with title fuzzy-match ≥85% / year ±1 / first-author match, three-way agreement against the parsed filename stem, and citation key rewriting) out of SKILL.md prose and into a standalone Python script. The script emits a single JSON object with tier, source, bibtex (or null), match_signals, and rejections.

The LLM-fallback path (Tier 3) is intentionally not in the script: the script returns "fallback-needed" so the model constructs the unverified entry from the metadata block and blocks for approval. SKILL.md now describes how to invoke the script and how to handle each tier of result; the venue-precedence override (preprint vs published) remains in prose because it requires the model's judgment on filename venue tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

nsmiller2501 and others added 2 commits May 8, 2026 11:27

Add bib-update references maintenance skill

65fa43f

nsmiller2501 marked this pull request as ready for review May 23, 2026 21:32

nsmiller2501 mentioned this pull request May 23, 2026

wiki-update: split tri-protocol ingest pipeline, integrate read-pdf fanout substrate #20

Open

nsmiller2501 wants to merge 2 commits into scunning1975:main from nsmiller2501:followup/bib-update-fetch-script
