
bib-update: extract DOI/CrossRef/OpenAlex cascade into scripts/fetch_bibtex.py #16

Open
nsmiller2501 wants to merge 2 commits into scunning1975:main from nsmiller2501:followup/bib-update-fetch-script


@nsmiller2501

Summary

Follow-on to #7. Hoists the deterministic surface of /bib-update's fetch cascade out of SKILL.md prose and into a standalone Python script at .claude/skills/bib-update/scripts/fetch_bibtex.py.

What changed

  • New .claude/skills/bib-update/scripts/fetch_bibtex.py: encodes Sources 0–2 (DOI direct → CrossRef title+author → OpenAlex), the 3-signal match test (title fuzzy ≥85%, year ±1, first-author match), three-way agreement against the parsed filename stem, and citation-key rewriting. Emits a single JSON object with tier, source, bibtex, match_signals, and rejections.
  • SKILL.md now describes how to invoke the script and how to handle each tier of result, replacing ~50 lines of inline curl / matching prose.
  • allowed-tools augmented with Bash(python3:*) and Bash(~/.claude/skills/bib-update/scripts/fetch_bibtex.py:*).
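A minimal sketch of the 3-signal match test described in the first bullet, for illustration only. The thresholds (title similarity ≥85%, year within ±1, first-author match) come from the description above; everything else is an assumption: the function names `match_signals` and `is_match`, the dictionary field names, the use of `difflib.SequenceMatcher` as the fuzzy metric, and the rule that all three signals must pass. The actual fetch_bibtex.py may differ on each of these.

```python
import re
from difflib import SequenceMatcher

TITLE_THRESHOLD = 0.85  # "title fuzzy >=85%" from the PR description
YEAR_TOLERANCE = 1      # "year +/-1" from the PR description


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so cosmetic differences are ignored."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_signals(expected: dict, candidate: dict) -> dict:
    """Return one boolean per signal (hypothetical field names)."""
    title_score = SequenceMatcher(
        None, _normalize(expected["title"]), _normalize(candidate["title"])
    ).ratio()
    return {
        "title": title_score >= TITLE_THRESHOLD,
        "year": abs(int(expected["year"]) - int(candidate["year"])) <= YEAR_TOLERANCE,
        "first_author": _normalize(expected["first_author"])
        == _normalize(candidate["first_author"]),
    }


def is_match(expected: dict, candidate: dict) -> bool:
    """Assumption: a candidate is accepted only when all three signals pass."""
    return all(match_signals(expected, candidate).values())
```

Returning the per-signal booleans rather than a single verdict mirrors the `match_signals` field in the script's JSON output, so a rejected candidate can be reported with the reason it failed.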

Why

  • The fetch cascade is deterministic: HTTP requests, JSON parsing, fuzzy-match comparisons. Running it through an LLM on every invocation wastes tokens and exposes the logic to attention drift.
  • The LLM-fallback path (Tier 3, when all network sources fail) is intentionally not in the script — fetch_bibtex.py returns "fallback-needed" so the model still constructs the unverified entry from the metadata block and blocks for user approval. This preserves the judgment-required step in the LLM layer.
  • The venue-precedence override (preprint vs published) also remains in SKILL.md prose because it requires the model's judgment on filename venue tokens.
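The split between script and model can be sketched as a small dispatcher over the script's JSON output. This is a hedged illustration, not the skill's actual logic: the field names (`tier`, `bibtex`) are from the description above, but placing the "fallback-needed" value in the `tier` field, the function name `next_step`, and the returned labels are all assumptions.

```python
import json


def next_step(raw_json: str) -> str:
    """Decide what the LLM layer does with the script's output (hypothetical labels)."""
    result = json.loads(raw_json)

    # Assumption: "fallback-needed" is reported in the "tier" field.
    # A null "bibtex" is treated the same way, since no verified entry exists.
    if result.get("tier") == "fallback-needed" or result.get("bibtex") is None:
        # The model builds an unverified entry and blocks for user approval.
        return "llm-fallback"

    # A network source produced a verified entry; the model can use it directly,
    # subject to the venue-precedence judgment that stays in SKILL.md.
    return "use-verified"
```

Keeping the dispatcher this thin reflects the stated intent: the script decides only what can be decided mechanically, and anything requiring judgment is handed back to the model.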

Testing

  • Ran fetch_bibtex.py against several DOIs (Tier 1), CrossRef-discoverable papers (Tier 2), OpenAlex-only papers (Tier 2), and intentionally unresolvable working papers (Tier 3) — confirmed JSON output schema and the correct tier/source classification in each case.
  • Spot-checked that the 3-signal matcher rejects near-misses (e.g. wrong-year candidates and wrong-author candidates).
  • No change in user-facing /bib-update behavior expected — same cascade, same output, same idempotency.

nsmiller2501 and others added 2 commits May 8, 2026 11:27
Moves the deterministic surface of the DOI → CrossRef → OpenAlex
cascade (HTTP fetches via curl, JSON parsing, 3-signal match test
with title fuzzy-match ≥85% / year ±1 / first-author match,
three-way agreement against the parsed filename stem, and citation
key rewriting) out of SKILL.md prose and into a standalone Python
script. The script emits a single JSON object with tier, source,
bibtex (or null), match_signals, and rejections.

The LLM-fallback path (Tier 3) is intentionally not in the script —
the script returns "fallback-needed" so the model constructs the
unverified entry from the metadata block and blocks for approval.

SKILL.md now describes how to invoke the script and how to handle
each tier of result; the venue-precedence override (preprint vs
published) remains in prose because it requires the model's
judgment on filename venue tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
