fix: strip lyric pages whose credit uses a colon or band-as-author line #7

Merged
ds17f merged 1 commit into main from fix/strip-colon-and-band-credit-lyrics on Jun 4, 2026

Conversation

ds17f (Owner) commented on Jun 4, 2026

Problem

The safe-publish pass (scripts/safe_build.py) used the "words/music by …" authorship line as its only signal that a page reproduces licensed lyrics. Five real song pages used a different credit format and slipped through with their lyrics intact:

| Page | Credit format that was missed |
| --- | --- |
| eter, libe, onlytime, way2 | colon: "Words: Hunter; music: Garcia" |
| ydha | band-as-author: "By the Grateful Dead" (no words/music token) |
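
To make the miss concrete, here is a minimal sketch. It assumes the pre-fix gate looked roughly like a "words/music by" pattern; the real CREDIT_RE in scripts/safe_build.py is not shown in this PR, so OLD_CREDIT_RE and the sample credit lines are hypothetical stand-ins.

```python
import re

# Hypothetical stand-in for the pre-fix gate; the real pattern may differ.
OLD_CREDIT_RE = re.compile(r"\b(?:words|music|lyrics)\s+by\b", re.IGNORECASE)

# One sample credit line per format described in the table above.
credits = {
    "typical": "Words by Robert Hunter; music by Jerry Garcia",
    "colon": "Words: Hunter; music: Garcia",
    "band": "By the Grateful Dead",
}

# Formats the old gate fails to recognize, in insertion order.
missed = [name for name, line in credits.items() if not OLD_CREDIT_RE.search(line)]
print(missed)  # ['colon', 'band']
```

Under that assumption, both the colon form and the band-as-author form fall through, because neither contains a words/music token followed by "by".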

Fix

All five still carry the publisher's licensed-lyric signature, "Copyright Ice Nine Publishing; used by permission", which sits just above the lyric block, exactly where the authorship line normally sits. This PR adds that signature to CREDIT_RE as a fallback credit anchor.

It is the definitive marker of reproduced GD lyrics, so it cleanly excludes the lookalikes:

  • Essays that merely acknowledge permission (silber, miller, stephen, tribute) say "Used with permission" without naming Ice Nine, so they are not matched.
  • Pages whose blockquote is an annotation rather than licensed lyrics, namely operator (an OED entry) and slip (a reader email quoting unofficial lyrics), have no permission line at all, so they are left untouched.
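
The fallback idea can be sketched as a single pattern with two named alternatives. This is an illustration under assumptions, not the actual CREDIT_RE: the group names, the exact wording of the authorship alternative, and the helper credit_anchor are all hypothetical.

```python
import re

# Hypothetical reconstruction: the authorship alternative plus the
# Ice Nine permission signature as a fallback credit anchor.
CREDIT_RE = re.compile(
    r"(?P<authorship>\b(?:words|music|lyrics)\s+by\b)"
    r"|(?P<ice_nine>copyright\s+ice\s+nine\s+publishing;\s*used\s+by\s+permission)",
    re.IGNORECASE,
)

def credit_anchor(text):
    """Return the name of the first anchor found in text, or None."""
    match = CREDIT_RE.search(text)
    return match.lastgroup if match else None

SIGNATURE = "Copyright Ice Nine Publishing; used by permission"

# A "words by" page: the authorship line appears earlier, so it matches first.
print(credit_anchor("Words by Robert Hunter\n" + SIGNATURE))        # authorship
# Colon and band-as-author credits: only the fallback anchor matches.
print(credit_anchor("Words: Hunter; music: Garcia\n" + SIGNATURE))  # ice_nine
print(credit_anchor("By the Grateful Dead\n" + SIGNATURE))          # ice_nine
# Lookalike essay: permission is mentioned, but Ice Nine is not named.
print(credit_anchor("Used with permission"))                        # None
```

Because re.search returns the leftmost match, a page that has both a "words by" line and the signature below it still anchors on the authorship line, which is consistent with the existing pages being unaffected.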

Verification

  • Per-page diff vs. baseline: exactly eter, libe, onlytime, way2, ydha flip skip→strip; the other 118 stripped pages are byte-for-byte identical, because their "words by" line still matches first.
  • On all 5: verse bodies removed, dead.net/songs notice inserted, title/credit and annotation sections preserved.
  • make dist && make audit → green; make safe && make audit → green (now 119 pages / 121 lyric blocks stripped).
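
A per-page, byte-for-byte comparison like the one described above could be sketched as follows. The function name, the directory layout, and the "*.html" file pattern are assumptions made for illustration; the PR does not say how the diff was actually produced.

```python
from pathlib import Path

def changed_pages(baseline_dir, candidate_dir, pattern="*.html"):
    """Return sorted stems of pages whose bytes differ between two build outputs."""
    baseline = Path(baseline_dir)
    candidate = Path(candidate_dir)
    changed = []
    for new_file in sorted(candidate.rglob(pattern)):
        old_file = baseline / new_file.relative_to(candidate)
        # A page counts as changed if it is new or its bytes differ.
        if not old_file.exists() or old_file.read_bytes() != new_file.read_bytes():
            changed.append(new_file.stem)
    return changed
```

With a saved baseline build and a fresh safe build in two hypothetical directories, the check would then be that set(changed_pages("baseline", "safe")) equals {"eter", "libe", "onlytime", "way2", "ydha"}.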

🤖 Generated with Claude Code

The safe-publish pass gated on the "words/music by ..." authorship line, so
five real licensed-lyric pages slipped through with their lyrics intact:

  - eter, libe, onlytime, way2 — credit uses a colon ("Words: Hunter;
    music: Garcia") rather than "by"
  - ydha — credit is "By the Grateful Dead" (band-as-author, no
    words/music token at all)

All five still carry the publisher's licensed-lyric signature, "Copyright
Ice Nine Publishing; used by permission", which sits just above the lyric
block exactly where the authorship line normally does. Add it as a fallback
credit anchor in CREDIT_RE. It is the definitive marker of reproduced GD
lyrics: essays that merely quote permission say "Used with permission"
without naming Ice Nine, and pages whose blockquote is an annotation
(operator's OED entry, slip's reader email) have no permission line, so
they stay untouched.

On the 118 "words by" pages the authorship line still matches first, leaving
their stripped output byte-for-byte unchanged; exactly the five intended
pages flip from skip to strip. make dist/safe + make audit stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ds17f merged commit 25b6c0f into main on Jun 4, 2026
1 check passed
ds17f deleted the fix/strip-colon-and-band-credit-lyrics branch on June 4, 2026 at 22:17
ds17f added a commit that referenced this pull request Jun 5, 2026
…ne (#7)

Co-authored-by: Damian Silbergleith <14797221+ds17f@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>