fix: strip bare-text and table-layout lyrics, not just <blockquote> ones#6

Merged
ds17f merged 1 commit into main from fix/strip-bare-text-lyrics
Jun 4, 2026

Conversation

Owner

@ds17f ds17f commented Jun 4, 2026

The bug

The safe pass only removed lyrics wrapped in <blockquote>. 12 song pages lay their lyrics out as bare <br>-separated lines or inside a layout <table> — so they were deployed with full lyrics intact. Caught via soma.html ("So Many Roads") and high.html review.

Affected: soma, bird, buil, cosmic, must, shak, terr, push, grow, vict, lazr, pride.

The fix

Add a second case to strip_page: when there's no lyric <blockquote> in the region, strip the span from the end of the credit/copyright preamble to the first <a name=…> seam. Two gates keep non-lyric pages untouched:

  • skip if the region has list markup (<li> → discographies, title-phrase nav like appl.html / tribute.html)
  • require a verse-dense span (≥10 <br>). Empirically real lyric spans have ≥27 <br>, every non-lyric page ≤5 — a clean gap.

Does NOT gate on "used by permission" — that was the trap: lyrics also appear under "used with permission" (lazr), "used with kind permission" (push), and with no permission phrase at all (vict/pride/grow).
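The second case and its two gates can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the function name `strip_bare_lyrics`, the `preamble_end` offset (assumed to be computed elsewhere from the credit/copyright line), the regexes, and the `NOTICE` text are all hypothetical, and the sketch treats the candidate span itself as the "region" checked for list markup.

```python
import re

# All names below are illustrative assumptions, not taken from the repository.
BR_RE = re.compile(r"<br\b[^>]*>", re.IGNORECASE)    # <br>, <br/>, <br />
LI_RE = re.compile(r"<li\b", re.IGNORECASE)          # list markup gate
SEAM_RE = re.compile(r"<a\s+name\s*=", re.IGNORECASE)  # first <a name=...> seam

MIN_BR = 10  # verse-density threshold described in the PR
NOTICE = "<p><i>[Lyrics removed.]</i></p>"  # hypothetical placeholder text


def strip_bare_lyrics(html: str, preamble_end: int) -> str:
    """Strip the span from preamble_end to the first <a name=...> seam.

    Returns the page unchanged when there is no seam, when the span
    contains list markup, or when it has fewer than MIN_BR <br> tags.
    Deliberately does not look for any "used by permission" phrase.
    """
    seam = SEAM_RE.search(html, preamble_end)
    if seam is None:
        return html
    span = html[preamble_end:seam.start()]
    if LI_RE.search(span):                   # gate 1: discography / nav lists
        return html
    if len(BR_RE.findall(span)) < MIN_BR:    # gate 2: not verse-dense
        return html
    return html[:preamble_end] + "\n" + NOTICE + "\n" + html[seam.start():]
```

Because the replacement span contains no `<br>` tags, a second call with the same `preamble_end` fails the density gate and returns the page unchanged, which is one simple way to get the idempotence noted under Verification.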

Verification

  • Now strips 112 pages (100 <blockquote> + 12 bare/table), idempotent.
  • All 12 newly handled pages: no lyric block remains before the seam, and the notice is present.
  • Every non-lyric page untouched: tribute, appl, sage, pass, stra, tons, cagdl, gdhome, index, nonsense, goose.
  • Audit: 0 new broken links (35→29 preserved-source anchors), exits 0.
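Checks like the idempotence and "untouched" items above could be expressed as a small harness. The sketch below is an assumption about how such a check might look, not the audit script used here: `check_strip`, its parameters, and the in-memory `pages` mapping are hypothetical, and `strip` stands in for whatever single-argument wrapper exists around `strip_page`.

```python
from typing import Callable, Dict, List


def check_strip(
    strip: Callable[[str], str],
    pages: Dict[str, str],
    untouched: List[str],
) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    strip     -- hypothetical page-stripping function (html in, html out)
    pages     -- page name mapped to its HTML source
    untouched -- names of non-lyric pages that must come back byte-identical
    """
    problems: List[str] = []
    for name, html in pages.items():
        once = strip(html)
        # Idempotence: stripping an already stripped page changes nothing.
        if strip(once) != once:
            problems.append(f"{name}: not idempotent")
        # Non-lyric pages must not be modified at all.
        if name in untouched and once != html:
            problems.append(f"{name}: non-lyric page was modified")
    return problems
```

Run over the full page set, an empty result would correspond to the "idempotent" and "every non-lyric page untouched" lines above.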

Single-line epigraphs above the credit line and short lyric fragments quoted inside annotations are left in place: they are fragments, kept under the same fair-use reasoning that keeps the essays.

🤖 Generated with Claude Code

The safe pass only removed lyrics wrapped in <blockquote>, so 12 song
pages whose lyrics are laid out as bare <br>-separated lines or inside a
layout <table> (soma, bird, buil, cosmic, must, shak, terr, push, grow,
vict, lazr, pride) were published with their lyrics intact.

Add a second case for those: strip the span from the end of the
credit/copyright preamble to the first <a name=...> annotation seam, with
two gates so non-lyric pages stay untouched — skip when the region has
list markup (discographies, title-phrase nav like appl.html /
tribute.html), and require a verse-dense span (>=10 <br>; real lyric
spans have >=27, everything else <=5, a clean gap).

Crucially this does NOT gate on "used by permission": lyrics also appear
under "used with permission" (lazr), "used with kind permission" (push),
and with no permission phrase at all (vict/pride/grow).

Now strips 112 pages (100 blockquote + 12 bare). Verified: those 12 have
no lyric block before the seam, every non-lyric page is untouched, and
the audit introduces zero new broken links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ds17f ds17f merged commit 06c3875 into main Jun 4, 2026
1 check passed
@ds17f ds17f deleted the fix/strip-bare-text-lyrics branch June 4, 2026 19:26
ds17f added a commit that referenced this pull request Jun 5, 2026
…nes (#6)

Co-authored-by: Damian Silbergleith <14797221+ds17f@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>