
docs(seo): rework README for GitHub SEO + add CITATION.cff + repo metadata#4

Merged
Declade merged 2 commits into main from feat/seo-readme-metadata-2026-05-24 on May 24, 2026


Declade (Owner) commented May 24, 2026

Summary

Improves SEO + discoverability of Declade/lucairn-research so traffic from lucairn.eu/research, organic search, and the upcoming L4 hardware blog post (lucairn-website sibling workstream, not yet open as a PR) converts well on the GitHub repo page.

What changed

README.md (rework, not rewrite — all locked-positioning copy preserved):

  • New H1 "Lucairn Research Program" + descriptive subtitle for the GitHub social-card preview.
  • New "Published papers" table at the top covering BOTH Paper 1 (Clinical PII redaction benchmark / HIPAA Safe Harbor / MTSamples, CC0) and Paper 2 (Financial PII redaction benchmark / GLBA NPI / CFPB Consumer Complaint Database) with direct links to the canonical lucairn.eu/en/research/<slug> URLs.
  • Reproduce-a-paper section split per industry with the actual scripts each paper uses (verified against package.json).
  • Updated repository-structure tree to reflect the actual Paper-1 + Paper-2 layout (the previous tree was Slice-1-only and missed papers/, glba-category-mapping.ts, inject-finance-pii-core.ts, etc.).
  • Removed stale "Slice 2 mock-only" + "Paper 1 In progress" framing — both papers are shipped.
  • New "Related Lucairn surfaces" footer cross-linking to lucairn.eu/research, lucairn.eu/blog, and lucairn.eu.
  • Added 3 GitHub status badges (License MIT, Papers 2 shipped, Built by Lucairn) using shields.io.
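
The three badges listed above could be generated with a small helper like the one below. This is a minimal sketch, not the README's actual markup: it assumes the shields.io static badge path format (`/badge/<label>-<message>-<color>`), and the split of each badge into label and message, the colors, and the function names `escapeBadgeText`, `staticBadgeUrl`, and `badgeMarkdown` are all hypothetical.

```typescript
// Escape text for one segment of a shields.io static badge path.
// In that format "-" separates segments, so a literal dash is written "--",
// a literal underscore "__", and a space "_".
function escapeBadgeText(text: string): string {
  const escaped = text
    .replace(/-/g, "--")
    .replace(/_/g, "__")
    .replace(/ /g, "_");
  return encodeURIComponent(escaped);
}

// Build the image URL for a static badge (colors here are assumptions).
function staticBadgeUrl(label: string, message: string, color: string): string {
  return `https://img.shields.io/badge/${escapeBadgeText(label)}-${escapeBadgeText(message)}-${color}`;
}

// Wrap the badge URL in Markdown image syntax for use in a README.
function badgeMarkdown(label: string, message: string, color: string): string {
  return `![${label}: ${message}](${staticBadgeUrl(label, message, color)})`;
}

// Hypothetical label/message/color choices for the three badges named in the PR.
const badges: string[] = [
  badgeMarkdown("License", "MIT", "green"),
  badgeMarkdown("Papers", "2 shipped", "blue"),
  badgeMarkdown("Built by", "Lucairn", "blue"),
];

console.log(badges.join(" "));
```

Generating the URLs from plain strings keeps the escaping rules in one place, so a later wording change (for example a third paper) only touches the message text.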

CITATION.cff (new):

  • Surfaces GitHub's "Cite this repository" sidebar for academic citation flows.
  • Lists the repository + the canonical research-index URL.
  • 11 keywords aligned with the repo topics for cross-referencing.

Repo metadata (applied via gh repo edit BEFORE this PR — already live):

  • description: "Open-source PII detection and re-identification risk benchmarks for LLM pipelines under EU AI Act, HIPAA Safe Harbor and GLBA NPI." (130 chars)
  • homepageUrl: https://lucairn.eu/research
  • topics (20, max allowed): pii-detection, pii-redaction, llm-evaluation, llm-benchmark, eu-ai-act, ai-act-annex-iii, gdpr, hipaa, glba, pseudonymisation, re-identification, privacy-attestation, ai-compliance, lucairn, mtsamples, cfpb, regulated-industries, healthcare-ai, finance-ai, benchmark
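
The metadata values above can be sanity-checked before they are passed to `gh repo edit`. The sketch below is illustrative only: the limits (20 topics, 50 characters per topic, lowercase letters, digits and hyphens, 350 characters for the description) are assumptions taken from GitHub's documented repository settings and should be re-verified, and `RepoMetadata` and `validateRepoMetadata` are hypothetical names rather than part of this repository.

```typescript
// Assumed GitHub limits; verify against current GitHub documentation before relying on them.
const MAX_TOPICS = 20;
const MAX_TOPIC_LENGTH = 50;
const MAX_DESCRIPTION_LENGTH = 350;
// Topics: start with a lowercase letter or digit, then lowercase letters, digits, hyphens.
const TOPIC_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

interface RepoMetadata {
  description: string;
  homepageUrl: string;
  topics: string[];
}

// Return a list of human-readable problems; an empty list means the metadata passes.
function validateRepoMetadata(meta: RepoMetadata): string[] {
  const problems: string[] = [];
  if (meta.description.length > MAX_DESCRIPTION_LENGTH) {
    problems.push(`description is ${meta.description.length} chars (max ${MAX_DESCRIPTION_LENGTH})`);
  }
  if (!meta.homepageUrl.startsWith("https://")) {
    problems.push(`homepageUrl is not https: ${meta.homepageUrl}`);
  }
  if (meta.topics.length > MAX_TOPICS) {
    problems.push(`${meta.topics.length} topics (max ${MAX_TOPICS})`);
  }
  const seen: string[] = [];
  for (const topic of meta.topics) {
    if (topic.length > MAX_TOPIC_LENGTH || !TOPIC_PATTERN.test(topic)) {
      problems.push(`invalid topic: ${topic}`);
    }
    if (seen.indexOf(topic) !== -1) {
      problems.push(`duplicate topic: ${topic}`);
    }
    seen.push(topic);
  }
  return problems;
}

// Values copied from the PR description.
const lucairnResearchMetadata: RepoMetadata = {
  description: "Open-source PII detection and re-identification risk benchmarks for LLM pipelines under EU AI Act, HIPAA Safe Harbor and GLBA NPI.",
  homepageUrl: "https://lucairn.eu/research",
  topics: [
    "pii-detection", "pii-redaction", "llm-evaluation", "llm-benchmark",
    "eu-ai-act", "ai-act-annex-iii", "gdpr", "hipaa", "glba",
    "pseudonymisation", "re-identification", "privacy-attestation",
    "ai-compliance", "lucairn", "mtsamples", "cfpb",
    "regulated-industries", "healthcare-ai", "finance-ai", "benchmark",
  ],
};

console.log(validateRepoMetadata(lucairnResearchMetadata));
```

Because the topic list already sits at the assumed 20-topic ceiling, any future topic has to replace an existing one, which is the case a check like this is meant to catch early.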

What did NOT change

  • No changes to papers, datasets, scripts, src/, tests, or CI workflows.
  • No changes to LICENSE.
  • No claim about Paper 3 or unshipped work.
  • No L4-blog cross-link in the README — the L4 hardware blog (blog-draft-2026-05-24-l4-infrastructure-scaling.md) is not yet a PR on lucairn-website. Cross-link to lucairn.eu/blog is generic; a follow-up commit can add a direct link once the L4 post lands.
  • README stays EN-only (per the brief; the DE translation is deferred to a separate workstream).

SEO target queries baked into structure

  • "Lucairn Research Program" — H1.
  • "PII detection LLM benchmark" — first sentence, repo description, topics.
  • "EU AI Act PII redaction" / "AI Act Annex III compliance" — H2 + topics.
  • "HIPAA Safe Harbor LLM evaluation" — Paper 1 row + first sentence.
  • "GLBA NPI LLM redaction" — Paper 2 row + first sentence.
  • "LLM re-identification risk assessment" — first sentence + topic.
  • "privacy attestation AI pipeline" — block-quote intro + topic.
  • "open source PII benchmark German" — partial match via "Lucairn" (German company) + MIT-licensed open methodology.

Self-review summary (orchestrator: please run the formal reviewer chain)

I ran a desk-check against each reviewer's criteria. No findings:

  • bug-hunter-reviewer: every bash script in the README verified against package.json scripts; every file path verified to exist; repo-tree matches ls output.
  • claim-enforcement-guard: no banned tier names (Pro+, Solo Free/Pro); no banned mechanisms (TLS version, E2E, SOC 2, ISO, MFA, audits, pen testing); brand "Lucairn" single-c throughout; "AI evidence layer" + "evidence formats EU AI Act Articles 10, 12, 14, and 15 reference" matches the locked 2026-05-16 positioning; no unshipped claims.
  • personal-info-leak-detector: Marc's contact email marc@lucairn.eu is the existing public author field already in package.json (intentional surface, mirrored into CITATION.cff); no real customer/patient/employee names; no API keys / secrets.
  • regulator-validator: EU AI Act dates (Art. 113 phases) verified against the existing pre-Locked Trust Center copy; HIPAA Safe Harbor cited as 45 CFR § 164.514(b)(2) (de-identification standard, not a compliance claim); GLBA NPI cited as 16 CFR Part 313 (FTC Privacy of Consumer Financial Information rule); 17 U.S.C. § 105 cited for CFPB public-domain status; Regulation (EU) 2024/1689 designation correct.
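
The bug-hunter check above (every README command verified against `package.json` scripts) was a manual desk-check; it could be automated along the following lines. This is a hedged sketch, not the reviewer's actual implementation: it assumes README commands take the form `npm run <name>` (or the `pnpm`/`yarn` equivalents), and `extractScriptNames`, `findMissingScripts`, and the script names in the usage example are hypothetical.

```typescript
// Collect the distinct script names invoked as `npm run <name>`,
// `pnpm run <name>` or `yarn run <name>` anywhere in the README text.
function extractScriptNames(readme: string): string[] {
  const pattern = /\b(?:npm|pnpm|yarn) run ([A-Za-z0-9:_-]+)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(readme)) !== null) {
    if (names.indexOf(match[1]) === -1) {
      names.push(match[1]);
    }
  }
  return names;
}

// Return the README script names that package.json does not define.
function findMissingScripts(readme: string, packageJson: string): string[] {
  const parsed = JSON.parse(packageJson) as { scripts?: Record<string, string> };
  const defined = Object.keys(parsed.scripts ?? {});
  return extractScriptNames(readme).filter((name) => defined.indexOf(name) === -1);
}

// Usage with made-up content; in practice both strings would be read from disk.
const exampleReadme = [
  "Reproduce the clinical paper:",
  "npm run example:clinical",
  "Reproduce the finance paper:",
  "npm run example:finance",
].join("\n");
const examplePackageJson = JSON.stringify({
  scripts: { "example:clinical": "tsx scripts/clinical.ts" },
});

console.log(findMissingScripts(exampleReadme, examplePackageJson));
```

A check of this shape only confirms that each named script exists; it does not confirm that the script runs or produces the paper's results.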

Test plan

  • Run full reviewer chain (bug-hunter-reviewer + claim-enforcement-guard + personal-info-leak-detector + regulator-validator) — orchestrator central dispatch.
  • Codex round 1 substantive-PASS [N/N] post-PR-open.
  • After merge: visit https://github.com/Declade/lucairn-research and verify the social-card preview renders with the new description + the "Cite this repository" button appears in the right sidebar.

🤖 Generated with Claude Code

Declade and others added 2 commits May 24, 2026 18:34
Improves discoverability of the Lucairn Research Program repo so traffic
from lucairn.eu/research, organic search, and the upcoming L4 hardware
blog post lands well.

Changes
- README: new H1 "Lucairn Research Program" + descriptive subtitle for
  the GitHub social-card preview; "Published papers" table with both
  Paper 1 (Clinical PII redaction benchmark / HIPAA / MTSamples) and
  Paper 2 (Financial PII redaction benchmark / GLBA / CFPB) and direct
  links to lucairn.eu/research/<slug>; reproduce-a-paper section split
  per industry; updated repository-structure tree to reflect actual
  Paper-1 + Paper-2 layout; cross-links to lucairn.eu/research +
  lucairn.eu/blog + lucairn.eu.
- README: removed stale "Slice 2 mock-only" framing — both papers are
  shipped.
- CITATION.cff: new — surfaces GitHub's "Cite this repository" sidebar
  for academic citation flows; lists the repo + canonical research-index
  URL.

Repo metadata applied via gh repo edit (not in this diff):
- description: "Open-source PII detection and re-identification risk
  benchmarks for LLM pipelines under EU AI Act, HIPAA Safe Harbor and
  GLBA NPI."
- homepageUrl: https://lucairn.eu/research
- topics (20): pii-detection, pii-redaction, llm-evaluation,
  llm-benchmark, eu-ai-act, ai-act-annex-iii, gdpr, hipaa, glba,
  pseudonymisation, re-identification, privacy-attestation,
  ai-compliance, lucairn, mtsamples, cfpb, regulated-industries,
  healthcare-ai, finance-ai, benchmark

No changes to papers, datasets, scripts, src, or tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e commands + Paper 2 wording + locale URLs (Codex r1 -> r2 chain)
Declade merged commit 98185c7 into main on May 24, 2026
0 of 6 checks passed
