forge-content: multi-platform content shortcode replacing github-content #7
Merged
Pairs with `forge-meta`. The README-inlining responsibility that was
single-platform (`{{< github-content >}}`) is now multi-platform and
hardened against XSS as well as the prior shortcode-injection vector.
Shared platform resolution
--------------------------
`layouts/partials/forge-resolve.html` (NEW) owns the unified-vs-legacy
forge value parsing, host detection, platform auto-detection
(github.com / gitlab.com / codeberg.org / configured `gitlabHosts`),
and label resolution. Returns a dict via `return`. `forge-meta.html`
is refactored to call it (its sections 1–4 collapse to one partial
invocation); `forge-content.html` calls it the same way. Now the two
templates can never drift on platform detection.
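As a rough illustration of the resolution logic described above, here is a minimal Python sketch (not the Hugo partial itself). The function name `resolve_forge`, the "first segment contains a dot" heuristic, the mapping of `codeberg.org` to `forgejo`, and the empty platform for unknown hosts are all assumptions made for the example; label resolution is left out.

```python
from typing import Dict, Iterable

# Hosts named in the description; mapping codeberg.org to "forgejo" is an assumption.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "codeberg.org": "forgejo",
}


def resolve_forge(repository: str, gitlab_hosts: Iterable[str] = ()) -> Dict[str, str]:
    """Split a repository value into host, path and platform (hypothetical helper)."""
    parts = repository.strip("/").split("/")
    # Assumed heuristic: a first segment containing a dot is a host name.
    if len(parts) >= 3 and "." in parts[0]:
        host, path = parts[0].lower(), "/".join(parts[1:])
    else:
        # Legacy "owner/repo" form: GitHub is assumed.
        host, path = "github.com", "/".join(parts)
    if host in KNOWN_HOSTS:
        platform = KNOWN_HOSTS[host]
    elif host in {h.lower() for h in gitlab_hosts}:
        platform = "gitlab"
    else:
        # Assumption: unknown hosts are left for an explicit `platform` override.
        platform = ""
    return {"host": host, "path": path, "platform": platform}
```

Keeping this in one place is what lets both templates agree: each caller passes the raw value and reads the same dict back.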
forge-content shortcode
-----------------------
`layouts/shortcodes/forge-content.html` (NEW) accepts:
  repository   required: "host/owner/repo" (unified) or "owner/repo"
               (legacy, GitHub assumed)
  branch       default "master"
  path         optional file path; empty fetches README
  platform     optional override
  unsafe       bool; allows <svg>/<math> from trusted sources
Per-platform fetch:
- GitHub: dedicated `/readme` endpoint when `path` is empty
- GitLab + Forgejo: probe README.md, README, readme.md in order
via the standard `/contents/{name}` endpoint; first hit wins.
Hugo's daily cache key is reused so the probe costs at most 3
requests per repo per build day.
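The "first hit wins" probe can be sketched as follows. This is an illustrative Python version, not the shortcode's template code: `probe_readme` and the injected `fetch` callable (assumed to return `None` on a miss) are hypothetical names, and no real forge endpoint is called.

```python
from typing import Callable, Optional, Sequence, Tuple

# Candidate names in the order given in the description.
README_CANDIDATES = ("README.md", "README", "readme.md")


def probe_readme(
    fetch: Callable[[str], Optional[str]],
    candidates: Sequence[str] = README_CANDIDATES,
) -> Optional[Tuple[str, str]]:
    """Return (name, content) for the first candidate that `fetch` can load."""
    for name in candidates:
        content = fetch(name)  # hypothetical fetcher: returns None on a miss
        if content is not None:
            return name, content
    return None
```

Because the loop stops at the first success, a repository whose README is found on the first name costs one request, and the worst case is three.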
Three-layer security
--------------------
Untrusted remote markdown passes through three filters before
`markdownify`:
1. Hugo shortcode delimiter neutralisation (carried forward
verbatim from the legacy shortcode): `{{<` / `{{%` / `>}}` / `%}}`
replaced with full-width lookalikes `｛｛` / `｝｝`. Prevents
server-side template injection against the consumer site.
2. HTML tag denylist (NEW). `replaceRE` entity-escapes opening and
closing forms of:
script style iframe frame frameset noframes object embed
applet form input button textarea select option optgroup
fieldset legend link meta base noscript
Plus svg / math unless `unsafe="true"` (allowlisted only for
trusted sources). Disallowed tags render as visible escaped
text in the published HTML — they cannot execute or load
remote resources.
3. Dangerous attribute strip (NEW). Event handlers (`on*=...`),
`javascript:` URIs in `href` / `src` / `xlink:href`, and IE-era
`style="...expression(..."` are stripped wherever they appear.
The containing tag survives but loses the attack vector.
This is a denylist, not a strict GFM allowlist. It blocks the
dangerous tags GFM rejects but does not enforce GFM's full
allowlist. Sufficient for typical READMEs from trusted maintainers;
for syndicating attacker-controlled content, run a dedicated
sanitiser as a build step.
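The three passes can be approximated in a few lines of Python. This is a sketch of the idea only: the real shortcode uses Hugo's `replaceRE`, and the regular expressions, the `sanitize` name, and the exact lookalike mapping (braces swapped for U+FF5B / U+FF5D) are assumptions for illustration. Only the `<` of a denied tag is escaped here, which is enough to make the tag render as text.

```python
import re

DENIED_TAGS = (
    "script style iframe frame frameset noframes object embed "
    "applet form input button textarea select option optgroup "
    "fieldset legend link meta base noscript"
).split()

# Assumed mapping: only the braces are swapped for full-width U+FF5B / U+FF5D.
DELIMITERS = {
    "{{<": "\uff5b\uff5b<",
    "{{%": "\uff5b\uff5b%",
    ">}}": ">\uff5d\uff5d",
    "%}}": "%\uff5d\uff5d",
}

ATTRIBUTE_PATTERNS = [
    # Event handlers such as onerror="..." or onload=x
    r"""\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
    # javascript: URIs in href / src / xlink:href
    r"""\s+(href|src|xlink:href)\s*=\s*("\s*javascript:[^"]*"|'\s*javascript:[^']*'|javascript:[^\s>]*)""",
    # IE-era style="...expression(..."
    r"""\s+style\s*=\s*("[^"]*expression\s*\([^"]*"|'[^']*expression\s*\([^']*')""",
]


def sanitize(markdown: str, unsafe: bool = False) -> str:
    """Apply the three passes to untrusted markdown (illustrative only)."""
    # Pass 1: neutralise Hugo shortcode delimiters.
    for delimiter, lookalike in DELIMITERS.items():
        markdown = markdown.replace(delimiter, lookalike)
    # Pass 2: entity-escape the "<" of denied opening and closing tags.
    tags = DENIED_TAGS if unsafe else DENIED_TAGS + ["svg", "math"]
    tag_re = re.compile(r"<(/?)(" + "|".join(tags) + r")\b", re.IGNORECASE)
    markdown = tag_re.sub(r"&lt;\1\2", markdown)
    # Pass 3: strip dangerous attributes; the containing tag survives.
    for pattern in ATTRIBUTE_PATTERNS:
        markdown = re.sub(pattern, "", markdown, flags=re.IGNORECASE)
    return markdown
```

Pattern-based stripping like this also removes matching text outside of tags (for example prose that happens to look like `onload=x`), which is one reason a dedicated sanitiser is the better fit for attacker-controlled content.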
Compatibility
-------------
Per user direction, this is a hard break — `github-content.html` is
deleted rather than wrapped. Sites still using `{{< github-content >}}`
will fail to build until they migrate (rename + add host prefix to
`repository`). The exampleSite demo post is migrated.
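For sites with many posts, the rename plus host prefix could be scripted. The following Python sketch is one possible approach, not something shipped with this change: `migrate` is a hypothetical helper, and it only handles double-quoted, named `repository="owner/repo"` parameters.

```python
import re

# Matches a legacy call such as {{< github-content repository="owner/repo" >}}
CALL_RE = re.compile(r"\{\{<\s*github-content\b([^>]*)>\}\}")
# Matches only two-segment values, so already-prefixed ones are left alone.
REPO_RE = re.compile(r'repository="([^"/]+/[^"/]+)"')


def migrate(text: str) -> str:
    """Rename github-content calls and prefix owner/repo values with github.com."""
    def rewrite(match):
        args = REPO_RE.sub(r'repository="github.com/\1"', match.group(1))
        return "{{< forge-content" + args + ">}}"
    return CALL_RE.sub(rewrite, text)
```

Running it twice is harmless, since the second pass finds no `github-content` calls left to rewrite.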
Other changes
-------------
- `i18n/en.yaml` + `i18n/el.yaml`: `github-error` → `forge-error`
(more generic message, no GitHub-specific text).
- `README.md`: new "Forge content" section under the modifier-docs
cluster, documenting all parameters, the per-platform behaviour
(GitHub auto-readme vs. probe), and the three security passes;
shortcode list updated.
- `exampleSite/content/posts/github-demo/index.md`: title +
description updated; shortcode call migrated to forge-content.
Why
`forge-meta` (PR #2) generalised the GitHub metadata block to multi-forge. The matching content-fetching shortcode (`github-content.html`) hadn't been migrated: it was still GitHub-only, and unlike its renderer-side cousin it had no HTML sanitisation. A README from a public repo could ship `<script>` / `<iframe>` / `onerror=` straight into the consumer site's published HTML because Goldmark's `unsafe = true` (configured for inline-HTML support in shortcodes) passes raw HTML through.
This PR replaces it with a multi-platform shortcode and adds two new security layers on top of the existing shortcode-delimiter neutralisation.
What
Shared resolver. New `layouts/partials/forge-resolve.html` owns parsing of `Forge` / legacy `Github` values, host detection, platform auto-detection (`github.com` / `gitlab.com` / `codeberg.org` / `params.forgeContent.gitlabHosts`), and label resolution. Returns a dict via `return`. `forge-meta.html` is refactored to call it; the new shortcode calls it the same way. The two templates can no longer drift on detection logic.
`{{< forge-content >}}` (new shortcode) accepts:
- `repository`: `host/owner/repo` (unified) or `owner/repo` (legacy GitHub).
- `branch`: default `master`.
- `path`: when empty, GitHub uses the `/readme` endpoint; GitLab + Forgejo probe `README.md`, `README`, `readme.md` in order.
- `platform`: optional override (`github` / `gitlab` / `forgejo`).
- `unsafe`: default `false`; allows `<svg>` / `<math>` from trusted sources.
Three-layer security before `markdownify`:
1. Shortcode delimiter neutralisation. `{{<`, `{{%`, `>}}`, `%}}` replaced with full-width lookalikes; prevents server-side template injection.
2. HTML tag denylist. `replaceRE` entity-escapes opening + closing forms of: `script style iframe frame frameset noframes object embed applet form input button textarea select option optgroup fieldset legend link meta base noscript`, plus `svg` / `math` unless `unsafe="true"`. Disallowed tags render as visible escaped text; they cannot execute.
3. Dangerous attribute strip. Event handlers (`on*=...`), `javascript:` URIs in `href` / `src` / `xlink:href`, and IE-era `style="...expression(..."` are stripped wherever they appear; the containing tag survives but loses the attack vector.
This is a denylist, not a strict GFM allowlist: sufficient for typical READMEs, conservative on dangerous primitives. Documented in the README's new "Forge content" section.
Hard break (per user direction). `github-content.html` is deleted rather than wrapped. Sites using `{{< github-content >}}` must rename to `{{< forge-content >}}` and prefix the host (e.g. `repository="github.com/owner/repo"`). Migration is one find-and-replace.
Other changes:
- `i18n/{en,el}.yaml`: `github-error` → `forge-error` (no GitHub-specific text).
- `README.md`: new "Forge content" section; shortcode list updated.
- `exampleSite/content/posts/github-demo/index.md`: migrated as the live smoke test.
Test plan
- `cd exampleSite && hugo --gc` clean; no DEPRECATION warnings; demo post renders.
- […] `forge-error` fallback link instead of erroring out.
- `forge-meta.html` is byte-equivalent to master on pages with `Forge:` front-matter (same code path, just extracted).
- `<script>` / `<iframe>` / `<form>` etc. in the README appear as `&lt;script&gt;` etc.
- `onerror=` / `javascript:` URIs are absent from the rendered output.
- […] `repository="gitlab.com/group/project"`) and a Codeberg repo to exercise the probe.
- […] `{{< github-content >}}` (rename + add host prefix).
Independent of PR #5 (polish-pass) and PR #6 (print-stylesheet); branches off `master`.
Generated by Claude Code