fix(seo): fix canonical URLs, hreflang, soft-404s, and noindex for private pages by ImJustChew · Pull Request #798 · nthumodifications/courseweb

ImJustChew · 2026-04-27T09:30:15Z

Summary

Addresses multiple issues from Google Search Console Page Indexing report (2026-04-27):

2337 "Alternate page with proper canonical tag" — sitemap listed only /zh/courses/:id as <loc>; en course pages were just hreflang alternates. Now both zh and en are separate <url> entries so Google indexes both language versions independently.
1321 Soft 404s — missing courses returned HTTP 200 with generic shell HTML. Now returns 404 status + X-Robots-Tag: noindex + <meta name="robots" content="noindex"> via handleMissingCourse(). Client-side rendering also adds noindex Helmet.
Hreflang bug — the Cloudflare Worker rewrote canonical for bots but left the static index.html hreflang pointing to the root (nthumods.com/zh, nthumods.com/en) for all pages. Added applyHreflang() helper that updates hreflang links to be page-specific in all bot responses (course, bus, department, and generic pages).
Hardcoded /zh/ canonical in CourseDetailsContainer — client-side Helmet always emitted canonical /zh/courses/:id even on /en/courses/:id. Fixed to use the lang prop. Also added hreflang alternates to the Helmet.
Duplicate without canonical (55 pages) — added handleGenericBotPage() that sets the correct canonical (strips query params) and updates hreflang for all other bot requests to lang-prefixed pages.
Noindex missing from private routes — settings, student/*, next-steps, waitlist, design-system were blocked by robots.txt but had no noindex handle. Added belt-and-suspenders handle: { noindex: true } so TitleUpdater emits the right meta even if crawled via a backlink.
Sitemap additions — bus routes (main/nanda zh+en), calendar, sports-venues, chat, shops, apps, privacy-policy added to both the dynamic worker sitemap and the static fallback.
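
The sitemap change in the first item can be sketched roughly as below. This is a minimal illustration, not the PR's actual `worker.ts` code: the `courseSitemapEntries` and `xmlEscape` helpers are hypothetical names, and the `xhtml:link` alternates are an assumption about how the language relationship is expressed in the sitemap.

```typescript
// Hypothetical helper, not taken from the PR's worker.ts.
const ORIGIN = "https://nthumods.com";
const LANGS = ["zh", "en"] as const;
type Lang = (typeof LANGS)[number];

// hreflang codes named in the PR description: zh-TW for /zh, en for /en.
const HREFLANG: Record<Lang, string> = { zh: "zh-TW", en: "en" };

// Escape characters that must not appear raw in XML text or attribute values.
function xmlEscape(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the URL of one language version of a course page.
function courseUrl(lang: Lang, rawId: string): string {
  return `${ORIGIN}/${lang}/courses/${encodeURIComponent(rawId)}`;
}

// Emit one <url> entry per language, each carrying the same set of alternates,
// so neither language version is listed only as an alternate of the other.
function courseSitemapEntries(rawId: string): string[] {
  const alternates = LANGS.map(
    (l) =>
      `<xhtml:link rel="alternate" hreflang="${HREFLANG[l]}" href="${xmlEscape(courseUrl(l, rawId))}"/>`
  ).join("");
  return LANGS.map(
    (l) => `<url><loc>${xmlEscape(courseUrl(l, rawId))}</loc>${alternates}</url>`
  );
}
```

Calling `courseSitemapEntries(id)` for each course yields two entries, one whose `<loc>` is the zh URL and one whose `<loc>` is the en URL.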

Files changed

File	Change
`apps/web/worker.ts`	New `handleMissingCourse`, `applyHreflang`, `handleGenericBotPage`; fixed sitemap to include en course URLs; hreflang injection for course/bus/dept/generic pages
`apps/web/src/components/CourseDetails/CourseDetailsContainer.tsx`	Fix `lang` in canonical/OG URL; add hreflang alternates; add noindex Helmet on error state
`apps/web/src/router.tsx`	Add `noindex: true` to settings, student/*, next-steps, waitlist, design-system
`apps/web/public/sitemap.xml`	Add bus routes, calendar, sports-venues, chat, shops, apps (static fallback only — worker generates dynamic sitemap at runtime)
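
As an illustration of what page-specific hreflang values look like, here is a small sketch. It is not the PR's `applyHreflang` implementation: `hreflangLinks` is a hypothetical helper, and pointing `x-default` at the zh version is an assumption, since the PR text does not say which URL it targets.

```typescript
// Hypothetical sketch of computing page-specific hreflang targets.
interface HreflangLink {
  hreflang: string;
  href: string;
}

function hreflangLinks(origin: string, pathname: string): HreflangLink[] {
  // Swap the leading language segment; handles both "/zh/..." and bare "/zh".
  const swap = (lang: "zh" | "en"): string =>
    pathname.replace(/^\/(zh|en)(\/|$)/, `/${lang}$2`);
  const zh = `${origin}${swap("zh")}`;
  const en = `${origin}${swap("en")}`;
  return [
    { hreflang: "zh-TW", href: zh },
    { hreflang: "en", href: en },
    // Assumption: x-default points at the zh version.
    { hreflang: "x-default", href: zh },
  ];
}
```

In a Worker, values like these would then be written onto the existing `link[rel="alternate"]` elements, for example with `HTMLRewriter`.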

Test plan

Verify /sitemap.xml returns both /zh/courses/:id and /en/courses/:id for each course
Verify bot UA (e.g. curl with User-Agent: Googlebot) to a nonexistent course URL returns HTTP 404 with X-Robots-Tag: noindex
Verify bot UA to /en/courses/:id returns canonical pointing to /en/courses/:id (not /zh/)
Verify bot UA to /zh/timetable returns correct hreflang links (not root URLs)
Verify client-side /en/courses/:id has <link rel="canonical" href="...en/courses/..."> in DOM
Verify settings page has <meta name="robots" content="noindex, nofollow">
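
Some of the checks above could be automated along these lines. This is a sketch under stated assumptions: `BotResponse`, `canonicalOf`, `checkMissingCourse`, and `checkCanonicalLang` are hypothetical names, the response is assumed to have been fetched already with a bot User-Agent, and the regex only handles a `<link>` whose `rel` attribute comes before `href`.

```typescript
// Hypothetical verification helpers for the test plan; not part of the PR.
interface BotResponse {
  status: number;
  headers: Record<string, string | undefined>; // header names lower-cased
  html: string;
}

// Extract the href of <link rel="canonical"> from an HTML string.
function canonicalOf(html: string): string | null {
  const m = /<link[^>]*rel="canonical"[^>]*href="([^"]*)"/i.exec(html);
  return m ? (m[1] ?? null) : null;
}

// A missing course should answer 404 and carry X-Robots-Tag: noindex.
function checkMissingCourse(res: BotResponse): string[] {
  const problems: string[] = [];
  if (res.status !== 404) problems.push(`expected 404, got ${res.status}`);
  if (!(res.headers["x-robots-tag"] ?? "").includes("noindex")) {
    problems.push("X-Robots-Tag lacks noindex");
  }
  return problems;
}

// A course page should canonicalize to its own language prefix.
function checkCanonicalLang(res: BotResponse, lang: "zh" | "en"): string[] {
  const canonical = canonicalOf(res.html);
  if (canonical === null) return ["no canonical link found"];
  return canonical.includes(`/${lang}/courses/`)
    ? []
    : [`canonical ${canonical} is not under /${lang}/`];
}
```

Each function returns a list of problems, so an empty array means the check passed.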

🤖 Generated with Claude Code

…ivate pages

- worker.ts: generate both zh and en course URLs as separate sitemap entries so Google indexes both language versions instead of treating en as alternates
- worker.ts: inject correct hreflang links (zh-TW/en/x-default) for course and bus pages served to bots, fixing the static root hreflang in index.html
- worker.ts: return 404 status + noindex meta + X-Robots-Tag for missing courses to eliminate 1321 soft-404 pages (was returning 200 with generic shell)
- worker.ts: add handleGenericBotPage() that sets correct canonical and hreflang for all other lang-prefixed bot requests (strips query params from canonical)
- worker.ts: add bus route URLs (main/nanda, zh+en) to dynamic sitemap
- worker.ts: add calendar, sports-venues, chat, shops, apps to static pages list
- CourseDetailsContainer: fix hardcoded /zh/ canonical to use lang prop so en course pages self-canonicalize correctly for client-side rendering
- CourseDetailsContainer: add hreflang alternates in course Helmet so bots that render JS see correct language relationships
- CourseDetailsContainer: add noindex Helmet to 404/error state
- router.tsx: add noindex handle to settings, student/*, next-steps, waitlist, design-system routes (already blocked by robots.txt, belt-and-suspenders)
- sitemap.xml: add bus routes, calendar, sports-venues, chat, shops, apps pages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel · 2026-04-27T09:30:21Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
courseweb	Ready	Preview	Apr 27, 2026 9:30am

sonarqubecloud · 2026-04-27T09:31:05Z

Quality Gate failed

Failed conditions
3.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

cloudflare-workers-and-pages · 2026-04-27T09:31:46Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	courseweb-web	`45a6c8b`	Commit Preview URL Branch Preview URL	Apr 27 2026, 09:31 AM

Copilot

Pull request overview

This PR addresses several SEO/indexing issues (canonical URLs, hreflang, soft-404 handling, and noindex for private pages) by updating both the Cloudflare Worker’s bot responses and the client-side Helmet metadata, plus expanding sitemap coverage.

Changes:

Update the Cloudflare Worker to emit correct canonical + hreflang for bot requests, return proper 404 + noindex for missing courses, and include both language variants in the dynamic sitemap.
Fix course detail client-side canonical/OG URL generation to respect the current language and add hreflang alternates.
Add noindex route handles for private/internal pages and expand the static sitemap fallback routes.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
apps/web/worker.ts	Adds bot-specific handlers for canonical/hreflang updates, missing-course 404/noindex behavior, and improves dynamic sitemap generation.
apps/web/src/components/CourseDetails/CourseDetailsContainer.tsx	Fixes language-specific canonical/OG URL and adds hreflang alternates + noindex in error state.
apps/web/src/router.tsx	Marks additional private/internal routes with `handle.noindex` so shared SEO handling can emit noindex.
apps/web/public/sitemap.xml	Updates static fallback sitemap with additional routes and refreshed lastmod dates.


Copilot · 2026-04-27T09:34:15Z

          courseCode: `${course.department} ${course.course}-${course.class}`,
          educationalLevel: "University",
          inLanguage: "zh-TW",
-          url: `https://nthumods.com/zh/courses/${course.raw_id}`,
+          url: `https://nthumods.com/${lang}/courses/${course.raw_id}`,


The JSON-LD Course object still hardcodes inLanguage: "zh-TW" even when rendering an /en/... page, while url is now language-specific. This produces inconsistent structured data for English pages. Consider deriving inLanguage (and possibly the primary name) from the lang prop so the schema matches the page language.
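
One way to keep the two fields in step is sketched below. The `IN_LANGUAGE` map and the `courseJsonLdLocale` helper are hypothetical names, and using `en` as the language tag for English pages is an assumption.

```typescript
// Hypothetical sketch: derive inLanguage from the same lang value as the URL.
type Lang = "zh" | "en";

// Assumed mapping from route language to a BCP 47 tag.
const IN_LANGUAGE: Record<Lang, string> = { zh: "zh-TW", en: "en" };

// Returns only the language-dependent JSON-LD fields; the rest of the
// Course object would be spread in alongside these.
function courseJsonLdLocale(lang: Lang, rawId: string) {
  return {
    inLanguage: IN_LANGUAGE[lang],
    url: `https://nthumods.com/${lang}/courses/${rawId}`,
  };
}
```

With this, `courseJsonLdLocale("en", id)` gives an English `inLanguage` together with an `/en/` URL, so the structured data no longer contradicts the page language.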

Copilot · 2026-04-27T09:34:15Z

+          <meta name="robots" content="noindex, nofollow" />
+          <meta name="googlebot" content="noindex, nofollow" />


In the missing-course/error state, the component sets noindex, nofollow but doesn’t set a canonical URL. Since the rest of the app (e.g. TitleUpdater) will still emit a canonical pointing at the missing /courses/:id URL, crawlers that hit the client-rendered version can see conflicting signals compared to the Worker (which canonicalizes to /courses). Consider explicitly setting a canonical in this error Helmet (and aligning nofollow vs follow with the Worker directive) to keep behavior consistent.

Suggested change

-          <meta name="robots" content="noindex, nofollow" />
-          <meta name="googlebot" content="noindex, nofollow" />
+          <meta name="robots" content="noindex, follow" />
+          <meta name="googlebot" content="noindex, follow" />
+          <link
+            rel="canonical"
+            href={`https://nthumods.com/${lang}/courses`}
+          />

Copilot · 2026-04-27T09:34:16Z

+  const zhPath = pathname.replace(/^\/(zh|en)\//, "/zh/");
+  const enPath = pathname.replace(/^\/(zh|en)\//, "/en/");


handleGenericBotPage() computes zhPath/enPath using a regex that only matches /zh/ or /en/ (with a trailing slash). For paths like /zh or /en (which are valid routes), the replacement won’t happen and both alternates can end up pointing to the same URL. Update the replacement logic to also handle the end-of-string case so /zh ↔ /en works correctly.

Suggested change

-  const zhPath = pathname.replace(/^\/(zh|en)\//, "/zh/");
-  const enPath = pathname.replace(/^\/(zh|en)\//, "/en/");
+  const zhPath = pathname.replace(/^\/(zh|en)(\/|$)/, "/zh$2");
+  const enPath = pathname.replace(/^\/(zh|en)(\/|$)/, "/en$2");
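
The difference between the two patterns is easy to see on concrete paths. The `toLang` wrapper below is a hypothetical helper written only for this comparison; the two regular expressions are the ones from the comment.

```typescript
// The original pattern requires a trailing slash after the language segment.
const originalPattern = /^\/(zh|en)\//;
// The suggested pattern also accepts end-of-string and re-emits what it matched.
const suggestedPattern = /^\/(zh|en)(\/|$)/;

// Hypothetical wrapper to compare both behaviours side by side.
function toLang(pathname: string, lang: "zh" | "en", useSuggested: boolean): string {
  return useSuggested
    ? pathname.replace(suggestedPattern, `/${lang}$2`)
    : pathname.replace(originalPattern, `/${lang}/`);
}

console.log(toLang("/zh", "en", false)); // "/zh"  (no match, so no swap)
console.log(toLang("/zh", "en", true)); // "/en"
console.log(toLang("/zh/timetable", "en", true)); // "/en/timetable"
```

Because the second capture group is either `/` or the empty string, `$2` restores the slash only when one was present, so `/zh` and `/zh/timetable` are both handled.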

Copilot · 2026-04-27T09:34:16Z

 async function handleDepartmentPage(url: URL, env: Env): Promise<Response> {
  const dept = url.searchParams.get("department") ?? "";
  const lang = url.pathname.includes("/zh/") ? "zh" : "en";
-  const fallback = () =>
-    env.ASSETS.fetch(new Request(`${url.origin}/index.html`));
+  const fallback = () => handleGenericBotPage(url, env);



handleGenericBotPage() always drops url.search from the canonical URL (canonicalUrl uses only pathname). But handleDepartmentPage() uses this as its fallback, which means a department-filtered page could canonicalize to /courses (missing ?department=...) when the API errors/returns empty. Consider preserving the department query param in the fallback canonical (and alternates) to avoid collapsing distinct pages into the same canonical on failures.
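
A minimal sketch of a fallback canonical that keeps the department filter, assuming a hypothetical `fallbackCanonical` helper and that `department` is the only query parameter worth preserving:

```typescript
// Hypothetical helper: build a canonical URL that keeps ?department=...
// when a department was requested, and drops every other query parameter.
function fallbackCanonical(
  origin: string,
  pathname: string,
  department: string | null
): string {
  const base = `${origin}${pathname}`;
  // An empty string is treated like "no department", matching the `?? ""` default.
  return department
    ? `${base}?department=${encodeURIComponent(department)}`
    : base;
}
```

The same function could be used for the zh and en alternates by passing the swapped pathname, so the filtered pages stay distinct even when the API call fails.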

Copilot · 2026-04-27T09:34:16Z

+  let rewriter = new HTMLRewriter().on('link[rel="canonical"]', {
+    element(el) {
+      el.setAttribute("href", canonicalUrl);
+    },
+  });


handleGenericBotPage() rewrites only <link rel="canonical">, but leaves og:url (and other social meta) as whatever is in the static index.html (currently https://nthumods.com). Since social crawlers are included in isBot(), this can produce incorrect OG previews/metadata on most pages. Consider also rewriting meta[property="og:url"] (at least) to match the computed canonical URL.

Suggested change

-  let rewriter = new HTMLRewriter().on('link[rel="canonical"]', {
-    element(el) {
-      el.setAttribute("href", canonicalUrl);
-    },
-  });
+  let rewriter = new HTMLRewriter()
+    .on('link[rel="canonical"]', {
+      element(el) {
+        el.setAttribute("href", canonicalUrl);
+      },
+    })
+    .on('meta[property="og:url"]', {
+      element(el) {
+        el.setAttribute("content", canonicalUrl);
+      },
+    })
+    .on('meta[name="twitter:url"]', {
+      element(el) {
+        el.setAttribute("content", canonicalUrl);
+      },
+    })
+    .on('meta[property="twitter:url"]', {
+      element(el) {
+        el.setAttribute("content", canonicalUrl);
+      },
+    });

Copilot · 2026-04-27T09:34:16Z

+    if (!apiRes.ok || apiRes.status === 404) {
+      return handleMissingCourse(lang, env, origin);
    }


handleCourseDetailPage() treats any non-OK response (including 5xx/timeouts) as a missing course and serves a 404 + noindex. If the upstream API has a transient outage, this can cause bots to see widespread 404s and potentially deindex valid pages. Consider distinguishing true 404s from other failures (e.g., return a 503/Retry-After or fall back to a 200 shell with a safer robots directive) so temporary API issues don’t look like permanent removals.
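
The distinction could be sketched as a small classification step before choosing the response. `CourseOutcome` and `classifyCourseLookup` are hypothetical names, and the 503 status with a 300-second retry hint is one assumed policy among the options mentioned above.

```typescript
// Hypothetical sketch: separate "course does not exist" from "API is unhealthy".
type CourseOutcome =
  | { kind: "found" }
  | { kind: "missing"; status: 404; robots: "noindex" }
  | { kind: "unavailable"; status: 503; retryAfterSeconds: number };

function classifyCourseLookup(apiStatus: number): CourseOutcome {
  // 2xx: the course exists, render the normal bot page.
  if (apiStatus >= 200 && apiStatus < 300) return { kind: "found" };
  // Only an explicit 404 from the API is treated as a permanent miss.
  if (apiStatus === 404) return { kind: "missing", status: 404, robots: "noindex" };
  // Anything else (5xx, rate limits, etc.) is reported as temporary.
  // Assumption: 300 seconds is an arbitrary placeholder for Retry-After.
  return { kind: "unavailable", status: 503, retryAfterSeconds: 300 };
}
```

A caller would map `missing` to the existing 404 + noindex path and `unavailable` to a 503 response with a `Retry-After` header, so a transient outage does not look like a permanent removal.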

Copilot AI review requested due to automatic review settings April 27, 2026 09:30

Copilot started reviewing on behalf of ImJustChew April 27, 2026 09:30

vercel Bot deployed to Preview April 27, 2026 09:30

Copilot AI reviewed Apr 27, 2026

ImJustChew merged commit 4f73e0e into main Apr 27, 2026
9 of 10 checks passed

ImJustChew merged 1 commit into main from fix/seo-improvements
