Skip to content

Backlinks: add Bing Webmaster inbound links as a second, source-aware source (UI/API/MCP) #703

@arberx

Description

@arberx

Problem

backlinks is Common-Crawl-only today (per-project autoExtractBacklinks + cc_release_syncs, via @ainyc/canonry-integration-commoncrawl). Common Crawl publishes roughly monthly with additional processing lag, which is too low-frequency for timely backlink monitoring or link-acquisition verification.

Google Search Console exposes no links API (the Links report is UI-only). Bing Webmaster Tools is the only programmatic, faster-refreshing first-party link source, and integration-bing currently wires only GetUserSites / GetUrlInfo / GetQueryStats / GetCrawlStats / GetCrawlIssues / SubmitUrl*. There are no link endpoints.

Proposal

Add Bing Webmaster inbound links as a second backlinks source and make the backlinks surface source-aware (commoncrawl | bing-webmaster) end to end (API → CLI → MCP → UI), so an operator can monitor inbound links from whichever source they have set up.

Scope by layer

integration-bing (packages/integration-bing/src/bing-client.ts)

  • Add getUrlLinks() / getLinkCounts() calling Bing GetUrlLinks / GetLinkCounts; add BingInboundLink / BingLinkCount types.

backlinks model (packages/api-routes/src/backlinks.ts, packages/contracts/src/backlinks.ts, packages/db)

  • Add a source discriminator (commoncrawl | bing-webmaster) on backlink rows, plus a migration (per the schema-change rule).
  • Ingest Bing inbound links into the same store; leave Common Crawl ingestion unchanged.

backlinks-sync schedule (packages/contracts/src/schedule.ts + scheduler)

  • Sync from every source the project has set up (or accept a --source filter). Common Crawl stays release-driven; Bing pulls live from the connected account.

API / CLI / MCP

  • source filter + per-source counts on the existing backlinks read endpoints; new response shapes get Zod schemas + jsonResponse (spec-driven typing). MCP parity per the default rule. Maintain UI/CLI parity (any metric shown in the UI is in the CLI/API response).

UI (apps/web/src/pages/BacklinksPage.tsx, apps/web/src/components/project/BacklinksSection.tsx)

  • Source badge / filter on the links table; per-source freshness note (Common Crawl ~monthly vs Bing live).
  • Instructive empty / onboarding state when no source is set up (per the empty-state rule).

Connection-awareness (important)

The two sources have different setup affordances; the UI/API must reflect availability and never error when a source is absent:

  • Common Crawl: public dataset, no auth. "Available" = autoExtractBacklinks enabled + a synced release. Affordance = enable the toggle / run a release sync.
  • Bing Webmaster: requires a connected Bing Webmaster account (bing_connections, per domain). Affordance = connect Bing.

Backlinks reads should report which sources are available / connected and degrade gracefully across CC-only / Bing-only / both / neither (neither → the onboarding empty state). Add a backlinks.source.connected doctor check (warn when a project has neither source set up).

Out of scope

  • GSC URL Inspection referringUrls (already stored per-URL from inspections; a separate per-page signal, not a backlink index).
  • Paid third-party link APIs (Ahrefs / Majestic / Semrush).

Acceptance

  • Bing inbound links appear in backlinks via API, CLI (--format json), MCP, and the UI, tagged source=bing-webmaster, alongside Common Crawl.
  • The backlinks section shows an instructive empty state when neither source is set up, and per-source availability when one is.
  • Tests cover the Bing client methods, the source discriminator + ingestion, and the source-availability logic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions