Content Decay Analysis

A two-part tool for finding URLs that are losing organic search visibility over time:

  1. analysis/: a Node + TypeScript script that consumes a Google Search Console CSV export and produces a single JSON file (decay.json) describing the decay pattern for every qualifying URL.
  2. app/: a Vue 3 + Vite static viewer that loads that JSON and renders an interactive table plus a dual-axis chart for browsing results.

The output is fully static; no backend, no database. Build it once and host it anywhere (Netlify, S3, GitHub Pages), or run it locally in dev mode.


Why this exists

A typical mid-to-large content site has thousands of URLs that have lost organic traffic since launch. Some lost it because Google rolled out an algorithm update; some because a competitor outranked them; some because the topic itself drifted out of relevance; many because the content went stale and search intent moved on. The hardest pattern to catch (and the most actionable) is the slow bleed: a URL that has been quietly losing 5–10% of its impressions every month for a year, hidden inside a flat-top GSC chart that averages thousands of pages together.

Search Console shows you the symptoms (impressions are down) but not the diagnosis: which specific URLs are decaying, in what shape, and how steadily. Sorting the pages report by recent traffic surfaces what's big now, not what's getting smaller fastest. Date filters give you a snapshot, not a trajectory.

This tool fits in that gap. It takes the daily per-URL CSV that Google already gives you and answers a single question per page: is this URL trending downward in a way that's steady enough to act on? Steady decline → refresh, redirect, or sunset candidate. Spiky or random → leave it alone.


What you get

  • A ranked table of every URL the script analyzed, with slope, decline %, R², and lifetime totals for both impressions and clicks
  • A dual-axis line chart with the actual monthly series and the fitted regression line for the URL you click
  • A filter that shows only URLs whose slope is negative, so you don't have to mentally subtract growers from decliners
  • Sortable columns so you can shift between "biggest absolute losses" (slope), "biggest relative losses" (decline %), and "steadiest, most predictable losses" (R²) without leaving the page
  • A search box that substring-matches against URL paths so you can audit one section at a time
  • A self-contained static deliverable (no server, no database) that you can host anywhere or serve from any HTTP server


How decay is detected

For each URL the analysis script:

  1. Aggregates daily impressions and clicks into monthly buckets.
  2. Drops partial months at the start and end of the dataset, so every data point represents a complete calendar month. (E.g., a CSV starting 2024-12-23 discards December 2024 entirely; the first analyzed month is January 2025.)
  3. Anchors each URL's series at its first full month with traffic. Any "missing" months between that anchor and the most recent full month are treated as zero impressions/clicks (the URL existed in Google's index but earned nothing that month).
  4. Filters out URLs that don't meet the minimum-volume and minimum-history thresholds (defaults: 1,000 lifetime impressions, 100 lifetime clicks, 6 months of data).
  5. Fits a linear regression on monthly impressions and clicks separately. Each regression yields a slope (avg change per month), intercept (predicted starting value), and R² (how well the line fits, with values closer to 1 meaning a steadier trend).
  6. Computes a decline percentage from the modeled start to the modeled end of the URL's window, capped at 100%. A URL can't decline more than 100%, and the cap prevents URLs that fell to zero from reporting math-only values like 144%.

A URL with a steeply negative slope and a high R² is a "steady decliner": the kind of post that has been bleeding traffic month after month rather than briefly spiking and crashing. That's the core signal this tool is designed to surface.
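
A minimal sketch of the regression step (5), using the simple-statistics functions the script depends on; the names and shapes here are illustrative rather than the exact analyze.ts code:

import { linearRegression, linearRegressionLine, rSquared } from "simple-statistics";

// One value per full month, indexed from the URL's anchor month.
interface Fit {
  slope: number;      // average change per month
  intercept: number;  // fitted value at the anchor month
  r2: number;         // 0..1, how steady the trend is
}

// Fit y = m·x + b through (month index, value) pairs for one URL's series.
function fitMonthlySeries(values: number[]): Fit {
  const points: Array<[number, number]> = values.map((v, i) => [i, v]);
  const { m, b } = linearRegression(points);
  const r2 = rSquared(points, linearRegressionLine({ m, b }));
  return { slope: m, intercept: b, r2 };
}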

The methodology is intentionally unaware of seasonality and short-term spikes; the high-R² requirement filters them out when you sort for steady declines.

Why linear regression

Several modeling choices were considered:

| Model | Catches steady decay | Catches seasonality | Single-number summary | Easy to explain |
| --- | --- | --- | --- | --- |
| Linear regression | yes | no (sees as noise) | yes (slope) | yes |
| ARIMA | yes | yes | no (multiple parameters) | no |
| Exponential smoothing | yes | partial | medium | medium |
| Period-over-period (3mo vs prior 3mo) | partial | no | yes | yes (but misses gradual bleeds) |
| Mann-Kendall trend test | yes (yes/no only) | partial | no magnitude | medium |
| Bayesian structural time series | yes | yes | no | no |

Linear regression won because it's the cleanest match for the question "is this URL on a downward trajectory of meaningful magnitude?" The slope answers magnitude in interpretable units (impressions per month). The R² answers steadiness. Both are single numbers that can sort a table. ARIMA produces too many parameters to summarize; period-over-period misses gradual bleeds; Mann-Kendall doesn't quantify how much.

The trade-off accepted: the model doesn't account for seasonality. A URL that legitimately spikes every December will look noisy (low R²) rather than healthy. This is intentional. Surfacing predictable seasonal pages is a different question, and adding seasonal decomposition would be a different tool. See What this tool is not.

Reading R²

R² is the goodness-of-fit of the linear regression: the proportion of variance in the URL's monthly traffic that the line explains. It runs from 0 (line explains nothing) to 1 (line passes through every point).

| R² | What it usually means |
| --- | --- |
| > 0.7 | Strongly linear trend. The slope is a confident estimate. Best refresh candidates if the slope is negative. |
| 0.4 – 0.7 | Clear trend with some month-to-month noise. Slope is meaningful. |
| 0.2 – 0.4 | Weak trend, dominated by noise or non-linear shape (e.g., spike + decay). Slope is directionally suggestive but uncertain. |
| < 0.2 | No discernible linear trend. Could be flat, spiky, or seasonal. Don't infer decay from these. |

Pairing slope with R² is what separates "steady decline" from "one bad month."

The decline % cap

Decline % is computed from the regression line, not from raw start and end values:

declinePct = (predictedStart − max(0, predictedEnd)) / predictedStart × 100

The max(0, predictedEnd) is a cap that prevents pathological values when the line projects below zero. Worked example:

A URL that had 1,865 impressions in January and ~0 every month for the next 14 months produces a regression with intercept ≈ 430 and slope ≈ −47/month. The line projects from 430 down to roughly −228. Without the cap:

declinePct = (430 − (−228)) / 430 × 100 = 153%

That's mathematically what the linear model says, but it's nonsense; a URL can't decline more than 100%. Capping the projected end at zero gives:

declinePct = (430 − 0) / 430 × 100 = 100%

That reading is honest: the model is saying the URL has effectively gone to zero. All URLs that have flatlined to zero hit the 100% ceiling, and R² then tiebreaks to surface the ones whose decline was steady rather than sudden.
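
As a standalone sketch (illustrative names, not the exact regress() code in analysis/analyze.ts), the cap amounts to:

// Decline % from the modeled start to the modeled end of the window,
// with the projected end floored at zero so a flatlined URL reads 100%.
function cappedDeclinePct(slope: number, intercept: number, months: number): number | null {
  const predictedStart = intercept;
  if (predictedStart <= 0) return null; // matches declinePct: null in the schema
  const predictedEnd = intercept + slope * (months - 1);
  return ((predictedStart - Math.max(0, predictedEnd)) / predictedStart) * 100;
}

// Worked example from above: cappedDeclinePct(-47, 430, 15) → 100 (uncapped it would be ~153).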

Partial-month boundary trim

GSC exports rarely start or end on the first or last day of a calendar month. A dataset starting 2024-12-23 only has 9 days of December, so including it would compare December's tiny partial total against full months and skew the regression. The script detects this at both ends of the dataset and trims partial calendar months before computing anything. The check runs per dataset, so a CSV that happens to start exactly on the 1st of a month doesn't lose a month unnecessarily.
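
A sketch of the boundary check with illustrative names; the real implementation may differ in detail:

// A boundary month is kept only if the export covers it from its first
// through its last calendar day.
function isFullMonth(exportStart: Date, exportEnd: Date, year: number, month: number): boolean {
  const firstDay = new Date(Date.UTC(year, month - 1, 1));
  const lastDay = new Date(Date.UTC(year, month, 0)); // day 0 of the next month = last day
  return exportStart <= firstDay && lastDay <= exportEnd;
}

// An export starting 2024-12-23 fails this check for 2024-12, so the first analyzed
// month becomes 2025-01; an export starting exactly on the 1st loses nothing.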

Anchor month rationale

Each URL's regression starts at the URL's first full month with any traffic, not at the dataset's first month. This matters for newer URLs: a page launched in June has 9 months of data in a 15-month dataset; treating the first 6 months as "zero impressions" for that URL would manufacture a fake "growth from zero" signal. By starting each URL's series at its first traffic month, the regression measures the URL's actual lifecycle, not its position relative to the export window.

Once anchored, gaps after the anchor are zero-filled. A URL that had impressions in months 1–4, then nothing for months 5–7, then impressions in months 8–15 is treated as 4 active months → 3 months of true zero → 8 active months. GSC reports a URL's absence the same way as zero impressions, so this gives the regression an honest picture of decline-with-gaps versus continuous-with-traffic.
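
A sketch of the anchor-and-zero-fill step, assuming the list of full months is already known (names are illustrative):

// Build one URL's series from its anchor month through the last full month,
// treating months with no GSC rows as explicit zeros.
function buildSeries(
  fullMonths: string[],                                          // e.g. ["2025-01", ..., "2026-03"]
  byMonth: Map<string, { impressions: number; clicks: number }>, // this URL's monthly totals
) {
  const anchor = fullMonths.findIndex((m) => byMonth.has(m));
  if (anchor === -1) return [];                                  // no traffic anywhere in the window
  return fullMonths.slice(anchor).map((month) => ({
    month,
    impressions: byMonth.get(month)?.impressions ?? 0,
    clicks: byMonth.get(month)?.clicks ?? 0,
  }));
}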


Prerequisites

  • Node.js 20+ (uses native parseArgs, ESM, top-level await)
  • npm (ships with Node)
  • A Google Search Console CSV export with daily granularity

That's it. No database, no Python, no cloud accounts.


Project layout

content-decay/
├── data/                     # Drop GSC CSV exports here (gitignored)
│   └── gsc-export.csv
├── analysis/
│   ├── analyze.ts            # The analysis script
│   ├── package.json
│   └── tsconfig.json
├── app/
│   ├── public/
│   │   └── decay.json        # Output of analyze.ts (gitignored, regenerated)
│   ├── src/
│   │   ├── components/
│   │   │   ├── DecayTable.vue
│   │   │   └── DecayChart.vue
│   │   ├── App.vue
│   │   ├── format.ts
│   │   ├── style.css
│   │   ├── types.ts
│   │   ├── main.ts
│   │   └── env.d.ts
│   ├── index.html
│   ├── package.json
│   ├── tsconfig.json
│   └── vite.config.ts
├── .gitignore
└── README.md

The two halves (analysis/ and app/) have independent package.json files and don't share code at runtime. The only contract between them is the decay.json file format (see schema).


Architecture and data flow

┌────────────────────┐
│  GSC CSV export    │   daily, per-URL (one row per URL per day)
│  ~376k rows        │
└──────────┬─────────┘
           │  csv-parse (streaming)
           ▼
┌────────────────────┐
│  analyze.ts        │   • Aggregate to monthly buckets per URL
│  (Node + TS)       │   • Detect partial months at boundaries, trim
│                    │   • Anchor each URL at its first traffic month
│                    │   • Filter by volume + history thresholds
│                    │   • Linear regression × 2 (impressions, clicks)
│                    │   • Compute decline % with zero-cap
│                    │   • Sort by impressions decline DESC
└──────────┬─────────┘
           │  writes JSON
           ▼
┌────────────────────┐
│  decay.json        │   • Metadata (date range, filters, counts)
│  ~750KB            │   • Per-URL: slopes, R², monthly series, lifetime
└──────────┬─────────┘
           │  fetched at runtime by the Vue app
           ▼
┌────────────────────┐
│  Browser           │   • Loads JSON
│  (Vue 3 + Vite)    │   • Filters & sorts in-memory
│                    │   • Renders virtualized table
│                    │   • Renders dual-axis Chart.js for selection
└────────────────────┘

The two halves are decoupled: the analysis script doesn't know about Vue, and the app doesn't know about the CSV. The contract between them is the decay.json shape, formalized as a TypeScript interface in app/src/types.ts.

Performance characteristics

  • Aggregation is O(rows). A 376k-row CSV (~37 MB) aggregates and writes JSON in well under 5 seconds on a modern laptop.
  • Regression is O(URLs × months). With 6,000 URLs over 15 months, that's 90,000 monthly points across two regressions each → ~180,000 simple-statistics operations. Sub-second.
  • App load fetches a single JSON (typically 0.5–2 MB depending on URL count) and parses it once on mount. Sort and filter run on that in-memory array; both are O(URLs) per interaction.
  • Table rendering uses virtual scrolling (vue-virtual-scroller), so DOM size is constant regardless of URL count. Browsers stay smooth past 10,000 rows.

Quick start (5 minutes)

Assumes you already have your GSC CSV.

# 1. Install analysis dependencies
cd analysis
npm install

# 2. Drop your CSV in /data, then run the analysis
#    (use the default file name, or pass --input)
cp ~/Downloads/your-export.csv ../data/
npm run analyze -- --input ../data/your-export.csv

# 3. Install app dependencies and start the dev server
cd ../app
npm install
npm run dev
# → open http://localhost:5173

You're now browsing decay results. Re-run step 2 anytime and refresh the browser; no rebuild needed in dev mode.


Workflow

1. Drop in a CSV

Place your CSV anywhere. The data/ folder is the conventional location and is gitignored so you won't accidentally commit client data. Multiple CSVs can coexist; each analysis run targets one file.

2. Run the analysis

From analysis/:

npm run analyze

That uses the defaults (input ../data/gsc-export.csv, output ../app/public/decay.json, standard thresholds).

To target a different file or tune thresholds:

npm run analyze -- \
  --input ../data/your-export.csv \
  --output ../app/public/decay.json \
  --min-impressions 1000 \
  --min-clicks 100 \
  --min-months 6

The script logs its progress and ends with a summary like:

[analyze] parsed 376,221 rows across 6,429 URLs
[analyze] date range: 2024-12-23 → 2026-04-20
[analyze] full months: 2025-01 → 2026-03 (15 months)
[analyze] wrote 740 URLs (0.73 MB) to ../app/public/decay.json
[analyze] qualified: impressions=740 clicks=235

If only a small number of URLs qualify, that's expected; most pages on a typical site don't meet the volume/history thresholds.

3. View the results

Local development (live reload):

cd app
npm run dev
# → http://localhost:5173

The app fetches decay.json at runtime, so re-running the analysis and refreshing the browser is enough. No rebuild required.

Production build (static, hostable):

cd app
npm run build
# → app/dist/  (deployable static folder)

npm run preview      # optional: serve the production build locally
# → http://localhost:4173

The dist/ folder is fully static. See Deploying.


CSV format

The script expects a Google Search Console CSV export with these columns (header row required, exact names):

Date,URL,Impressions,Clicks,CTR,Avg Rank

Example:

Date,URL,Impressions,Clicks,CTR,Avg Rank
2025-01-01,https://example.com/blog/post-a,1234,56,4.54%,12.3
2025-01-01,https://example.com/blog/post-b,789,5,0.63%,33.2
2025-01-02,https://example.com/blog/post-a,1187,52,4.38%,12.5

Notes:

  • Date must be YYYY-MM-DD.
  • URL is matched literally; /page/ and /page are different URLs.
  • CTR and Avg Rank are read but not used by the analysis (they are not currently surfaced anywhere).
  • One row per (Date, URL) pair. The script aggregates rows by URL within each calendar month.
  • Rows missing any of (Date, URL, Impressions, Clicks) or with non-numeric impressions/clicks are silently skipped.
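
The ingestion step could look roughly like this with csv-parse's streaming API; this is a sketch rather than the script's exact code:

import { createReadStream } from "node:fs";
import { parse } from "csv-parse";

// Stream the export row by row; columns: true keys each row by the header names above.
const rows = createReadStream("../data/gsc-export.csv").pipe(
  parse({ columns: true, skip_empty_lines: true }),
);

for await (const row of rows) {
  const impressions = Number(row.Impressions);
  const clicks = Number(row.Clicks);
  if (!row.Date || !row.URL || Number.isNaN(impressions) || Number.isNaN(clicks)) {
    continue; // malformed rows are skipped, per the note above
  }
  // ...aggregate (row.Date, row.URL, impressions, clicks) into monthly buckets
}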

Where to get this from GSC:

  1. Search Console → Performance → set the date range (this tool benefits from at least 9–12 months of data)
  2. Export → CSV
  3. Use the per-day, per-URL (Pages) table; that export matches the format described above

Analysis tuning (CLI flags)

| Flag | Default | Purpose |
| --- | --- | --- |
| --input | ../data/gsc-export.csv | Path to source CSV |
| --output | ../app/public/decay.json | Where to write results |
| --min-impressions | 1000 | Min lifetime impressions for impressions-side analysis |
| --min-clicks | 100 | Min lifetime clicks for clicks-side analysis |
| --min-months | 6 | Min full months of data per URL |

When to adjust

  • Smaller site / fewer URLs: lower --min-impressions to 500 or 250 to keep more URLs in the analysis. The trade-off is more noise in the rankings.
  • Larger site, focusing only on top performers: raise --min-impressions to 5000 or 10000 to filter to high-traffic URLs only.
  • Short history (< 9 months of data): lower --min-months to 3 or 4, but expect noisier R² values; fewer data points means individual months sway the regression more.
  • Surface URLs with traffic but no clicks: lower --min-clicks (the click-side analysis is independent of the impression-side analysis, so a URL can qualify for one and not the other).

The analysis is fully deterministic: same CSV plus same flags equals identical decay.json.
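
For reference, declaring these flags with Node's built-in parseArgs (one of the prerequisites) could look like the sketch below; option names and defaults mirror the table above, but the actual analyze.ts wiring may differ:

import { parseArgs } from "node:util";

const { values } = parseArgs({
  options: {
    "input":           { type: "string", default: "../data/gsc-export.csv" },
    "output":          { type: "string", default: "../app/public/decay.json" },
    "min-impressions": { type: "string", default: "1000" },
    "min-clicks":      { type: "string", default: "100" },
    "min-months":      { type: "string", default: "6" },
  },
});

// parseArgs only produces strings and booleans, so numeric flags are coerced here.
const minImpressions = Number(values["min-impressions"]);
const minMonths = Number(values["min-months"]);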


Reading the app

The table

| Column | What it shows |
| --- | --- |
| URL | The page path (domain stripped for readability; full URL in tooltip and chart link) |
| Imp Slope | Avg impressions change per month (negative = losing impressions) |
| Imp Decline | % drop from modeled start to end, capped at 100% |
| Imp R² | How well a straight line fits the impressions series. 1.0 = perfect fit. < 0.3 = noisy/spiky. |
| Clk Slope | Same as Imp Slope, for clicks |
| Clk Decline | Same as Imp Decline, for clicks |
| Clk R² | Same as Imp R², for clicks |
| Lifetime imp | Sum of impressions across the URL's analyzed window |
| Lifetime clk | Sum of clicks across the URL's analyzed window |
| Months | Number of monthly data points used in the regression |

Color cues:

  • 🔴 Red = a declining value (negative slope or positive decline %)
  • 🟢 Green = a growing value
  • Gray "—" = the URL didn't qualify for that metric (e.g., not enough lifetime clicks for the click-side analysis)

Sortable columns: click any column header. Click again to reverse direction. Active sort column is highlighted gold.

Search: typed text substring-matches against the full URL.

Declining only: when on, hides URLs whose slope on the active metric is positive or zero. On by default.

Impressions / Clicks toggle: switches the default sort and the basis of the declining-only filter. Both metrics' columns remain visible.

Selecting a row: click any row to load it in the chart above.

The chart

A dual-axis line chart of the selected URL's monthly series:

  • Gold solid line: actual monthly impressions (left axis)
  • Gold dashed line: fitted regression line for impressions
  • Charcoal solid line: actual monthly clicks (right axis)
  • Charcoal dashed line: fitted regression line for clicks

Hovering shows tooltips with exact values for every line at that month.

The three stat cards above the chart summarize the regression for the selected URL:

  • Imp Decline / Click Decline: capped decline %, slope, and R²
  • Lifetime: total impressions, total clicks, months of data

The URL above the chart is a clickable link that opens the page in a new tab.

Recommended workflows

"Find the worst overall decliners"

  1. Default state: declining only ON, sort by Imp Decline DESC.
  2. Top of table = URLs that have lost the most impressions.

"Find slow, steady bleeders (best for content refreshes)"

  1. Declining only ON.
  2. Click "Imp R²" once to sort DESC (a fresh column starts in DESC).
  3. Top of table = URLs whose decline is most linear → most predictable, best refresh candidates.

"Find URLs losing clicks even when impressions are stable"

  1. Switch to the Clicks metric.
  2. Sort by Clk Decline DESC.
  3. Compare Clk Slope vs Imp Slope per row. A negative click slope alongside a positive impression slope points to a ranking-position issue rather than a visibility issue.

"Audit a specific section"

  1. Type the path prefix into search (e.g., /blog/).
  2. Sort however you want.

Worked example: a single URL through the system

Take https://example.com/blog/content-strategy-2024/, a URL that decayed cleanly from 8,000 impressions/month down to 1,000 over 15 months.

  1. CSV. The file contains roughly 15 full months × ~30 days/month = 450 daily rows for this URL.

  2. Aggregation. Those 450 rows collapse into 15 monthly bucket totals: [8000, 7500, 7000, 6500, 6000, 5500, 5000, 4500, 4000, 3500, 3000, 2500, 2000, 1500, 1000].

  3. Anchor. First month with any traffic is month 1 of the dataset window. No clipping needed.

  4. Filter check. Lifetime impressions = 67,500 (≥ 1,000 ✓), months = 15 (≥ 6 ✓). Qualifies for impressions analysis. Lifetime clicks would be checked separately against the click threshold.

  5. Regression. Fit a line to (month index, impressions) for indices 0–14. The math returns:

    • slope = −500/mo (losing 500 impressions every month)
    • intercept = 8,000 (predicted month-1 value, matching the actual)
    • R² = 1.00 (the data is perfectly linear by construction; in real GSC data R² would typically be 0.7–0.95 for a steady decay)
  6. Decline %. Predicted end at month 14 = 8000 + (−500 × 14) = 1000. The line stays positive, so no cap kicks in: (8000 − 1000) / 8000 × 100 = 87.5%.

  7. JSON. This URL ends up in decay.json with slope: −500, declinePct: 87.5, r2: 1.0.

  8. App. The URL renders in the table with red −500/mo, red 87.5%, and 1.00 for R². Clicking the row loads the chart, which renders the actual 15-point series (gold solid line) with a downward dashed regression line that tracks the points exactly.

  9. Triage. High R² + high decline % + meaningful lifetime impressions = strong candidate for content refresh or sunset. The dashed line tracking the data closely tells you this is a steady decline, not a one-time hit.

Contrast this with a URL whose series is [5677, 7892, 3942, 1780, 850, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]. The direction is the same, but the regression line projects below zero (so the cap fires and decline reads 100%), and R² lands around 0.5–0.6 because the shape is "front-loaded then dead" rather than a clean linear descent. Both URLs surface in the table; the R² column tells you which one is the steady bleeder versus which one fell off a cliff.
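
Feeding both example series through the illustrative fitMonthlySeries sketch from the "How decay is detected" section makes the contrast concrete:

// Clean linear decay: 8,000 → 1,000 impressions over 15 months.
const steady = fitMonthlySeries([8000, 7500, 7000, 6500, 6000, 5500, 5000, 4500,
                                 4000, 3500, 3000, 2500, 2000, 1500, 1000]);
// steady.slope = -500, steady.r2 = 1.0  → steady bleeder

// Front-loaded spike, then dead.
const cliff = fitMonthlySeries([5677, 7892, 3942, 1780, 850, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
// cliff.slope is also steeply negative, but cliff.r2 lands around 0.5–0.6 → fell off a cliff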


decay.json schema

{
  metadata: {
    runAt: string;            // ISO timestamp of the analysis run
    inputFile: string;        // basename of the source CSV (e.g. "gsc-export.csv")
    dateRange: { start: string; end: string };  // YYYY-MM-DD
    fullMonths: string[];     // e.g. ["2025-01", "2025-02", ...]
    filters: {
      minImpressions: number;
      minClicks: number;
      minMonths: number;
    };
    counts: {
      totalUrlsInCsv: number;
      qualifiedForImpressions: number;
      qualifiedForClicks: number;
      includedInOutput: number;
    };
  };
  urls: Array<{
    url: string;
    monthsOfData: number;
    lifetimeImpressions: number;
    lifetimeClicks: number;
    monthlySeries: Array<{
      month: string;          // YYYY-MM
      impressions: number;
      clicks: number;
    }>;
    impressions: RegressionResult | null;  // null = didn't qualify
    clicks: RegressionResult | null;
  }>;
}

interface RegressionResult {
  slope: number;              // change per month
  intercept: number;          // predicted value at first analyzed month
  r2: number;                 // 0..1
  startValue: number;         // = intercept
  endValue: number;           // intercept + slope * (n - 1), uncapped
  declinePct: number | null;  // capped at 100, null if intercept ≤ 0
}

URLs are sorted in the JSON by impressions decline % descending (worst decay first). The app re-sorts at runtime based on user selection.
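
On the app side, consuming the file can be as simple as the following sketch (DecayFile is an assumed name for the interface defined in app/src/types.ts):

import type { DecayFile } from "./types"; // assumed name for the interface mirroring the schema above

// Fetched once on mount; all sorting and filtering then happens on the in-memory array.
async function loadDecayData(): Promise<DecayFile> {
  const res = await fetch("/decay.json");
  if (!res.ok) throw new Error(`decay.json returned ${res.status}`); // e.g. 404 if the analysis hasn't run
  return (await res.json()) as DecayFile;
}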


Deploying

The app/dist/ folder is fully static. Two pieces ship: the bundled HTML/JS/CSS and decay.json (which is read at runtime).

| Target | How |
| --- | --- |
| Netlify | Drag-and-drop dist/ into a new site, or netlify deploy --dir=dist --prod |
| GitHub Pages | Push dist/ to a gh-pages branch, or use a workflow |
| S3 / CloudFront | aws s3 sync dist/ s3://your-bucket/ |
| Local file server | cd dist && python3 -m http.server 8080 |

Note: opening dist/index.html directly via file:// will not work; modern browsers block ES module loading and fetch() from the local file protocol. Any static HTTP server suffices, including one-line tools like python3 -m http.server or npx serve.


Troubleshooting

Error: No rows parsed from CSV. The CSV has no valid rows. Check:

  • The header row is exactly Date,URL,Impressions,Clicks,CTR,Avg Rank (case-sensitive).
  • The file uses comma delimiters, not tabs or semicolons.
  • The file isn't empty or only headers.

Error: Not enough full months in dataset to analyze. The CSV's date range falls within a single calendar month after the partial-month trim. Re-export a longer date range (the tool needs at least one complete calendar month).

The app loads but the table is empty. Either (a) decay.json is empty (check the analysis log; did 0 URLs qualify?) or (b) the "Declining only" toggle is hiding everything. Toggle it off to verify.

Lots of URLs but they all have low R² and small slopes. Your dataset may be too short (e.g., 6 months). The regression has fewer data points to work with, so noise dominates. Either wait for more months of data or raise --min-impressions to filter to high-volume URLs where the signal is cleaner.

decay.json: 404 in the dev server. The analysis hasn't been run yet, or the output path doesn't match app/public/decay.json. Re-run npm run analyze from analysis/.

The dev server compiles but the chart is blank. A URL has been selected but its monthlySeries is empty or its regression fields are null. Click a different row.

Build fails with a TypeScript error. The vue-tsc step runs in npm run build. Common cause is a stale type cache; try rm -rf app/node_modules/.vite and rebuild.


Limitations and caveats

  • Linear-only model. URLs with non-linear decay shapes (S-curves, step changes from algorithm updates, J-curves from launches) are summarized by a straight line. R² will reflect the poor fit, but the slope is still computed. Don't trust slopes from URLs with R² < 0.3.
  • No seasonality awareness. A page that legitimately spikes every Q4 will register low R² and may be excluded when sorting for "steady decline" even if the page is healthy. This is by design, but keep it in mind when reviewing seasonal content.
  • No SERP feature attribution. If a URL lost impressions because Google added an AI Overview, a featured snippet, or a knowledge panel above it, this tool sees the impression drop but cannot tell you why. Pair findings with manual SERP inspection.
  • Window-bound. A URL that decayed before your CSV's date range looks flat to this tool. The analysis can only see what's in the export, so pull a longer date range from GSC for older sites.
  • Aggregates across countries / devices / search types. The CSV format the script consumes does not segment by country or device, so the regression is on the URL's total traffic. For multilingual sites that's usually fine; for sites where one country drove everything, the slope conflates "country X's impressions" with "the URL itself."
  • No causation. The tool reports correlation with time, not causes of decay. Use it as a triage layer, not a verdict.
  • Click-side analysis is sparser. With the default 100-click filter, many URLs qualify for impressions but not clicks. That's expected for low-CTR pages and isn't a bug.
  • GSC reporting quirks. Daily GSC data is subject to privacy filtering (queries below a threshold aren't reported), late-arriving data (counts may shift days after the fact), and query batching, all of which add noise the regression cannot distinguish from real signal. Treat slopes as directionally meaningful, not statistically precise.

What this tool is not

  • Not a content audit tool. It surfaces decay candidates; it doesn't recommend specific edits, link rebuilds, or topic clusters.
  • Not a forecasting tool. The slope tells you what was happening, not what will happen next month. Don't extrapolate beyond the analyzed window.
  • Not a ranking analyzer. It doesn't pull SERPs, doesn't track positions, doesn't tell you who outranked you.
  • Not a replacement for GSC's UI. Use it after GSC tells you "site impressions are down" to find which URLs to focus on, not before.
  • Not a real-time monitor. It runs on demand against a CSV. There is no scheduled job, no alerts.
  • Not BigQuery-aware. Some sites pipe GSC into BigQuery for richer slicing. Adapting this script to read a BQ extract is straightforward but isn't built in.

Design principles

The tool follows six principles, in roughly decreasing order of importance:

  1. Static-first. Everything runs in the browser against a JSON file. No backend means no auth, no hosting bills, no security surface, no operations toil. The build artifact is portable enough to email if needed.

  2. Two halves, one contract. The analysis script and the app communicate only through decay.json. Either side can be rewritten without touching the other: port the script to Python, swap the Vue app for plain HTML, etc. A formal TypeScript type defines the contract.

  3. Deterministic. Same CSV + same flags = identical JSON. No random sampling, no dependency on external APIs at analysis time, no time-of-day effects. Re-runs are reproducible across machines and across months.

  4. One number for ranking, three for understanding. The table sorts on a single number (decline % by default), but every row exposes slope, R², and lifetime so the user can see why a URL ranked where it did. Black-box composite scores hide judgment; this exposes the inputs.

  5. Honest math over flattering math. The 100% decline cap is honest about a URL going to zero, and the R² column is honest about how well the line actually fits. Neither inflates findings to make the tool look more useful than it is.

  6. No novelty for its own sake. Linear regression is over 200 years old. There are sexier methods (changepoint detection, Bayesian structural time series) that would surface marginally more nuance. They would also be harder to explain to a stakeholder when a URL surfaces unexpectedly. Boring tools age well.


Tech stack rationale

| Layer | Choice | Why |
| --- | --- | --- |
| Analysis | Node + TypeScript | Same language as the app, so one toolchain to install. Native streaming CSV with csv-parse, well-typed regression with simple-statistics. Pandas would be more idiomatic in Python but introduces a second runtime. |
| CSV parsing | csv-parse | Streaming, handles quoting/escaping correctly, well-maintained. Batteries-included alternatives like papaparse are heavier and aimed at the browser. |
| Regression | simple-statistics | Tiny, dependency-free, returns slope/intercept directly. No need for a full stats framework. |
| App framework | Vue 3 + Vite | Simple reactivity model for a single-state app, fast HMR, small bundle. React + Next.js would be overkill for one page; Svelte was equally viable. |
| Charts | Chart.js + vue-chartjs | Dual-axis support, good defaults, ~50 KB gzipped. ECharts is more powerful but heavier; Recharts is React-only; D3 would require building everything from primitives. |
| Virtual scrolling | vue-virtual-scroller | Handles thousands of rows at 60 fps without pagination. Native CSS content-visibility would be simpler but less reliable across browsers. |
| Type safety | Strict TypeScript on both sides | The decay.json shape is shared via type definitions, so renaming a field on the script side surfaces compiler errors in the app. |
| Styling | Plain CSS with custom properties | No Tailwind, no UI kit. The app is small enough that a few hundred lines of hand-written CSS is more readable than a class soup. CSS variables make brand re-skinning a one-file change. |
| Type checking on build | vue-tsc | The npm run build script runs vue-tsc --noEmit before Vite, so type errors block the build. |

Visual design system

The app's visual language is editorial: light mode, generous whitespace, hairline dividers, and a single warm gold accent. The intent is for the dashboard to feel like a printed report rather than a SaaS console.

| Token | Value | Used for |
| --- | --- | --- |
| --bg | #ffffff | Page background |
| --surface | #fafbfc | Subtle off-white panels (controls bar) |
| --surface-2 | #f5f6f8 | Hover row background |
| --ink | #121212 | Headings, primary text, active button background |
| --ink-soft | #495057 | Secondary text |
| --ink-muted | #6c757d | Metadata, table column headers |
| --gold | #b58c54 | Brand accent: focus rings, active sort, eyebrow rules, chart impressions line, selected-row tint |
| --decline | #b91c1c | Negative slope / decline % cells |
| --growth | #15803d | Positive slope / growth cells |
| --border | #ededed | Hairline dividers |
| --chart-impressions | #b58c54 | Impressions line on the chart |
| --chart-clicks | #1f2937 | Clicks line on the chart |

Typography: Manrope from Google Fonts at weights 400, 500, 600, 700, 800. Manrope is a humanist geometric sans with sturdy lowercase letterforms, readable at small sizes and clean at display sizes. Tabular numerics (font-variant-numeric: tabular-nums) are enabled across the table for vertical digit alignment.

Hierarchy:

  • All-caps with 0.18em letter-spacing for category labels (eyebrows, axis titles)
  • 22px / 700 weight for the H1 with a −0.02em letter-spacing tightening
  • 13px / 400 weight for body (Manrope's normal weight); interactive elements (search, toggle, buttons, cells) bump to 500
  • 10px / 600 weight for table column headers (uppercase, 0.14em letter-spacing)

Density: the fixed elements above the table body (header ~80 px, controls bar ~64 px, chart 300 px, table head ~68 px) total roughly 515 px, leaving ~385 px for the table body on a 900 px viewport. At 38 px per row that's about 10 visible rows before scrolling. Raising the chart height eats into table density; reducing below ~220 px starts crowding the chart legend and axis labels.


Customizing

Brand colors

All colors are CSS variables defined at the top of app/src/style.css:

:root {
  --bg: #ffffff;
  --ink: #121212;
  --gold: #b58c54;          /* primary accent */
  --decline: #b91c1c;
  --growth: #15803d;
  --chart-impressions: #b58c54;
  --chart-clicks: #1f2937;
  /* ... */
}

Change those values and both the table and chart pick up the new palette. The chart reads its colors from these variables at runtime via getComputedStyle(), so style.css is the single source of truth.
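
That runtime lookup amounts to something like this (the helper name is illustrative):

// Read a design token from :root so the chart always matches style.css.
function cssVar(name: string): string {
  return getComputedStyle(document.documentElement).getPropertyValue(name).trim();
}

const impressionsColor = cssVar("--chart-impressions"); // "#b58c54" with the default palette
const clicksColor = cssVar("--chart-clicks");           // "#1f2937"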

Adding a column to the table

  1. Add a key to SortKey in app/src/types.ts.
  2. Add a case to getSortVal in App.vue that returns the value for the new column.
  3. Add a header entry to the headers array in DecayTable.vue and a matching <div class="cell"> in the row template.
  4. Update the cols grid template (in DecayTable.vue) to include a width for the new column, and update the spans on the group header row in DecayTable.vue if the column belongs to one of the existing groups.

Adjusting decline cap

The 100% display cap lives in two places:

  • Analysis: regress() in analysis/analyze.ts, which caps declinePct so it can't exceed 100% even for URLs whose model projects below zero.
  • Display: formatPct() in app/src/format.ts, which guards against any > 100 from old JSONs.

Removing the cap (not recommended) means URLs that decayed to zero will show declines like 144%, which is mathematically what the regression line predicts but is misleading on the surface.

Adjusting the chart height

In app/src/style.css:

.chart-wrap {
  height: 300px;
}

Increasing this reduces the number of table rows visible without scrolling. Decreasing this (below ~220px) starts crowding the legend and axis labels.


Glossary

  • Anchor month. The first month a URL had any impressions or clicks within the dataset's full-month range. The URL's regression series starts here.
  • Capped decline. The decline % displayed in the table, with a ceiling of 100% to prevent regression-projection artifacts.
  • Decline %. (modeled start − max(0, modeled end)) / modeled start × 100, computed from the regression's intercept and final fitted point.
  • Full month. A calendar month for which the dataset contains every day. Months at the dataset's boundaries that are partial are excluded from analysis.
  • Lifetime. Sum of impressions or clicks across the URL's analyzed window (anchor month through last full month). Not the URL's all-time total.
  • Linear regression. Least-squares fit of a straight line y = m·x + b through the URL's monthly traffic points, where x is the month index from anchor.
  • Qualified URL. A URL that meets the volume and history thresholds for at least one of impressions or clicks. Only qualified URLs appear in decay.json.
  • R² (coefficient of determination). Proportion of monthly variance explained by the fitted line. 1.0 = perfect fit, 0.0 = no linear trend.
  • Regression line. The line of best fit produced by the linear regression. Visualized in the chart as a dashed overlay.
  • Slope. Rate of change of the regression line, in units of impressions (or clicks) per month. Negative slope = decay.
  • Steady decline. Negative slope with high R² (typically > 0.5). The opposite of a sudden cliff.
  • Sudden drop / cliff. Negative slope with low R², driven by a single steep change rather than a gradual trend.

FAQ

Can I run this on Bing Webmaster Tools data? Yes, if you can produce a CSV with the same column names (Date, URL, Impressions, Clicks, CTR, Avg Rank). The script doesn't care which search engine generated it.

Can I use a date column other than Date (e.g. date, Day)? Not without editing the script. Column names are part of the contract; rename your CSV columns or change the four reads in analyze.ts.

Why are some URLs missing from the table? They didn't meet the volume or history thresholds. Default filters are ≥1,000 lifetime impressions, ≥6 months of data. Lower the thresholds with CLI flags to include more URLs, or check the analysis log for the qualified count.

A URL has been steadily declining for 18 months but only shows 6 months of data. Why? The CSV's date range determines the analysis window. The tool can't see beyond what you exported. Pull a longer range from GSC (12+ months recommended).

The decline % is 100% but the URL still gets ~50 impressions/month. Is the math wrong? The 100% reflects the regression line's projection (it predicts approximately zero), not the actual current value. Click the URL in the table to see the chart; the regression line is the dashed one, the actual data is the solid one.

Should I trust a URL with R² of 0.15 and a steep negative slope? No. Low R² means the line doesn't represent the data well. The slope is computed but it's not telling you anything about a "trend." Sort by R² descending to find URLs whose declines are actually trend-shaped.

Can I compare two CSVs (e.g., before/after a content refresh)? Not as a side-by-side view in the app. Run the analysis on each separately and compare the JSONs manually. Multi-dataset comparison would be a different tool.

Does this work for sites with millions of URLs? The analysis script handles ~400k rows in seconds. The app renders the table virtually, so URL count itself isn't the bottleneck. The constraint is the JSON: with hundreds of thousands of qualifying URLs the file gets large (10 MB+) and in-browser sort/filter starts to feel sluggish. For sites that big, raise --min-impressions to 5,000+ to focus on URLs that matter.

Why not show 95% confidence intervals or p-values? Adding statistical inference would suggest precision the data doesn't really support, given GSC's known reporting biases (privacy filtering, late-arriving data, query batching). The slope and R² are diagnostic, not inferential.

Can I run this in CI? Yes. The script is a pure CLI command. A scheduled workflow that pulls a fresh CSV and runs npm run analyze will produce a new decay.json deterministically. Combine with npm run build to publish a fresh static site.

Is GSC data accurate enough for this kind of analysis? GSC data has known caveats: privacy filtering, late updates, query batching. Month over month the noise is roughly stable, which is what trend detection actually needs. Structural reporting changes (such as GSC's 2023 reprocessing) are a different problem; if you suspect one inside your date range, trim it.


License

MIT. See LICENSE.
