feat(15v2): Forecast Backtest Overlay (replaces closed PR #25) #26

Merged
shiniguchi merged 33 commits into main from feature/phase-15-forecast-backtest-overlay on May 1, 2026

@shiniguchi (Owner)

Summary

Phase 15 v2 — Forecast Backtest Overlay. Replaces closed PR #25 (rejected for over-scoping). 22 commits across 7 plans.

  • 15-09: Migration 0057 — granularity column + 7-col PK on forecast_daily; rebuilt forecast_daily_mv + forecast_with_actual_v to expose grain.
  • 15-10: Phase 14 nightly job emits 3 grain rows (day/week/month) per (model, target_date) per refresh — single SARIMAX call resampled to weekly+monthly.
  • 15-11: /api/forecast reads at native grain; new ?kpi= selector; backtest actuals from kpi_daily_v. Dropped forecastResampling.ts.
  • 15-12: CalendarRevenueCard overlays forecast lines + CI bands per visible model with ForecastLegend + time-scale X axis.
  • 15-13: DRY refactor of forecast-card scaffolding + freshness gate.
  • 15-14: RevenueForecastCard rewrite — drop HorizonToggle, full range, CI bands per visible model.
  • 15-15: InvoiceCountForecastCard sibling card + 2 i18n keys × 5 locales + dashboard mount.

Verification

  • Tests: forecast_daily_granularity 8/8 green, forecast pytests 24/24 green, integration suite passes against local Supabase.
  • svelte-check: no new errors beyond the 6-error baseline (vite.config + hooks.server.ts, pre-existing).
  • CI guards: 1, 6, 8 clean. Migration drift expected (local has 0057, remote DEV at 0056) — resolved on deploy.
  • Planning-docs validator: plan-total drift fixed; pre-existing phase-total drift not introduced by this PR.

Outstanding

  • 15-16 localhost visual gate deferred: local has 0 rows in forecast_daily after db reset, so overlays render only after the DEV deploy populates the table. Final QA will run against DEV.
  • 15-17 deferred per CONTEXT.md — runs only after overlays visually validate on DEV.
  • /api/forecast 500 in local-only auth'd session: empty-data code path is clean (static-traced); DEV has data so this is a non-issue post-deploy.

Cron mitigation

Plan 15-09 flipped forecast-refresh.yml to Monday-only (0 7 * * 1). Next run 2026-05-04. Migration 0057 must be live on DEV before then, or the nightly insert hits NOT NULL granularity. Deploying now provides a 3-day buffer.

Test plan

shiniguchi added 30 commits May 1, 2026 00:58
…ranch

PR #25 (Phase 15 v1 — forecast-only forward chart) closed in favor of v2.
This commit brings forward all reusable v1 code as a single starting point.

Reusable v1 surfaces ported (will evolve in plans 15-09 through 15-17):
  - src/lib/forecastConfig.ts (CAMPAIGN_START)
  - src/lib/chartPalettes.ts (FORECAST_MODEL_COLORS — sarimax key per Phase 14 contract)
  - src/lib/emptyStates.ts (4 forecast empty-state keys)
  - src/lib/i18n/messages.ts (8 locale-mirrored sections: horizon/legend/popup/card)
  - src/lib/forecastValidation.ts (parseHorizon/parseGranularity — horizon clamp dropped in 15-11)
  - src/lib/forecastResampling.ts (will be DELETED in 15-11 — v2 stores forecasts at native grain)
  - src/lib/forecastEventClamp.ts (incl. dedupe fix)
  - src/routes/api/forecast/+server.ts (will be REFACTORED in 15-11)
  - src/routes/api/forecast-quality/+server.ts
  - src/routes/api/campaign-uplift/+server.ts
  - src/lib/components/HorizonToggle.svelte (will be DELETED in 15-14)
  - src/lib/components/ForecastLegend.svelte (reused inline by overlays)
  - src/lib/components/EventMarker.svelte
  - src/lib/components/ForecastHoverPopup.svelte
  - src/lib/components/RevenueForecastCard.svelte (will be REWRITTEN in 15-14)
  - src/routes/+page.svelte (mount evolves: 15-15 adds InvoiceCount sibling, 15-17 retires both)
  - All matching unit tests (incl. forecastResampling test which drops with 15-11)

v1 fixes preserved:
  - clampEvents dedupe by (type, date, label) — prevents Svelte 5 each_key_duplicate
    crash on multi-source German holiday days
  - sarimax_bau → sarimax model_name alignment with Phase 14 contract

v1 PLAN.md docs are NOT ported — v2 plans (15-09..15-17) are written fresh
to reflect the corrected mental model: forecasts as overlays on actuals
charts, grain-specific TRAIN_ENDs, weekly refresh cadence.
… Overlay

Plans 15-09 through 15-17 follow the writing-plans skill format. CONTEXT
captures D-14..D-19 new decisions (grain-specific TRAIN_ENDs, calendar
overlay rendering, weekly cron, option-B CI, dual-KPI parity, partial-
month behavior) plus carry-forwards C-01..C-07 + D-01/02/04/05/06/07/
08/09/10/12/13 from v1. v1's D-03 (today Rule) and D-11 (horizon clamp)
explicitly retired.

Plans:
  15-09: forecast_daily granularity column + weekly cron
  15-10: scripts/forecast/run_all.py 3-grain TRAIN_END loop
  15-11: /api/forecast native-grain query + ?kpi= + backtest actuals
  15-12: CalendarRevenueCard forecast overlay
  15-13: CalendarCountsCard forecast overlay (invoice_count)
  15-14: RevenueForecastCard rewrite (drop HorizonToggle, full range)
  15-15: InvoiceCountForecastCard sibling
  15-16: localhost gate + DEV deploy QA + STATE/ROADMAP closure
  15-17 (deferred): retire dedicated cards once overlays validated

Each plan keeps TDD discipline (failing test first, minimal impl, GREEN,
commit). Each lists exact file paths, code blocks for critical logic,
and verification commands. Ready for subagent-driven execution in a
fresh session.
…tual_v

Code review M-1: restaurant_id in the LEFT JOIN to kpi_daily_mv is what
makes the wrapper view safe across tenants — kpi_daily_mv has no RLS of
its own. Add a one-line comment so a future edit doesn't accidentally
drop it.
…date')

Model fit scripts rename business_date -> date in _fetch_history before
calling the aggregation helpers; let them pass date_col='date' instead
of having to rename back. Default keeps existing tests green.
- Add GRANULARITIES = ['day','week','month'] and triple-nested
  model x KPI x grain loop. Each spawn now threads GRANULARITY env var.
- 5 models x 2 KPIs x 3 grains = 30 spawns/refresh on full pipeline.
- Freshness gate: abort cleanly (return 0) when last_actual in
  kpi_daily_mv is more than 8 days stale; write a pipeline_runs
  failure row for triage but don't fail the workflow.
- Logging now includes granularity in every spawn / success / failure line.
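The spawn count and the freshness gate described above can be sketched as follows. This is a minimal illustration, not the code in run_all.py: `plan_spawns`, `is_stale`, and the KPI identifier `revenue` are hypothetical names, and the real job launches subprocesses and writes a pipeline_runs row rather than returning a list.

```python
from datetime import date
from itertools import product

GRANULARITIES = ["day", "week", "month"]  # named in the commit message
MODELS = ["sarimax", "prophet", "ets", "theta", "naive_dow"]
KPIS = ["revenue", "invoice_count"]  # "revenue" is an assumed identifier
MAX_STALENESS_DAYS = 8


def is_stale(last_actual: date, today: date, max_days: int = MAX_STALENESS_DAYS) -> bool:
    """True when the newest actual is more than `max_days` old."""
    return (today - last_actual).days > max_days


def plan_spawns(last_actual: date, today: date, models=MODELS, kpis=KPIS) -> list[dict]:
    """Return one spawn spec per (model, KPI, grain), or nothing if data is stale.

    The real pipeline would also record a failure row for triage and
    exit with status 0 so the workflow itself does not fail.
    """
    if is_stale(last_actual, today):
        return []
    return [
        {"model": m, "kpi": k, "env": {"GRANULARITY": g}}
        for m, k, g in product(models, kpis, GRANULARITIES)
    ]
```

With the defaults this yields 5 x 2 x 3 = 30 spawn specs; passing a single model yields 6, which is the shape the unit test for the loop asserts.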
Each *_fit script now reads GRANULARITY (day|week|month) from env and:
  - computes a grain-specific TRAIN_END (D-14): last_actual-7d / -35d /
    end-of-(month-5)
  - aggregates daily history to weekly/monthly via aggregation.bucket_to_*
    when grain != 'day'
  - picks horizon 372/57/17 and seasonal period 7/52/12 (or model-specific
    equivalent: Prophet weekly/yearly flags, naive_dow seasonal_key swap
    to ISO week-of-year / month-of-year)
  - sets the new 'granularity' column on every forecast_daily row
  - skips closed-day post-hoc zeroing for non-day grains (closed days
    roll into bucket sums; per-day open/closed gating no longer applies)
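A rough sketch of the grain-specific TRAIN_END rule (D-14) listed above, using only the standard library. It assumes "end-of-(month-5)" means the last calendar day of the month five months before the month of last_actual; the source does not spell this out. `SEASONAL_PERIOD_BY_GRAIN` is a hypothetical name, and the function signature is a guess.

```python
import calendar
from datetime import date, timedelta

HORIZON_BY_GRAIN = {"day": 372, "week": 57, "month": 17}
SEASONAL_PERIOD_BY_GRAIN = {"day": 7, "week": 52, "month": 12}  # assumed name


def train_end_for_grain(last_actual: date, grain: str) -> date:
    """Pick a training cutoff so every training bucket is fully complete."""
    if grain == "day":
        return last_actual - timedelta(days=7)
    if grain == "week":
        return last_actual - timedelta(days=35)
    if grain == "month":
        # Count months from year 0, step back five, then convert back.
        month_index = last_actual.year * 12 + (last_actual.month - 1) - 5
        year, month_zero_based = divmod(month_index, 12)
        month = month_zero_based + 1
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day)
    raise ValueError(f"unknown granularity: {grain!r}")
```

For example, with last_actual = 2026-04-30 the three cutoffs are 2026-04-23 (day), 2026-03-26 (week), and 2025-11-30 (month).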

SARIMAX/ETS/theta drop exog regressors at week/month grain — exog matrix
is daily-shaped, bucket-aggregating it is out of scope for 15-10.
naive_dow keeps model_name='naive_dow' (chart legend strings depend on
this per Phase 15 v1's locked decisions); only the seasonal grouping key
changes.
- test_run_all_loops_over_three_granularities asserts that 1 model x 2
  KPIs x 3 grains produces exactly 6 subprocess.run spawns, one per
  (KPI, grain) pair, each tagged with the matching GRANULARITY env var.
- test_freshness_gate_aborts_on_stale_data confirms run_all returns 0
  with zero spawns when last_actual is 10 days old (> 8-day threshold).

Stubs the supabase package via sys.modules so the test runs offline
without supabase installed (matches the local dev / CI unit-test env
where the runtime client isn't required).
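One way to stub a package through sys.modules, as the message above describes, is sketched here. The helper name `install_supabase_stub` is hypothetical, and the actual test may stub different attributes; `create_client` and `Client` are the names the real supabase package exports.

```python
import sys
import types
from unittest.mock import MagicMock


def install_supabase_stub() -> types.ModuleType:
    """Register a fake `supabase` module so later imports never hit the network."""
    stub = types.ModuleType("supabase")
    stub.create_client = MagicMock(name="create_client")
    stub.Client = MagicMock(name="Client")
    sys.modules["supabase"] = stub
    return stub


# Must run before the module under test executes `import supabase`.
install_supabase_stub()

import supabase  # resolves to the stub registered above
```

Because Python consults sys.modules before searching the filesystem, the import succeeds even when the real package is not installed.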
Address spec-review concerns:
- run_all.py: explain why freshness-gate uses write_failure with status=
  'failure' instead of 'waiting_for_data' (the writer doesn't expose that
  status); document filter for triage.
- run_all.py: point eval-still-daily TODO at Phase 17 (backtest gate).
- sarimax_fit.py: document that the (1,1,1)/(0,1,0) order is held constant
  across grains per D-14 escalation note; flag month-grain over-param risk
  for Phase 17 to revisit.
…on (I-1, I-3)

Code review I-1: HORIZON_BY_GRAIN, _train_end_for_grain, and
_pred_dates_for_grain were duplicated verbatim across 5 model fit scripts
(sarimax, prophet, ets, theta, naive_dow). Math is grain-driven, not
model-driven, so there is no parallel-evolution argument for keeping
copies. Extract to scripts/forecast/grain_helpers.py — single source of
truth, will also be imported by /api/forecast in plan 15-11.

Adds parse_granularity_env() so the GRANULARITY env-var validation block
in each fit script's __main__ collapses to one line. Behavior matches
the previous strip-empty-defaults-to-day semantics.
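The strip-empty-defaults-to-day behavior could look roughly like this. The optional `env` parameter and the ValueError on unknown values are assumptions made for illustration; the source only names the function and its default semantics.

```python
import os

VALID_GRANULARITIES = ("day", "week", "month")


def parse_granularity_env(env=None) -> str:
    """Read GRANULARITY, treating missing or blank values as 'day'."""
    source = os.environ if env is None else env
    raw = (source.get("GRANULARITY") or "").strip()
    if not raw:
        return "day"
    if raw not in VALID_GRANULARITIES:
        raise ValueError(f"GRANULARITY must be one of {VALID_GRANULARITIES}, got {raw!r}")
    return raw
```

Passing a plain dict as `env` keeps the helper testable without mutating the process environment.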

I-2: train_end_for_grain docstring now explicitly explains why the gap
between TRAIN_END and the first forecast bucket can be ~35 days (week)
or ~5 months (month) — it is intentional, sized so each training bucket
is fully complete.

I-3: closed_days.py module docstring now documents the load-bearing
"missing date = open" assumption that produces the smooth 372-day
forecast curve.

Also drops now-unused dateutil.relativedelta and timedelta imports from
the 5 fit scripts.

Tests: 24 passed, 2 skipped (unchanged from pre-refactor baseline).
Adds per-model forecast lines + low-opacity CI bands on top of the
visit-seq stacked bars in CalendarRevenueCard. Inline ForecastLegend
chip row toggles models; default visible = {sarimax, naive_dow}.

Scale strategy: switched from implicit scaleBand to scaleTime +
xInterval={timeDay|timeMonday|timeMonth} so bars and forecast splines
share the same x-axis. LayerChart's Bar.svelte handles xInterval
bandwidth via interval.floor()/offset(). xDomain extended to today + 365d
so the forecast horizon lives in the empty space to the right of the
last bar; chartW grows proportionally for horizontal scroll.

D-17 Option B: toggling a model removes BOTH its line and CI band
(seriesByModel filters by visibleModels; both <Area> and <Spline>
each-blocks iterate that map). naive_dow renders dashed gray at
stroke-width=1; smart models solid 2px. CI bands at fillOpacity=0.06
prevent visual mush at 375px.

forecastData fetched once per grain change via clientFetch with
lastFetchedGrain guard to prevent reactive loops. yhat_mean is in EUR
(per Phase 14 schema) — bars also display in EUR after the existing
/100 mapping; no extra divisor needed.

Tests: 11 new artifact assertions in tests/unit/CalendarCards.test.ts
(jsdom can't render LayerChart; e2e suite covers visual gate).

Refs: docs/superpowers/plans/* phase 15 v2 plan 15-12
Plans 15-09..15-15 fully implemented; 15-16 partial (STATE/ROADMAP done,
localhost gate deferred + DEV deploy/PR pending user authorization);
15-17 deferred per CONTEXT.md.

Note: pre-existing phase-total drift (ROADMAP 16 entries vs STATE 17
phases) is unrelated to this work and persists from before 15-09.
Allows manual migration deploy against feature branches before merging
to main. Needed for Phase 15 visual QA: migration 0057 must be live on
DEV before the dashboard's forecast endpoints can return rows for the
new granularity column.
Two QA findings from DEV verification:

1. Default-week-grain empty state was misleading. Cards rendered
   "Forecast generating — Check back tomorrow, the first nightly run
   is still pending" any time forecast_daily_mv had no rows at the
   current grain. After Phase 15-10 the per-grain pipeline emits 3
   grains per refresh, but the first run with that shape only fires
   on the next forecast-refresh cron (Mon 2026-05-04). Until then
   week/month grains are genuinely empty even though day works.
   Add `forecast-grain-pending` empty-state variant and pick it
   when grain !== 'day'. Day-grain empty still shows the original
   pre-first-run message.

2. CalendarRevenueCard's forecast lines render in the +365d gap to
   the right of the last bar — but the chart canvas is 19,396px
   wide while the visible viewport is ~574px, so the lines render
   at x=9,064px+ and users had to scroll right ~16 viewport widths to discover
   them. Auto-scroll the chart container on mount so today's edge
   is at ~60% of the visible viewport: most of the visible area
   shows recent past, with the near-future forecast hinted on the
   right. Skip if the user has manually scrolled (scrollLeft > 0).
Previous commit added scrollerRef state + auto-scroll $effect but forgot
to bind:this on the actual overflow-x:auto div. scrollLeft stayed 0 and
forecast lines remained off-screen. One-line fix.
xScale was returning a stale value (~4282px) when the forecast Spline
paths actually rendered at x=9063+ in the same canvas. The chart's
xInterval mode + scale-time + forecast-extended xDomain interplay made
xScale unreliable in the auto-scroll $effect.

Switch to pure date arithmetic: compute today's proportion of the
chartXDomain span and multiply by scrollWidth. Deterministic, doesn't
depend on the chart context hydrating in any particular order.

Verified on DEV: scrollLeft now lands such that the forecast lines
(starting at canvas x=9063) appear in the visible viewport, with
recent past bars to their left.
The $effect was firing the moment forecastData arrived but BEFORE the
chart re-rendered with the extended xDomain — at that point scrollWidth
still reflected only the bar zone (~9000px), and todayPct × scrollWidth
landed at ~4300px (deep inside the bar zone). After RAF the chart has
re-rendered and scrollWidth = 19396px (full canvas). Then todayPct
× scrollWidth lands at the bar/forecast boundary as intended.
…tData

Previous version fired the $effect when forecastData arrived, but on
INITIAL page load forecastData lands before chartW finishes growing
(forecastData → totalSlots → computeChartWidth). The RAF callback then
read scrollWidth before the chart re-rendered with the forecast zone,
landing scrollLeft inside the bar zone (~3961px) instead of at the
bar/forecast boundary (~8770px).

Reading chartW inside the effect makes Svelte rerun the effect when
the canvas actually grows. Verified: switching to month-grain then
back to day-grain produces scrollLeft=8785, with forecastVisible=true.
shiniguchi added 3 commits May 1, 2026 13:38
Prior version bailed on `scrollLeft > 0` after the first RAF wrote a
position based on a still-growing scrollWidth. Track lastSetScrollLeft
instead — if scrollLeft matches our last write, we're free to refine
when chartW grows; if user scrolls (mismatch), we stop.
The chart's inner SVG dimensions lag the chartW prop by 1-2 frames on
initial load. Single RAF wasn't enough — scrollWidth still smaller than
expected, todayPct landed in the bar zone. Poll up to 10 frames waiting
for el.scrollWidth >= w * 0.9, then position. Self-bounded so we never
loop forever on edge cases (SSR-only render, etc).
Date-math approach was producing scrollLeft=3961 (~21%) instead of the
expected ~47%, possibly due to chartXDomain reactivity timing. Bucket-
count proportion (chartData.length / total) is computed from the same
data the chart uses to size its bars, so it can't drift from the actual
canvas layout. Bumped poll-RAF cap to 30 frames (~500ms) for slower
chart renders.
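The bucket-count positioning can be written out as plain arithmetic. The component itself is Svelte; Python is used here only to show the calculation, and `target_scroll_left` with its parameters is a hypothetical name. The 0.6 anchor is taken from the earlier "~60% of the visible viewport" note and is assumed to still apply.

```python
def target_scroll_left(
    bar_buckets: int,
    total_buckets: int,
    scroll_width: float,
    viewport_width: float,
    anchor: float = 0.6,
) -> float:
    """Scroll offset that places the bar/forecast boundary at `anchor` of the viewport."""
    if total_buckets <= 0 or scroll_width <= viewport_width:
        return 0.0  # nothing to scroll
    # Share of the canvas taken by actuals, converted to pixels.
    boundary_x = (bar_buckets / total_buckets) * scroll_width
    desired = boundary_x - anchor * viewport_width
    # Clamp to the range the browser would accept for scrollLeft.
    max_scroll = scroll_width - viewport_width
    return min(max(desired, 0.0), max_scroll)
```

Because the proportion comes from the same bucket counts that size the bars, it does not depend on when the scale or the x-domain finishes updating.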
shiniguchi merged commit b638fe8 into main May 1, 2026
3 of 5 checks passed
shiniguchi deleted the feature/phase-15-forecast-backtest-overlay branch May 1, 2026 15:26