Historical sessions missing user_id after unified user model upgrade

## Context

The unified user model migration (Phases 1–6, #523) introduced canonical user tracking via `users` and `user_identities` tables. New sessions created after the upgrade automatically get a `user_id` via `resolveOrCreateUser`. However, **sessions created before the upgrade have `user_id = NULL`** and are grouped by `scm_login` in analytics instead of the canonical user identity.

This causes the same person to appear as **two entries** in the Per-User Breakdown on the analytics page — one from historical sessions (keyed by `scm_login`) and one from new sessions (keyed by canonical `user_id`).

## Impact

- **Analytics only** — session functionality is unaffected
- **Temporary** — historical sessions age out of the analytics window (7/14/30/90 days)
- **Cosmetic** — the duplicate entries show the same person with slightly different labels (e.g., GitHub login vs. display name)

## Workaround

After upgrading, each user's first new session (from any source — web, Slack, GitHub, Linear) creates their canonical user record in the `users` table. Once that exists, you can backfill historical sessions with a targeted D1 query.

### Step 1: Find canonical user IDs

```bash
npx wrangler d1 execute <database-name> --remote \
  --command "SELECT u.id, u.display_name, ui.provider_login FROM users u JOIN user_identities ui ON u.id = ui.user_id WHERE ui.provider = 'github'"
```

### Step 2: Backfill each user's historical sessions

For each user, link their old sessions by `scm_login`:

```bash
npx wrangler d1 execute <database-name> --remote \
  --command "UPDATE sessions SET user_id = '<user_id>' WHERE scm_login = '<github_login>' AND user_id IS NULL"
```

### Step 3: Verify

```bash
npx wrangler d1 execute <database-name> --remote \
  --command "SELECT COUNT(*) total, COUNT(user_id) linked, COUNT(*)-COUNT(user_id) unlinked FROM sessions WHERE spawn_source IN ('user','slack-bot','linear-bot','github-bot')"
```

## Why not an automatic migration?

A generic backfill migration was drafted (PR #554) but removed because it could not safely match historical `scm_login` values to canonical users without risking identity fragmentation. The `scm_login` field stores GitHub usernames, but canonical users are keyed by numeric GitHub user ID — and the user's `display_name` may differ from their login. A migration that creates users from `scm_login` values could produce duplicate user records for the same person.

The manual approach above is safe because it only runs *after* canonical users exist (from new sessions) and uses the known `user_id` directly.

## Related

- #523 — Unified user model epic
- #554 — Phase 6 PR (frontend + sentinel cleanup, backfill removed)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Historical sessions missing user_id after unified user model upgrade #555

Context

Impact

Workaround

Step 1: Find canonical user IDs

Step 2: Backfill each user's historical sessions

Step 3: Verify

Why not an automatic migration?

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Historical sessions missing user_id after unified user model upgrade #555

Description

Context

Impact

Workaround

Step 1: Find canonical user IDs

Step 2: Backfill each user's historical sessions

Step 3: Verify

Why not an automatic migration?

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions