Context
The unified user model migration (Phases 1–6, #523) introduced canonical user tracking via users and user_identities tables. New sessions created after the upgrade automatically get a user_id via resolveOrCreateUser. However, sessions created before the upgrade have user_id = NULL and are grouped by scm_login in analytics instead of the canonical user identity.
This causes the same person to appear as two entries in the Per-User Breakdown on the analytics page — one from historical sessions (keyed by scm_login) and one from new sessions (keyed by canonical user_id).
Impact
- Analytics only — session functionality is unaffected
- Temporary — historical sessions age out of the analytics window (7/14/30/90 days)
- Cosmetic — the duplicate entries show the same person with slightly different labels (e.g., GitHub login vs. display name)
Workaround
After upgrading, each user's first new session (from any source — web, Slack, GitHub, Linear) creates their canonical user record in the users table. Once that exists, you can backfill historical sessions with a targeted D1 query.
Step 1: Find canonical user IDs
npx wrangler d1 execute <database-name> --remote \
--command "SELECT u.id, u.display_name, ui.provider_login FROM users u JOIN user_identities ui ON u.id = ui.user_id WHERE ui.provider = 'github'"
Step 2: Backfill each user's historical sessions
For each user, link their old sessions by scm_login:
npx wrangler d1 execute <database-name> --remote \
--command "UPDATE sessions SET user_id = '<user_id>' WHERE scm_login = '<github_login>' AND user_id IS NULL"
Step 3: Verify
npx wrangler d1 execute <database-name> --remote \
--command "SELECT COUNT(*) total, COUNT(user_id) linked, COUNT(*)-COUNT(user_id) unlinked FROM sessions WHERE spawn_source IN ('user','slack-bot','linear-bot','github-bot')"
Why not an automatic migration?
A generic backfill migration was drafted (PR #554) but removed because it could not safely match historical scm_login values to canonical users without risking identity fragmentation. The scm_login field stores GitHub usernames, but canonical users are keyed by numeric GitHub user ID — and the user's display_name may differ from their login. A migration that creates users from scm_login values could produce duplicate user records for the same person.
The manual approach above is safe because it only runs after canonical users exist (from new sessions) and uses the known user_id directly.
Related
Context
The unified user model migration (Phases 1–6, #523) introduced canonical user tracking via
usersanduser_identitiestables. New sessions created after the upgrade automatically get auser_idviaresolveOrCreateUser. However, sessions created before the upgrade haveuser_id = NULLand are grouped byscm_loginin analytics instead of the canonical user identity.This causes the same person to appear as two entries in the Per-User Breakdown on the analytics page — one from historical sessions (keyed by
scm_login) and one from new sessions (keyed by canonicaluser_id).Impact
Workaround
After upgrading, each user's first new session (from any source — web, Slack, GitHub, Linear) creates their canonical user record in the
userstable. Once that exists, you can backfill historical sessions with a targeted D1 query.Step 1: Find canonical user IDs
Step 2: Backfill each user's historical sessions
For each user, link their old sessions by
scm_login:Step 3: Verify
Why not an automatic migration?
A generic backfill migration was drafted (PR #554) but removed because it could not safely match historical
scm_loginvalues to canonical users without risking identity fragmentation. Thescm_loginfield stores GitHub usernames, but canonical users are keyed by numeric GitHub user ID — and the user'sdisplay_namemay differ from their login. A migration that creates users fromscm_loginvalues could produce duplicate user records for the same person.The manual approach above is safe because it only runs after canonical users exist (from new sessions) and uses the known
user_iddirectly.Related