25 changes: 17 additions & 8 deletions docs/providers/copilot.md
@@ -1,23 +1,25 @@
# Copilot

-GitHub Copilot Chat (CLI and VS Code extension transcripts).
+GitHub Copilot Chat (CLI, VS Code core chat sessions, and VS Code extension transcripts).

- **Source:** `src/providers/copilot.ts`
- **Loading:** eager (`src/providers/index.ts:3`)
-- **Test:** `tests/providers/copilot.test.ts` (401 lines)
+- **Test:** `tests/providers/copilot.test.ts`

## Where it reads from

-Two JSONL locations plus an optional OpenTelemetry SQLite source (see below). All
-discovered sources are walked on every run; results merge and dedupe.
+Three JSONL locations plus an optional OpenTelemetry SQLite source (see below). OTel is
+preferred when present; chatSessions are only discovered when no OTel source is found.
+Other discovered sources are walked on every run; results merge and dedupe.

1. **Legacy CLI sessions:** `~/.copilot/session-state/`
-2. **VS Code transcripts:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/GitHub.copilot-chat/transcripts/` and equivalents on Windows / Linux
-3. **OTel SQLite store:** VS Code Copilot Chat's `agent-traces.db` (see the OTel section). Preferred when present because it carries full input / output / cache token counts; the JSONL sources only record output tokens.
+2. **VS Code core chat sessions:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/chatSessions/*.jsonl` plus `~/Library/Application Support/Code/User/globalStorage/emptyWindowChatSessions/*.jsonl` and equivalents on Windows / Linux
+3. **VS Code transcripts:** `~/Library/Application Support/Code/User/workspaceStorage/<hash>/GitHub.copilot-chat/transcripts/` and equivalents on Windows / Linux
+4. **OTel SQLite store:** VS Code Copilot Chat's `agent-traces.db` (see the OTel section). Preferred when present because it carries full input / output / cache token counts; legacy JSONL sources only record output tokens.

## Storage format

-JSONL in the first two locations (schemas differ; the parser switches by detecting which schema the first event uses), and a SQLite DB for the OTel source.
+JSONL in the first three locations (schemas differ; the parser switches by source type / event shape), and a SQLite DB for the OTel source. VS Code core chat sessions use a delta journal: `kind:0` sets the root object, `kind:1` writes a value at path `k`, and `kind:2` appends items to an array path.
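
To make the delta-journal description in the added line concrete, here is a minimal replay sketch. It is not the code in `src/providers/copilot.ts`. Only `kind` and `k` come from the doc text; the payload field name `v`, the representation of `k` as an array of path segments, and the helper names `replayJournal` and `parentOf` are assumptions made for illustration.

```typescript
// Hypothetical event shapes. `kind` and `k` are named in the doc; the payload
// field `v` and the array-of-segments form of `k` are assumptions.
type PathSegment = string | number;
type JournalEvent =
  | { kind: 0; v: unknown }                    // set the root object
  | { kind: 1; k: PathSegment[]; v: unknown }  // write a value at path k
  | { kind: 2; k: PathSegment[]; v: unknown[] }; // append items to the array at path k

type Container = Record<string, unknown>;

function isContainer(x: unknown): x is Container {
  return typeof x === "object" && x !== null;
}

// Walk to the parent of the last path segment, creating plain objects
// for any missing intermediate step.
function parentOf(root: Container, path: PathSegment[]): Container {
  let node = root;
  for (const seg of path.slice(0, -1)) {
    const key = String(seg);
    if (!isContainer(node[key])) node[key] = {};
    node = node[key] as Container;
  }
  return node;
}

// Replay JSONL lines in order and return the reconstructed session object.
function replayJournal(lines: string[]): unknown {
  let root: unknown = undefined;
  for (const line of lines) {
    if (line.trim() === "") continue;
    const ev = JSON.parse(line) as JournalEvent;
    if (ev.kind === 0) {
      root = ev.v;
      continue;
    }
    // Skip patches that arrive before any root exists or carry an empty path.
    if (!isContainer(root) || ev.k.length === 0) continue;
    const parent = parentOf(root, ev.k);
    const last = String(ev.k[ev.k.length - 1]);
    if (ev.kind === 1) {
      parent[last] = ev.v;
    } else {
      const existing = parent[last];
      parent[last] = Array.isArray(existing) ? [...existing, ...ev.v] : [...ev.v];
    }
  }
  return root;
}

// Example journal: set the root, append one request, then patch a field on it.
const exampleJournal = [
  '{"kind":0,"v":{"sessionId":"s1","requests":[]}}',
  '{"kind":2,"k":["requests"],"v":[{"requestId":"r1"}]}',
  '{"kind":1,"k":["requests",0,"modelId"],"v":"example-model"}',
];
const exampleSession = replayJournal(exampleJournal);
```

Replaying in file order matters in this sketch because later `kind:1` writes can target items that an earlier `kind:2` appended.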

## OpenTelemetry (OTel) source

@@ -26,6 +28,11 @@ breakdowns (input, output, cache-read, cache-creation) from it, which the JSONL
not record. Discovery is skipped with `CODEBURN_COPILOT_DISABLE_OTEL=1`, and the DB path
can be overridden with `CODEBURN_COPILOT_OTEL_DB`.

+If OTel discovery finds at least one source, workspace `chatSessions/*.jsonl` and
+`emptyWindowChatSessions/*.jsonl` are skipped. Those journals can mirror the same Copilot
+turns under IDs that do not match OTel turn IDs, so CodeBurn prefers the richer OTel data
+instead of trying to dedupe across stores.
+
- **Requires Node 22+.** The OTel source uses the built-in `node:sqlite` module (the same
backend as Cursor / OpenCode). On Node 20, or if the DB is missing / locked / corrupt /
wrong-schema, OTel is skipped and the JSONL/transcript sources are used as a fallback.
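
The selection rules in this hunk can be sketched as a small pure function. This is an illustrative outline, not the provider's implementation. The two environment variable names come from the doc; `planCopilotSources`, `SourcePlan`, the `dbUsable` callback (standing in for the Node version, missing, locked, corrupt, and wrong-schema checks), and the treatment of an empty override as unset are all assumptions.

```typescript
// Environment passed in explicitly so the sketch stays testable.
type Env = Record<string, string | undefined>;

interface SourcePlan {
  otelDbPath: string | null;  // null means OTel is skipped
  readChatSessions: boolean;  // core chat session journals are read only without OTel
}

// Hypothetical planner for the rules described in the doc text.
function planCopilotSources(
  env: Env,
  defaultDbPath: string,
  dbUsable: (path: string) => boolean,
): SourcePlan {
  // CODEBURN_COPILOT_DISABLE_OTEL=1 skips OTel discovery entirely.
  if (env["CODEBURN_COPILOT_DISABLE_OTEL"] === "1") {
    return { otelDbPath: null, readChatSessions: true };
  }
  // CODEBURN_COPILOT_OTEL_DB overrides the DB path.
  // Assumption: an empty value is treated the same as unset.
  const candidate = env["CODEBURN_COPILOT_OTEL_DB"] || defaultDbPath;
  const otelDbPath = dbUsable(candidate) ? candidate : null;
  // When an OTel source is found, the chatSessions journals are skipped
  // rather than deduped against OTel turn IDs.
  return { otelDbPath, readChatSessions: otelDbPath === null };
}

// Example: an override path that passes the usability check wins over the default.
const examplePlan = planCopilotSources(
  { CODEBURN_COPILOT_OTEL_DB: "/tmp/agent-traces.db" },
  "/default/agent-traces.db",
  (path) => path === "/tmp/agent-traces.db",
);
```

Keeping the decision separate from file-system access is a design choice of the sketch only; it makes the fallback behaviour easy to check in isolation.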
@@ -43,7 +50,9 @@ None for the JSONL sources. The OTel source uses a durable cache (see above).

## Deduplication

-Per `messageId` in both formats (`copilot.ts:118` for legacy, `copilot.ts:245` for transcripts).
+Legacy JSONL and transcript sessions dedupe per `messageId`. Core chat sessions dedupe per `copilot-chatsession:<sessionId>:<requestId>`, and are not discovered when an OTel source is present.
+
+If a workspace hash contains at least one `chatSessions/*.jsonl` file, the provider skips that hash's legacy `GitHub.copilot-chat/transcripts/` directory. The core chat session journal is the modern token-bearing source for the same conversations, so reading both would inflate call counts.
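
A short sketch of the two rules added here: the composite dedup key for core chat sessions and the per-hash transcript skip. The key format is taken from the doc. `chatSessionDedupKey`, `dedupeBy`, `WorkspaceListing`, and `transcriptDirsToRead` are hypothetical names, and the real provider may structure this differently.

```typescript
// Build the composite key described in the doc for core chat session requests.
function chatSessionDedupKey(sessionId: string, requestId: string): string {
  return `copilot-chatsession:${sessionId}:${requestId}`;
}

// Keep the first item seen for each key, preserving input order.
function dedupeBy<T>(items: T[], keyOf: (item: T) => string): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of items) {
    const key = keyOf(item);
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(item);
  }
  return out;
}

// Hypothetical listing of what discovery found under one workspaceStorage/<hash>.
interface WorkspaceListing {
  hash: string;
  chatSessionFiles: string[]; // chatSessions/*.jsonl
  transcriptDirs: string[];   // GitHub.copilot-chat/transcripts/
}

// Skip the legacy transcripts when the same hash has at least one journal,
// so the same conversation is not counted twice.
function transcriptDirsToRead(ws: WorkspaceListing): string[] {
  return ws.chatSessionFiles.length > 0 ? [] : ws.transcriptDirs;
}

// Example: the same request seen twice collapses to a single call.
const exampleCalls = dedupeBy(
  [
    { sessionId: "s1", requestId: "r1" },
    { sessionId: "s1", requestId: "r1" },
    { sessionId: "s1", requestId: "r2" },
  ],
  (c) => chatSessionDedupKey(c.sessionId, c.requestId),
);
```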

## Model inference
