
feat(orchestrator): add Background Job Board guidance to base prompt #487

Open
stevekstevek wants to merge 38 commits into alvinunreal:v2-beta from stevekstevek:v2-beta

Conversation

@stevekstevek

What changed

Added a ### Background Job Board section to buildOrchestratorPrompt() and a session-reuse warning to ### Session Reuse.

Why

Testing after the [just launched]/[resumed] label patch (#486) revealed three additional failure modes:

1. Autoregressive ID regeneration
The model rewrites long session IDs from memory when recalling them for subsequent task_status calls, introducing character-level errors (e.g., ObAb). The model's own explanation of the typo was wrong — it didn't understand what it had done. Fix: instruct the model to use short aliases (lib-1) for reasoning and copy full IDs verbatim from tool outputs only when tools require them.

2. Dual-context confusion
The Resumable Sessions context section shows the same session IDs as the job board but without temporal labels. When a model sees the same ID in two places — one with [just launched], one without a timestamp — it resolves the contradiction by treating the task as pre-existing. The prompt now explicitly explains that these two sections serve different purposes.

3. Session reuse requires explicit action
Testing showed the model intending to reuse sessions but dispatching fresh ones because it didn't pass task_id. Added explicit rule: intent alone doesn't reuse a session.
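
The alias rule (item 1) and the explicit `task_id` rule (item 3) can be illustrated together. The following is a minimal sketch, not the project's implementation: `AliasRegistry`, `buildDispatchArgs`, the `DispatchArgs` shape, the shared alias counter, and the session ID used in the usage note are all hypothetical. It only shows the principle that full IDs are stored once from tool output and looked up, never retyped, and that reuse happens only when `task_id` is actually present in the call.

```typescript
// Hypothetical helper types; none of these names come from the repository.
interface DispatchArgs {
  prompt: string;
  task_id?: string; // when omitted, a fresh session would be created
}

class AliasRegistry {
  private readonly idsByAlias = new Map<string, string>();
  private counter = 0; // assumption: one counter shared by all prefixes

  /** Store the full ID exactly as a tool returned it; return a short alias. */
  register(prefix: string, fullSessionId: string): string {
    this.counter += 1;
    const alias = `${prefix}-${this.counter}`;
    this.idsByAlias.set(alias, fullSessionId);
    return alias;
  }

  /** Look the full ID up verbatim; an unknown alias is an error, not a guess. */
  resolve(alias: string): string {
    const id = this.idsByAlias.get(alias);
    if (id === undefined) {
      throw new Error(`unknown alias: ${alias}`);
    }
    return id;
  }
}

/** Intent alone does not reuse a session: task_id must be in the arguments. */
function buildDispatchArgs(
  prompt: string,
  registry: AliasRegistry,
  reuseAlias?: string,
): DispatchArgs {
  if (reuseAlias === undefined) {
    return { prompt };
  }
  return { prompt, task_id: registry.resolve(reuseAlias) };
}
```

For example, after `registry.register("lib", "ses_example_0bAb93f1")` returns `lib-1`, calling `buildDispatchArgs("continue", registry, "lib-1")` produces arguments that carry the stored ID unchanged, while `buildDispatchArgs("continue", registry)` produces arguments with no `task_id` at all.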

Relationship to #486

This prompt section references the [just launched] and [resumed] labels introduced in #486. The two changes are complementary: the code creates the signal; the prompt teaches the model how to interpret it.

alvinunreal and others added 8 commits May 17, 2026 18:56
… label

Add `lastLaunchedAt` field to BackgroundJobRecord. On first launch,
lastLaunchedAt equals launchedAt. On session reuse (same taskID relaunched),
lastLaunchedAt is updated to now while launchedAt is preserved.

formatJob uses lastLaunchedAt for the age calculation and compares it to
launchedAt to pick the right label:
  - new session  → `running [just launched, Xs ago]`
  - reused session → `running [resumed, Xs ago]`
  - older than 30s → no annotation (unchanged)

Fixes the Type 1 confusion where an orchestrator cannot tell whether a
running job in the board was just dispatched or is a pre-existing session
it decided to reuse.
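
The labeling rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual `formatJob`: only `launchedAt`, `lastLaunchedAt`, and the 30-second window come from the commit message, while `formatRunningStatus`, the millisecond timestamps, the explicit `now` parameter, and the handling of an age of exactly 30 s are assumed.

```typescript
// Field names follow the commit message; the rest of the shape is assumed.
interface BackgroundJobRecord {
  launchedAt: number; // ms epoch of the first launch (assumed unit)
  lastLaunchedAt: number; // ms epoch of the latest launch or resume
}

const LABEL_WINDOW_MS = 30000; // "older than 30s" gets no annotation

function formatRunningStatus(job: BackgroundJobRecord, now: number): string {
  const ageMs = now - job.lastLaunchedAt;
  // Assumption: an age of exactly 30 s still gets a label.
  if (ageMs > LABEL_WINDOW_MS) {
    return "running";
  }
  // Equal timestamps mean the session has never been relaunched.
  const label =
    job.lastLaunchedAt === job.launchedAt ? "just launched" : "resumed";
  return `running [${label}, ${Math.floor(ageMs / 1000)}s ago]`;
}
```

Passing `now` explicitly rather than reading the clock inside the function is a choice made for this sketch so the three cases are easy to check in isolation.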
Add a new '### Background Job Board' section to buildOrchestratorPrompt()
documenting how to read status labels, use short aliases, avoid reconstructing
session IDs from memory, and understand the job board vs resumable_sessions
distinction. Also adds explicit session-reuse warning to '### Session Reuse'.

Addresses two bugs observed in testing:
1. Autoregressive ID regeneration — model rewrites long session IDs from memory,
   introducing character errors. Fix: use aliases for reasoning, copy full IDs
   verbatim from tool outputs only when needed.
2. Dual-context confusion — the Resumable Sessions context section shows the same
   session IDs without temporal labels, causing the model to override the job board
   temporal signal. Fix: explicitly explain the two sections serve different purposes.

Works in tandem with the formatJob() patch (previous commit) which creates the
[just launched] / [resumed] labels this prompt section references.
@greptile-apps
Contributor

greptile-apps Bot commented May 17, 2026

Greptile Summary

This PR extends buildOrchestratorPrompt() with two additions: a session-reuse rule clarifying that task_id must be explicitly passed to avoid creating a fresh session, and a new ### Background Job Board section teaching the model how to interpret alias-based task tracking and the [just launched]/[resumed] labels introduced in #486.

  • The session-reuse rule is a clear, correct addition that closes the intent-vs-execution gap documented in the PR description.
  • The new ### Background Job Board section uses the same markdown heading as the one dynamically emitted by BackgroundJobBoard.formatForPrompt(), which is injected into every orchestrator user-message turn at runtime — resulting in two same-named sections visible simultaneously, with subtly different instructions.
  • The status-line legend covers `completed, unreconciled` but omits `error, unreconciled`, `cancelled, unreconciled`, and `<state>, timed out`, all of which are valid outputs from formatJob().

Confidence Score: 3/5

The change is prompt-only and carries no risk of data loss or runtime crashes, but the duplicate heading creates a real coherence problem when active background jobs are present.

The session-reuse line is safe and correct. The new Background Job Board section, however, shares an identical heading with the live job data injected into every orchestrator user-message turn, so at runtime the model sees two ### Background Job Board headings with different (and somewhat conflicting) instructions. This is a present, reproducible issue on any turn with active background jobs. The legend is also incomplete for error and timed-out statuses the job board regularly produces.

src/agents/orchestrator.ts — the new static section and how it interacts with the dynamic injection in src/hooks/task-session-manager/index.ts:585 and src/utils/background-job-board.ts:194.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agents/orchestrator.ts | Adds static ### Background Job Board guidance and a session-reuse rule to the orchestrator system prompt. The guidance content is accurate, but the section heading duplicates the one emitted by BackgroundJobBoard.formatForPrompt(), which the runtime injects into every orchestrator user-message turn, creating two same-named sections with conflicting instructions visible simultaneously. |

Reviews (1): Last reviewed commit: "feat(orchestrator): add Background Job B..."

Comment on lines +223 to +233
### Background Job Board

Each tracked task has a **short alias** (\`lib-1\`, \`exp-2\`, \`fix-3\`). Use the alias in your reasoning — never rewrite a full session ID from memory. Only use the full session ID when a tool requires it, and **always copy it verbatim from the tool output**.

**Reading the status line:**
- \`running [just launched, Xs ago]\` — you dispatched this in the **current turn**. It is NOT pre-existing.
- \`running [resumed, Xs ago]\` — you continued this session in the **current turn** (passed \`task_id\`). It is NOT pre-existing.
- \`running\` (no label) — was already running **before your last dispatch turn**. Poll with \`task_status\` before acting on it.
- \`completed, unreconciled\` — finished, result not yet read.

**Job board vs. Resumable Sessions:** The job board shows current task state. The Resumable Sessions section lists sessions that *can* be reused — presence there does not mean currently running.

P1 Duplicate heading collides with dynamic job board injection

BackgroundJobBoard.formatForPrompt() already emits its own ### Background Job Board heading (with live job rows) and injects it into every orchestrator user-message turn via task-session-manager/index.ts:585. When active jobs exist the model therefore sees two distinct ### Background Job Board sections — one in the system prompt (instructions, no data) and one in the injected user-message context (abbreviated instruction + actual job rows). Because the headings are identical, the model may conflate them and either look for live job data inside the system-prompt section (finding none) or treat the two differing instructions ("Use the alias in your reasoning" vs. "Use task_status before consuming running jobs") as contradictory directives scoped to the same section. Renaming the static section — e.g. ### Background Job Board — Usage Guide — would disambiguate.
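
One way to catch this kind of collision early is a small check that compares the headings in the static prompt with those in the injected context. The sketch below is hypothetical: `headingsOf` and `sharedHeadings` are not part of the repository, and it assumes both fragments are available as plain markdown strings.

```typescript
/** Collect the text of every markdown heading (levels 1 to 6) in a fragment. */
function headingsOf(markdown: string): Set<string> {
  const found = new Set<string>();
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*\S)\s*$/.exec(line);
    if (match) {
      found.add(match[1]);
    }
  }
  return found;
}

/** Headings that appear in both the static prompt and the injected context. */
function sharedHeadings(staticPrompt: string, injected: string): string[] {
  const injectedHeadings = headingsOf(injected);
  return Array.from(headingsOf(staticPrompt)).filter((heading) =>
    injectedHeadings.has(heading),
  );
}
```

A unit test could assert that `sharedHeadings(systemPrompt, jobBoardBlock)` is empty, which would fail for the current heading and pass once the static section is renamed.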

- \`running [just launched, Xs ago]\` — you dispatched this in the **current turn**. It is NOT pre-existing.
- \`running [resumed, Xs ago]\` — you continued this session in the **current turn** (passed \`task_id\`). It is NOT pre-existing.
- \`running\` (no label) — was already running **before your last dispatch turn**. Poll with \`task_status\` before acting on it.
- \`completed, unreconciled\` — finished, result not yet read.

P2 Incomplete status-line coverage leaves timed-out and non-completed terminal states unexplained

BackgroundJobBoard.formatJob() emits three distinct status patterns: `${state}, unreconciled` for any terminal-unreconciled job (so `error, unreconciled` and `cancelled, unreconciled` are equally possible), `${state}, timed out` when timedOut is set (e.g. `running, timed out`), and the plain `running` / `running [label]` entries. The guidance here only documents `completed, unreconciled`, so the model has no instruction for how to treat `error, unreconciled`, `cancelled, unreconciled`, or `running, timed out` entries it will routinely encounter.
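
To make the coverage gap concrete, the patterns listed above can be enumerated in a small classifier. This is a sketch built only from the status strings quoted in this review; `classifyStatus` and the `Guidance` categories are hypothetical names, and the real `formatJob()` output may include details not shown here.

```typescript
type Guidance =
  | "dispatched-this-turn"
  | "resumed-this-turn"
  | "pre-existing-running"
  | "terminal-unreconciled"
  | "timed-out"
  | "unknown";

function classifyStatus(status: string): { state: string; guidance: Guidance } {
  const s = status.trim();

  // `${state}, unreconciled` covers completed, error, cancelled, and so on.
  let match = /^(\w+), unreconciled$/.exec(s);
  if (match) {
    return { state: match[1], guidance: "terminal-unreconciled" };
  }

  // `${state}, timed out`, e.g. "running, timed out".
  match = /^(\w+), timed out$/.exec(s);
  if (match) {
    return { state: match[1], guidance: "timed-out" };
  }

  if (/^running \[just launched, \d+s ago\]$/.test(s)) {
    return { state: "running", guidance: "dispatched-this-turn" };
  }
  if (/^running \[resumed, \d+s ago\]$/.test(s)) {
    return { state: "running", guidance: "resumed-this-turn" };
  }
  if (s === "running") {
    return { state: "running", guidance: "pre-existing-running" };
  }
  return { state: s, guidance: "unknown" };
}
```

A legend in the prompt that has one entry per `Guidance` value, rather than one per literal string, would cover `error, unreconciled` and `cancelled, unreconciled` with the same rule as `completed, unreconciled`.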
