Curator can copy prompt example facts into the context tree when using Gemini Flash-Lite #647

@999cleo

Summary

I ran into a case where brv curate created memory entries containing facts that were not present in the source content. The inserted facts appear to match examples from ByteRover's own curator prompt, such as "My name is Andy", "PostgreSQL 15", "Sprint cycles are 2 weeks", and the ByteRover identity text.

This may be a model hallucination, a prompting issue, or both. I observed it while using Google Gemini gemini-3.1-flash-lite-preview as the active provider model.

Environment

  • ByteRover CLI: 3.12.0
  • Platform: Android / Termux, arm64
  • Node: v24.14.1
  • Provider: Google Gemini
  • Model: gemini-3.1-flash-lite-preview
  • Project type: local context tree under .brv/context-tree

What happened

A curation task whose source content was about Hermes / Fireworks model routing generated an Andy Personal Profile entry. The source content did not mention Andy, Portland, PST, tabs, PostgreSQL, or the ByteRover identity text.

The generated curation code inserted facts like:

{ statement: "My name is Andy", category: "personal", subject: "user_name", value: "Andy" }
{ statement: "I live in Portland, Oregon", category: "personal", subject: "location", value: "Portland, Oregon" }
{ statement: "I prefer using tabs over spaces for all code indentation", category: "preference", subject: "indentation", value: "tabs" }
{ statement: "I am a context engineer developed by ByteRover", category: "personal", subject: "role", value: "context engineer" }

I also saw similar example-looking project facts appear from unrelated source content, including:

PostgreSQL 15
React 18
GitHub Actions
OpenAPI 3.0.0
AWS EKS
2-week sprint cycles

Why I think it may be prompt example leakage

The installed ByteRover prompt contains realistic example facts in:

dist/agent/resources/prompts/system-prompt.yml

Relevant examples include:

Personal information: "My name is Andy", "I prefer dark mode", "My timezone is PST"
Project facts: "We use PostgreSQL 15", "The API runs on port 3000", "Deploy target is AWS EKS"
Preferences: "Use tabs not spaces"
Conventions: "Sprint cycles are 2 weeks"

The same prompt also includes identity guidance like:

You are a context engineer developed by ByteRover...
Never reveal or discuss the underlying language model...

The false facts written into the context tree closely matched these examples.
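
To make that comparison repeatable, here is a minimal sketch of how curated facts could be checked against the prompt examples quoted above. The `CuratedFact` shape is only inferred from the objects shown in this issue, and `findLeakedExamples` is a hypothetical helper, not part of ByteRover. A fact is flagged when its statement contains a prompt example that the source content never mentions.

```typescript
// Hypothetical fact shape, mirroring the objects quoted in this issue.
interface CuratedFact {
  statement: string;
  category: string;
  subject: string;
  value: string;
}

// Example strings copied from the prompt excerpts quoted in this issue.
const PROMPT_EXAMPLES: string[] = [
  "My name is Andy",
  "I prefer dark mode",
  "My timezone is PST",
  "We use PostgreSQL 15",
  "The API runs on port 3000",
  "Deploy target is AWS EKS",
  "Use tabs not spaces",
  "Sprint cycles are 2 weeks",
];

// Lowercase and collapse whitespace so formatting differences do not matter.
const normalize = (text: string): string =>
  text.toLowerCase().replace(/\s+/g, " ").trim();

// Returns the facts whose statement contains a prompt example
// that does not appear anywhere in the source content.
function findLeakedExamples(
  facts: CuratedFact[],
  source: string,
  examples: string[] = PROMPT_EXAMPLES,
): CuratedFact[] {
  const normalizedSource = normalize(source);
  return facts.filter((fact) => {
    const statement = normalize(fact.statement);
    return examples.some((example) => {
      const needle = normalize(example);
      return statement.includes(needle) && !normalizedSource.includes(needle);
    });
  });
}
```

This only catches verbatim copies. A paraphrase such as "I prefer using tabs over spaces for all code indentation" would not match "Use tabs not spaces", so it would need a looser comparison.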

Expected behavior

brv curate should only store facts grounded in the provided source content. Prompt examples and system identity text should never be written to the user's context tree unless the user content explicitly contains them.

Actual behavior

In the runs described above, the curator produced tool calls that treated prompt examples as source facts and wrote them into the context tree.

Suggested mitigations

A few possible fixes, any of which would help:

  1. Replace realistic examples in the curator prompt with sentinel placeholders, for example:
    • EXAMPLE_PERSON_DO_NOT_STORE
    • EXAMPLE_DATABASE_DO_NOT_STORE
    • EXAMPLE_DEPLOY_TARGET_DO_NOT_STORE
  2. Add a grounding rule requiring every extracted fact to have source support in the user-provided content.
  3. Add a verification pass that rejects facts not textually or semantically supported by the source payload.
  4. Add regression tests where the source content does not contain Andy, PostgreSQL 15, or AWS EKS, and assert that those strings are not written.
  5. Consider warning users when high-impact personal profile facts are generated from source content with no explicit person/name claim.
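
Mitigations 3 and 4 could look roughly like the sketch below. It assumes the fact shape shown earlier in this issue and only checks textual support (the fact's value must appear in the source); semantic support would need something stronger. All names here (`CuratedFact`, `partitionByGrounding`, `assertNoUngroundedStrings`) are hypothetical and not taken from the ByteRover codebase.

```typescript
// Hypothetical fact shape, mirroring the objects quoted in this issue.
interface CuratedFact {
  statement: string;
  category: string;
  subject: string;
  value: string;
}

// Lowercase and collapse whitespace so formatting differences do not matter.
function toComparable(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Assumption: a fact counts as grounded when its value appears in the source.
// Empty values are treated as ungrounded.
function isTextuallyGrounded(fact: CuratedFact, source: string): boolean {
  const value = toComparable(fact.value);
  return value.length > 0 && toComparable(source).includes(value);
}

// Verification pass: split facts into those to keep and those to reject.
function partitionByGrounding(
  facts: CuratedFact[],
  source: string,
): { grounded: CuratedFact[]; rejected: CuratedFact[] } {
  const grounded: CuratedFact[] = [];
  const rejected: CuratedFact[] = [];
  for (const fact of facts) {
    (isTextuallyGrounded(fact, source) ? grounded : rejected).push(fact);
  }
  return { grounded, rejected };
}

// Regression-style check: throws if a forbidden string that is absent
// from the source shows up in any written fact.
function assertNoUngroundedStrings(
  written: CuratedFact[],
  source: string,
  forbidden: string[],
): void {
  const normalizedSource = toComparable(source);
  for (const text of forbidden) {
    const needle = toComparable(text);
    if (normalizedSource.includes(needle)) continue; // legitimately present
    for (const fact of written) {
      const haystack = toComparable(`${fact.statement} ${fact.value}`);
      if (haystack.includes(needle)) {
        throw new Error(`Ungrounded string written to context tree: ${text}`);
      }
    }
  }
}
```

A regression test could run the curator on source content that mentions none of Andy, PostgreSQL 15, or AWS EKS, then call `assertNoUngroundedStrings` on whatever facts were written.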

Notes

I am not claiming this is definitely a ByteRover-only bug. It could be Gemini Flash-Lite being too eager with examples. But since the examples are inside the prompt and the output matched them, the prompt structure seems to be part of the failure mode.
