feat: add reflection-static plugin with simpler self-assessment approach #43

dzianisv · 2026-02-07T22:24:57Z

Summary

Adds a new reflection-static.ts plugin that uses a simpler, more reliable approach to task completion verification:

Static self-assessment question: When the session goes idle, the plugin asks the agent "What was the task? Are you sure you completed it? If not, why did you stop?"
GenAI-powered analysis: A judge session analyzes the agent's self-assessment
No infinite loops: If the agent confirms completion → toast only. If improvements are needed → the agent is pushed to continue
E2E tested: The evaluation scored 5/5 on a Python "Hello, World!" task with unit tests
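The idle → ask → judge → act flow above can be sketched roughly as follows. This is a minimal sketch, not the plugin's code: `JudgeVerdict`, `parseVerdict`, and `decideAction` are hypothetical names, and it assumes the judge session is asked to reply with a JSON object shaped like `{"complete": boolean, "missing": string[]}`.

```typescript
// Hypothetical types: the judge's JSON reply shape is an assumption, not the plugin's real schema.
interface JudgeVerdict {
  complete: boolean; // did the agent finish the task?
  missing: string[]; // remaining work identified in the self-assessment
}

type Action =
  | { kind: "toast"; message: string } // completion confirmed: notify only
  | { kind: "continue"; prompt: string }; // improvements needed: push the agent on

// The static question sent to the agent when its session goes idle (text from this PR).
const SELF_ASSESSMENT_QUESTION =
  "What was the task? Are you sure you completed it? If not, why did you stop?";

// Extract and validate the judge's JSON verdict; returns null if the reply is unusable.
function parseVerdict(raw: string): JudgeVerdict | null {
  const match = raw.match(/\{[\s\S]*\}/); // tolerate prose around the JSON object
  if (!match) return null;
  try {
    const obj = JSON.parse(match[0]);
    if (typeof obj?.complete !== "boolean") return null;
    const missing: string[] = Array.isArray(obj.missing)
      ? obj.missing.filter((m: unknown): m is string => typeof m === "string")
      : [];
    return { complete: obj.complete, missing };
  } catch {
    return null; // malformed JSON
  }
}

// Toast-only on confirmed completion; otherwise build a follow-up prompt.
function decideAction(verdict: JudgeVerdict): Action {
  if (verdict.complete) {
    return { kind: "toast", message: "Task confirmed complete" };
  }
  const todo =
    verdict.missing.length > 0
      ? verdict.missing.map((m) => `- ${m}`).join("\n")
      : "- (no details given)";
  return { kind: "continue", prompt: `Please continue. Remaining work:\n${todo}` };
}
```

When `parseVerdict` returns `null`, a conservative choice is to take no action rather than prompt again, so malformed judge output cannot start a loop.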

Why This Approach?

The original reflection.ts had complex stuck detection and compression nudges whose correct behavior was difficult to verify. This simpler approach:

✅ Asks one clear question
✅ Lets the agent self-assess honestly
✅ Uses GenAI to verify the assessment
✅ Takes appropriate action based on response

E2E Test Results

Metric	Value
Score	5/5
Verdict	COMPLETE
Duration	106 seconds
Files created	`hello.py`, `test_hello.py`
Python tests ran	✅ Yes
Python tests passed	✅ Yes

Plugin Effectiveness

Check	Result
✅ Triggered correctly	Yes
✅ Asked self-assessment	Yes
✅ Analyzed response	Yes
✅ Took appropriate action	`complete`
✅ Helped complete task	Yes

Agent's Self-Assessment Response

What was the task?
The user requested to create a Python application with a function that returns "Hello, World!". This involved:

Writing a file (hello.py) with the function.

Writing a test file (test_hello.py) to validate the function using pytest.

Running the tests and ensuring that they pass without errors.

Are you sure you completed it?
Yes, I am confident that the task was successfully completed.

New Scripts

# Run E2E evaluation test
npm run test:reflection-static

# Deploy reflection-static (disables reflection.ts)
npm run install:reflection-static

# Deploy original reflection.ts (disables reflection-static)
npm run install:global

Checklist

TypeScript compiles without errors (`npm run typecheck`)
Unit tests pass, 152 tests (`npm test`)
Plugin load test passes, 5 tests (`npm run test:load`)
E2E evaluation test passes, 5/5 score (`npm run test:reflection-static`)

Add a new reflection-static.ts plugin that uses a simpler approach:

1. Ask the agent a static self-assessment question when session idles
2. Use GenAI judge to analyze the agent's response
3. If agent confirms completion → toast notification, no feedback loop
4. If agent identifies improvements → push to continue

Features:
- Simple self-assessment question: "What was the task? Are you sure you completed it?"
- GenAI-powered analysis of agent's self-assessment
- Prevents infinite feedback loops by tracking confirmed completions
- Tracks aborted sessions to skip reflection
- E2E test that verifies plugin effectiveness (scored 5/5)

New npm scripts:
- test:reflection-static: Run E2E evaluation test
- install:reflection-static: Deploy reflection-static instead of reflection.ts
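The loop-prevention and abort-skipping features listed in this commit can be pictured as a small per-session guard. This is a minimal sketch under assumptions: `CompletionTracker` and its methods are hypothetical, and clearing the confirmation when the user sends a new message is an assumed policy that the PR does not state.

```typescript
// Hypothetical guard: remembers which sessions the agent already confirmed as complete,
// so a later idle event for the same session does not start another self-assessment round.
class CompletionTracker {
  private confirmed = new Set<string>();

  // Called when the judge agrees the task is complete (the toast-only path).
  markConfirmed(sessionId: string): void {
    this.confirmed.add(sessionId);
  }

  // Assumed policy: a new user message means a new task, so reflection is allowed again.
  reset(sessionId: string): void {
    this.confirmed.delete(sessionId);
  }

  // Decide whether an idle session should receive the self-assessment question.
  shouldReflect(sessionId: string, aborted: boolean): boolean {
    if (aborted) return false; // aborted sessions are skipped
    return !this.confirmed.has(sessionId); // already-confirmed sessions are skipped
  }
}
```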

- Add multiple abort detection layers (session.error, message.aborted)
- Add delay before reflection to allow abort events to arrive
- Check if last message was aborted/incomplete in runReflection
- Remove mock evaluation fallback; require real Azure LLM
- Use AZURE_OPENAI_DEPLOYMENT env var for eval model
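A rough sketch of the delayed, abort-aware reflection step described in this commit. The message shape (`completed`, `error`), the callback parameters, and `runReflectionAfterDelay` are all hypothetical; the real plugin reads session state through its host's event and client APIs, which the PR does not show.

```typescript
// Hypothetical message shape; field names are assumptions for illustration.
interface AssistantMessage {
  completed: boolean; // false if the response never finished
  error?: { name: string }; // set when the message was aborted or failed
}

type ReflectionOutcome = "skipped-aborted" | "skipped-incomplete" | "reflected";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wait briefly so abort events that arrive just after the idle event can be recorded,
// then skip reflection if the session was aborted or the last message is unfinished.
async function runReflectionAfterDelay(
  sessionId: string,
  delayMs: number,
  isAborted: (id: string) => boolean,
  lastMessage: () => AssistantMessage | undefined,
  reflect: (id: string) => Promise<void>,
): Promise<ReflectionOutcome> {
  await sleep(delayMs);
  if (isAborted(sessionId)) return "skipped-aborted";
  const last = lastMessage();
  if (last?.error) return "skipped-aborted"; // error checked separately from completion
  if (!last || !last.completed) return "skipped-incomplete";
  await reflect(sessionId);
  return "reflected";
}
```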

- Change from Set to Map with timestamps for abort tracking
- Add 10 second cooldown period after Esc press
- Add type cast for error property to fix TypeScript error
- Separate completed check from error check for clearer debugging
- Match pattern from reflection.ts for consistent behavior
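The switch from a Set to a Map with timestamps, plus the 10-second cooldown, could look roughly like this. A minimal sketch: `AbortTracker`, `recordAbort`, and `isInCooldown` are hypothetical names, and the injectable clock exists only to make the behavior testable.

```typescript
const ABORT_COOLDOWN_MS = 10_000; // 10-second cooldown after an abort (e.g. an Esc press)

// Hypothetical tracker: maps session ID to the time of its most recent abort.
class AbortTracker {
  private lastAbort = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Record an abort signal for a session, overwriting any earlier timestamp.
  recordAbort(sessionId: string): void {
    this.lastAbort.set(sessionId, this.now());
  }

  // True while the session is still inside its cooldown window.
  isInCooldown(sessionId: string): boolean {
    const abortedAt = this.lastAbort.get(sessionId);
    if (abortedAt === undefined) return false;
    if (this.now() - abortedAt < ABORT_COOLDOWN_MS) return true;
    this.lastAbort.delete(sessionId); // expired: drop it so the map does not grow unbounded
    return false;
  }
}
```

Unlike a plain Set, the timestamp lets an abort expire on its own, so a session is not skipped forever after a single Esc press.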

dzianisv added 4 commits February 7, 2026 14:24

fix: use override:true for dotenv to ensure correct Azure credentials

9069309
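The dotenv fix above is about precedence. By default, dotenv's `config()` does not overwrite variables that are already set in `process.env`, so Azure values already present in the environment (presumably from the shell) would win over those in `.env`; passing `{ override: true }` makes the `.env` values win. The function below only illustrates that rule on a plain object; `applyEnv` is a hypothetical helper, not part of dotenv.

```typescript
type Env = Record<string, string | undefined>;

// Illustrative only: mirrors the precedence rule behind dotenv's `override` option.
function applyEnv(env: Env, parsed: Record<string, string>, override: boolean): void {
  for (const key of Object.keys(parsed)) {
    const alreadySet = Object.prototype.hasOwnProperty.call(env, key);
    // Without override, a variable already present in the environment is kept.
    if (alreadySet && !override) continue;
    env[key] = parsed[key];
  }
}
```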

dzianisv merged commit d103661 into main Feb 7, 2026
1 check passed
