@dzianisv commented Feb 7, 2026

Summary

Adds a new reflection-static.ts plugin that uses a simpler, more reliable approach to task completion verification:

  • Static self-assessment question: When session idles, asks the agent "What was the task? Are you sure you completed it? If not, why did you stop?"
  • GenAI-powered analysis: Uses a judge session to analyze the agent's self-assessment
  • No infinite loops: If agent confirms completion → toast only. If improvements needed → push to continue
  • E2E tested: Evaluation scored 5/5 with Python hello world + unit tests task

Why This Approach?

The original reflection.ts relied on complex stuck detection and compression nudges whose correct behavior was difficult to verify. This simpler approach:

  1. ✅ Asks one clear question
  2. ✅ Lets the agent self-assess honestly
  3. ✅ Uses GenAI to verify the assessment
  4. ✅ Takes appropriate action based on response
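
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `Verdict` and `Action` types and the `decideAction` function are hypothetical names introduced here, and the real plugin obtains its verdict from a judge session rather than from a plain object.

```typescript
// The static question quoted in the Summary.
const SELF_ASSESSMENT_QUESTION =
  "What was the task? Are you sure you completed it? If not, why did you stop?";

// Hypothetical shape of the judge's analysis of the self-assessment.
type Verdict = { complete: boolean; feedback?: string };

// Hypothetical actions: a toast ends the cycle, "continue" pushes the agent on.
type Action =
  | { kind: "toast"; message: string }
  | { kind: "continue"; prompt: string };

function decideAction(verdict: Verdict): Action {
  if (verdict.complete) {
    // Confirmed completion: notify only, send nothing back to the agent.
    return { kind: "toast", message: "Task complete" };
  }
  // Improvements needed: feed the judge's reasoning back to the agent.
  const reason =
    verdict.feedback ?? "The self-assessment reported unfinished work.";
  return { kind: "continue", prompt: `Please continue: ${reason}` };
}
```

Returning a toast on completion, rather than another prompt, is what keeps the plugin from re-triggering itself on the next idle event.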

E2E Test Results

Metric              | Value
--------------------|------------------------
Score               | 5/5
Verdict             | COMPLETE
Duration            | 106 seconds
Files created       | hello.py, test_hello.py
Python tests ran    | ✅ Yes
Python tests passed | ✅ Yes

Plugin Effectiveness

Check                      | Result
---------------------------|---------
✅ Triggered correctly      | Yes
✅ Asked self-assessment    | Yes
✅ Analyzed response        | Yes
✅ Took appropriate action  | complete
✅ Helped complete task     | Yes

Agent's Self-Assessment Response

  1. What was the task?
    The user requested to create a Python application with a function that returns "Hello, World!". This involved:

    • Writing a file (hello.py) with the function.
    • Writing a test file (test_hello.py) to validate the function using pytest.
    • Running the tests and ensuring that they pass without errors.
  2. Are you sure you completed it?
    Yes, I am confident that the task was successfully completed.

New Scripts

# Run E2E evaluation test
npm run test:reflection-static

# Deploy reflection-static (disables reflection.ts)
npm run install:reflection-static

# Deploy original reflection.ts (disables reflection-static)
npm run install:global

Checklist

  • TypeScript compiles without errors (npm run typecheck)
  • Unit tests pass (152 tests) (npm test)
  • Plugin load test passes (5 tests) (npm run test:load)
  • E2E evaluation test passes (5/5 score) (npm run test:reflection-static)

Add a new reflection-static.ts plugin that uses a simpler approach:
1. Ask the agent a static self-assessment question when session idles
2. Use GenAI judge to analyze the agent's response
3. If agent confirms completion → toast notification, no feedback loop
4. If agent identifies improvements → push to continue

Features:
- Simple self-assessment question: "What was the task? Are you sure you completed it?"
- GenAI-powered analysis of agent's self-assessment
- Prevents infinite feedback loops by tracking confirmed completions
- Tracks aborted sessions to skip reflection
- E2E test that verifies plugin effectiveness (scored 5/5)
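
One way the loop prevention and abort skipping listed above could work is a small per-session gate, sketched below. The `ReflectionGate` class and its method names are hypothetical, and keeping the state in in-memory sets keyed by session ID is an assumption; the plugin's real bookkeeping may differ.

```typescript
// Hypothetical gate deciding whether an idle session should be asked
// the self-assessment question at all.
class ReflectionGate {
  private readonly confirmed = new Set<string>(); // sessions judged complete
  private readonly aborted = new Set<string>();   // sessions the user aborted

  markConfirmed(sessionId: string): void {
    this.confirmed.add(sessionId);
  }

  markAborted(sessionId: string): void {
    this.aborted.add(sessionId);
  }

  // Reflect only when the session is neither confirmed nor aborted.
  shouldReflect(sessionId: string): boolean {
    return !this.confirmed.has(sessionId) && !this.aborted.has(sessionId);
  }
}
```

With a gate like this, an idle event that follows a confirmed completion is ignored instead of starting another question-and-answer round.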

New npm scripts:
- test:reflection-static: Run E2E evaluation test
- install:reflection-static: Deploy reflection-static instead of reflection.ts

Additional changes:
- Add multiple abort detection layers (session.error, message.aborted)
- Add delay before reflection to allow abort events to arrive
- Check if last message was aborted/incomplete in runReflection
- Remove mock evaluation fallback - require real Azure LLM
- Use AZURE_OPENAI_DEPLOYMENT env var for eval model
- Change from Set to Map with timestamps for abort tracking
- Add 10-second cooldown period after Esc press
- Add type cast for error property to fix TypeScript error
- Separate completed check from error check for clearer debugging
- Match pattern from reflection.ts for consistent behavior
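
The switch from a Set to a Map with timestamps, together with the 10-second cooldown, might look like the sketch below. The `AbortTracker` class, its method names, and the decision to delete expired entries are assumptions made for illustration; only the 10-second figure comes from the change list above.

```typescript
// Cooldown length taken from the change list: 10 seconds after an abort.
const ABORT_COOLDOWN_MS = 10_000;

// Hypothetical tracker: maps session ID to the time of its last abort.
class AbortTracker {
  private readonly abortedAt = new Map<string, number>();

  // `now` is injectable so the logic can be checked without real waiting.
  recordAbort(sessionId: string, now: number = Date.now()): void {
    this.abortedAt.set(sessionId, now);
  }

  // True while the session is still inside its cooldown window.
  inCooldown(sessionId: string, now: number = Date.now()): boolean {
    const at = this.abortedAt.get(sessionId);
    if (at === undefined) return false;
    if (now - at < ABORT_COOLDOWN_MS) return true;
    this.abortedAt.delete(sessionId); // expired: forget the abort
    return false;
  }
}
```

A plain Set can only record that a session was aborted at some point, which would suppress reflection for that session indefinitely. Storing a timestamp lets the suppression expire once the cooldown has passed.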

@dzianisv merged commit d103661 into main on Feb 7, 2026
1 check passed