Add agent docs eval: test that AI can build transfer scripts #73
Open
- Uses Amp SDK to prompt an agent to build a TypeScript CLI
- Agent must use tempo.ts SDK to transfer pathUSD on testnet
- Verifies the output tx hash exists on-chain

Note: Workflow changes need to be added separately (see PR description)

Amp-Thread-ID: https://ampcode.com/threads/T-019c2e9b-8e68-703a-841f-92dc4d4910ef
Co-authored-by: Amp <amp@ampcode.com>
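The final check in this commit (verifying that the tx hash printed by the agent-built CLI exists on-chain) could look roughly like the sketch below. It is not the PR's test code: `extractTxHash`, `verifyTransfer`, and the `ReceiptLookup` interface are hypothetical names, and the receipt lookup is injected so the logic can be exercised without a network connection.

```typescript
// Sketch only: hypothetical helpers, not taken from e2e/agent-transfer-funds.test.ts.
type Hex = `0x${string}`;

// Hypothetical receipt shape; "success" | "reverted" mirrors viem's receipt status.
type Receipt = { status: "success" | "reverted" };

// Hypothetical lookup interface: resolves to null when the tx is not on-chain.
type ReceiptLookup = (hash: Hex) => Promise<Receipt | null>;

// A tx hash is 32 bytes: "0x" followed by exactly 64 hex characters.
// The trailing \b rejects longer hex blobs; a 40-char address is too short to match.
const TX_HASH_RE = /\b0x[0-9a-fA-F]{64}\b/;

// Return the first tx hash found in the CLI's stdout, or null if there is none.
function extractTxHash(output: string): Hex | null {
  const match = output.match(TX_HASH_RE);
  return match ? (match[0] as Hex) : null;
}

// Pass only if the output contains a hash and that tx was mined successfully.
async function verifyTransfer(
  output: string,
  lookup: ReceiptLookup,
): Promise<{ ok: boolean; reason: string }> {
  const hash = extractTxHash(output);
  if (!hash) return { ok: false, reason: "no tx hash in output" };

  const receipt = await lookup(hash);
  if (!receipt) return { ok: false, reason: `tx ${hash} not found on-chain` };
  if (receipt.status !== "success") {
    return { ok: false, reason: `tx ${hash} reverted` };
  }
  return { ok: true, reason: `tx ${hash} succeeded` };
}
```

In the real eval, the lookup might wrap viem's `getTransactionReceipt({ hash })` on a public client created with `createPublicClient({ chain: tempoModerato, transport: http() })`, returning `null` when viem throws because the receipt is not found. That wiring is an assumption, not something the PR shows.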
- Import tempoModerato from 'viem/chains' (not tempo.ts/chains)
- Add testIgnore to playwright config to skip agent-*.test.ts unless AGENT_EVAL env is set
- Regular E2E tests now run without the agent eval interfering

Amp-Thread-ID: https://ampcode.com/threads/T-019c2e9b-8e68-703a-841f-92dc4d4910ef
Co-authored-by: Amp <amp@ampcode.com>
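The Playwright change in this commit can be sketched as a config fragment like the one below. The actual diff is not shown on this page, so `testDir: "./e2e"` is an assumption based on the test's path, and any other options in the repo's config are omitted; `defineConfig` and `testIgnore` are standard Playwright config APIs.

```typescript
// playwright.config.ts (sketch): skip agent evals unless AGENT_EVAL is set.
import { defineConfig } from "@playwright/test";

// The agent eval drives an external agent and sends a real testnet transfer,
// so it only runs when AGENT_EVAL is set to a non-empty value.
const runAgentEval = Boolean(process.env.AGENT_EVAL);

export default defineConfig({
  testDir: "./e2e", // assumption: the e2e/ folder used by this PR
  // Ignore agent-*.test.ts files for regular E2E runs.
  testIgnore: runAgentEval ? [] : ["**/agent-*.test.ts"],
});
```

With this shape, `npx playwright test` runs only the regular E2E suite, while `AGENT_EVAL=1 npx playwright test` also picks up the agent eval. Note that any non-empty value, including `0`, enables it.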
Summary
Adds an end-to-end eval that uses the Amp SDK to test whether AI agents can successfully build working code using Tempo docs.
What it does
Why
Per discussion in #product-docs, we're seeing agents (such as Opus 4.5) get confused about:
This eval will help us iterate on docs until agents succeed consistently.
Files changed
- e2e/agent-transfer-funds.test.ts - The eval test
- package.json - Added @sourcegraph/amp-sdk dependency

Manual step needed
After merging, add this to .github/workflows/verify.yml to run the eval on a schedule:

Also add AMP_API_KEY to the repository secrets.