29 commits
2a68ce8
feat(e2e): scaffold testing/e2e package with TanStack Start
AlemTuzlak Apr 3, 2026
cf9e5a5
chore: update pnpm-lock.yaml for @tanstack/ai-e2e
AlemTuzlak Apr 3, 2026
d9bb375
feat(e2e): add core library files - types, providers, features, tools
AlemTuzlak Apr 3, 2026
2b23f8c
feat(e2e): add root layout, landing pages, and NotSupported component
AlemTuzlak Apr 3, 2026
743e2fa
feat(e2e): add ChatUI, ToolCallDisplay, and ApprovalPrompt components
AlemTuzlak Apr 3, 2026
342b173
feat(e2e): add media components - ImageDisplay, AudioPlayer, Transcri…
AlemTuzlak Apr 3, 2026
ae68b83
feat(e2e): add API routes - chat, summarize, image, tts, transcription
AlemTuzlak Apr 3, 2026
7ac5856
feat(e2e): add dynamic feature route with all feature UI variants
AlemTuzlak Apr 3, 2026
5a9f66b
feat(e2e): add all llmock fixtures and test assets
AlemTuzlak Apr 3, 2026
7fd7661
feat(e2e): add Playwright config with llmock global setup/teardown
AlemTuzlak Apr 3, 2026
8241d33
feat(e2e): add test helpers and test matrix
AlemTuzlak Apr 3, 2026
492149d
feat(e2e): add chat, one-shot, reasoning, and multi-turn test specs
AlemTuzlak Apr 3, 2026
d5c75d9
feat(e2e): add tool-calling, parallel-tool-calls, and tool-approval t…
AlemTuzlak Apr 3, 2026
ebc0571
feat(e2e): add all remaining test specs - structured output, multimod…
AlemTuzlak Apr 3, 2026
03d1be9
feat(e2e): add llmock recording mode support
AlemTuzlak Apr 3, 2026
51f5c02
ci: add E2E testing workflow with Playwright cache and video artifacts
AlemTuzlak Apr 3, 2026
7f7bf01
docs(e2e): add README with guide for adding new test cases
AlemTuzlak Apr 3, 2026
eafd261
fix(e2e): fix API route imports - use generateImage/generateSpeech/ge…
AlemTuzlak Apr 3, 2026
7d58f19
ci: apply automated fixes
autofix-ci[bot] Apr 3, 2026
e812b45
fix(ci): use build:all instead of build in e2e workflow to avoid nx a…
AlemTuzlak Apr 3, 2026
17019d5
fix(e2e): fix __dirname in ESM multimodal specs and load fixture subd…
AlemTuzlak Apr 3, 2026
ff662ff
fix(e2e): use create* adapter factories with explicit API keys and co…
AlemTuzlak Apr 3, 2026
f43ed30
ci: apply automated fixes
autofix-ci[bot] Apr 3, 2026
6f6f3d3
fix(e2e): fix three root causes for test failures
AlemTuzlak Apr 3, 2026
f026658
ci: apply automated fixes
autofix-ci[bot] Apr 3, 2026
9529213
fix(e2e): prevent server crash on structured output errors and fix re…
AlemTuzlak Apr 3, 2026
8fe9244
ci: apply automated fixes
autofix-ci[bot] Apr 3, 2026
23157b9
fix(e2e): comprehensive test fixes — 30 passing, 40 skipped, 49 remai…
AlemTuzlak Apr 3, 2026
281ea65
ci: apply automated fixes
autofix-ci[bot] Apr 3, 2026
77 changes: 77 additions & 0 deletions .github/workflows/e2e.yml
@@ -0,0 +1,77 @@
name: E2E Tests

on:
  pull_request:
  push:
    branches: [main, alpha, beta, rc]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true

env:
  NX_CLOUD_ACCESS_TOKEN: ${{ secrets.NX_CLOUD_ACCESS_TOKEN }}

permissions:
  contents: read

jobs:
  e2e:
    name: E2E Tests
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout
        uses: actions/checkout@v6.0.2
        with:
          fetch-depth: 0

      - name: Setup Tools
        uses: TanStack/config/.github/setup@main

      - name: Cache Playwright Browsers
        id: playwright-cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ hashFiles('testing/e2e/package.json') }}
          restore-keys: |
            playwright-

      - name: Build Packages
        run: pnpm run build:all

      - name: Install Playwright Chromium
        if: steps.playwright-cache.outputs.cache-hit != 'true'
        run: pnpm --filter @tanstack/ai-e2e exec playwright install --with-deps chromium

      - name: Install Playwright Deps (if cached)
        if: steps.playwright-cache.outputs.cache-hit == 'true'
        run: pnpm --filter @tanstack/ai-e2e exec playwright install-deps chromium

      - name: Run E2E Tests
        run: pnpm --filter @tanstack/ai-e2e test:e2e

      - name: Upload Video Recordings
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: e2e-videos
          path: testing/e2e/test-results/**/*.webm
          retention-days: 14

      - name: Upload Playwright Report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: e2e-report
          path: testing/e2e/playwright-report/
          retention-days: 14

      - name: Upload Traces (on failure)
        uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: e2e-traces
          path: testing/e2e/test-results/**/*.zip
          retention-days: 14
110 changes: 110 additions & 0 deletions pnpm-lock.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions testing/e2e/.env
@@ -0,0 +1 @@
LLMOCK_URL=http://127.0.0.1:4010
155 changes: 155 additions & 0 deletions testing/e2e/README.md
@@ -0,0 +1,155 @@
# TanStack AI E2E Tests

End-to-end tests for TanStack AI using Playwright and [llmock](https://github.com/CopilotKit/llmock) for deterministic LLM mocking.

**Architecture:** Playwright drives a TanStack Start app (`testing/e2e/`) which routes requests through provider adapters pointing at llmock. Fixtures define the mock responses. No real API keys needed.

**Features tested:** chat, one-shot-text, reasoning, multi-turn, tool-calling, parallel-tool-calls, tool-approval, structured-output, agentic-structured, multimodal-image, multimodal-structured, summarize, summarize-stream, image-gen, tts, transcription

**Providers tested:** openai, anthropic, gemini, ollama, groq, grok, openrouter

## 1. Quick Start

```bash
# Install dependencies
pnpm install

# Run all E2E tests
pnpm --filter @tanstack/ai-e2e test:e2e

# Run with Playwright UI (useful for debugging)
pnpm --filter @tanstack/ai-e2e test:e2e:ui

# Run a single spec
pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "openai -- chat"
```

## 2. Recording a New Fixture

```bash
# 1. Set your API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# (add whichever providers you need)

# 2. Start the app in record mode
pnpm --filter @tanstack/ai-e2e record

# 3. Open the browser and navigate to the feature you want to record
# e.g. http://localhost:3010/openai/tool-calling

# 4. Interact with the chat - type your message, wait for the response.
# llmock proxies to the real API and saves the response as a fixture.

# 5. Find your recorded fixture in testing/e2e/fixtures/recorded/
# Files are named: {provider}-{timestamp}-{uuid}.json

# 6. Stop the dev server (Ctrl+C)
```

## 3. Organizing the Recorded Fixture

Move the recorded fixture from `recorded/` to the appropriate feature directory:

```bash
mv fixtures/recorded/openai-2026-04-03T*.json fixtures/tool-calling/my-new-scenario.json
```

Then edit the fixture to clean it up:

- **Simplify the `match` field** - use a short, unique `userMessage` that your test will send
- **Verify the `response`** - check that the content, toolCalls, or reasoning fields look correct
- **Remove provider-specific artifacts** - fixtures should be provider-agnostic

Example - before cleanup:

```json
{
  "fixtures": [
    {
      "match": {
        "userMessage": "Hey, I'm looking for a guitar...",
        "model": "gpt-4o"
      },
      "response": {
        "content": "I'd recommend checking out the Fender Stratocaster..."
      }
    }
  ]
}
```

After cleanup:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "recommend a blues guitar" },
      "response": {
        "content": "I'd recommend checking out the Fender Stratocaster..."
      }
    }
  ]
}
```
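
The cleanup steps above can also be scripted. The sketch below is a hypothetical helper (`cleanRecordedFixture` is not part of this package or of llmock) and assumes the fixture file shape shown in the examples. It drops `model` from every `match` and swaps a recorded `userMessage` for the short one your test will send.

```typescript
// Hypothetical helper: not part of @tanstack/ai-e2e or llmock.
// The types mirror the JSON examples above and are assumptions.
interface FixtureEntry {
  match: Record<string, unknown>
  response: Record<string, unknown>
}

interface FixtureFile {
  fixtures: FixtureEntry[]
}

function cleanRecordedFixture(
  file: FixtureFile,
  shortUserMessage: string,
): FixtureFile {
  return {
    fixtures: file.fixtures.map((entry) => {
      // Drop the provider-specific `model` key, keep everything else.
      const { model: _model, ...match } = entry.match
      // Only replace `userMessage` where the recording had one, so
      // entries keyed on `sequenceIndex` alone stay untouched.
      if ('userMessage' in match) {
        match.userMessage = shortUserMessage
      }
      return { ...entry, match }
    }),
  }
}

const recorded: FixtureFile = {
  fixtures: [
    {
      match: {
        userMessage: "Hey, I'm looking for a guitar...",
        model: 'gpt-4o',
      },
      response: {
        content: "I'd recommend checking out the Fender Stratocaster...",
      },
    },
  ],
}

const cleaned = cleanRecordedFixture(recorded, 'recommend a blues guitar')
console.log(JSON.stringify(cleaned, null, 2))
```

The `response` is passed through unchanged, so it still needs the manual check described in the list above.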

## 4. Writing the Test

```typescript
// In tests/tool-calling.spec.ts (or whichever spec fits)

test('calls getGuitars with category filter', async ({ page }) => {
  await page.goto(`/${provider}/tool-calling`)
  if (await isNotSupported(page)) {
    test.skip()
    return
  }

  // Send the exact message that matches your fixture
  await sendMessage(page, 'show me acoustic guitars')
  await waitForResponse(page)

  // Assert on what should appear in the UI
  const toolCalls = await getToolCalls(page)
  expect(toolCalls).toHaveLength(1)
  expect(toolCalls[0].name).toBe('getGuitars')

  const response = await getLastAssistantMessage(page)
  expect(response).toContain('acoustic')
})
```

## 5. Adding a New Feature

1. **Add the feature to `src/lib/features.ts`** - define tools, modelOptions, outputSchema
2. **Add the feature to `src/lib/feature-support.ts`** - mark which providers support it
3. **Add the feature to `tests/test-matrix.ts`** - so tests iterate over it
4. **Create a fixture directory** - `fixtures/my-new-feature/basic.json`
5. **Create a test spec** - `tests/my-new-feature.spec.ts`
6. **Update the UI if needed** - if the feature needs new UI beyond ChatUI, add a component

## 6. Adding a New Provider

1. **Add the adapter factory to `src/lib/providers.ts`**
2. **Add the provider to `src/lib/feature-support.ts`**
3. **Add the provider to `tests/test-matrix.ts`**
4. **No fixture changes needed** - fixtures are provider-agnostic
5. **Verify llmock supports the provider**
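
The bookkeeping in the two checklists above comes down to a support table plus a matrix derived from it. The sketch below shows one way that could look. Every name and value in it is hypothetical: the real `src/lib/feature-support.ts` and `tests/test-matrix.ts` may be organized differently, and the support values are illustrative only.

```typescript
// Hypothetical shapes: the real feature-support.ts and test-matrix.ts
// in this package may differ. Support values are illustrative only.
type Provider = 'openai' | 'anthropic' | 'ollama'
type Feature = 'chat' | 'tool-calling' | 'image-gen'

// Which providers are marked as supporting each feature.
const featureSupport: Record<Feature, ReadonlySet<Provider>> = {
  chat: new Set<Provider>(['openai', 'anthropic', 'ollama']),
  'tool-calling': new Set<Provider>(['openai', 'anthropic']),
  'image-gen': new Set<Provider>(['openai']),
}

function isSupported(provider: Provider, feature: Feature): boolean {
  return featureSupport[feature].has(provider)
}

interface MatrixEntry {
  provider: Provider
  feature: Feature
}

// Expand the support table into the provider/feature pairs that a
// spec file would iterate over. Unsupported pairs are left out.
function buildTestMatrix(
  providers: Provider[],
  features: Feature[],
): MatrixEntry[] {
  const matrix: MatrixEntry[] = []
  for (const provider of providers) {
    for (const feature of features) {
      if (isSupported(provider, feature)) {
        matrix.push({ provider, feature })
      }
    }
  }
  return matrix
}

const matrix = buildTestMatrix(
  ['openai', 'anthropic', 'ollama'],
  ['chat', 'tool-calling', 'image-gen'],
)
console.log(matrix.length) // 6 supported pairs out of 9
```

With a table like this, adding a provider or feature is a one-line change to the union type and the support table, and the matrix picks it up without further edits.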

## 7. Fixture Matching Tips

- **`userMessage`** is the primary match key - use short, unique strings
- **`sequenceIndex`** is essential for multi-turn and tool-call flows
- **`tool`** matches when the model calls a specific tool
- **`model`** matches a specific model name - avoid unless needed (breaks provider-agnosticism)
- **`predicate`** is a custom function for complex matching - last resort
- Fixtures are matched in order - first match wins
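
To make the "first match wins" rule concrete, here is a minimal sketch of ordered matching. It is an illustration only and not llmock's implementation: the types are hypothetical, only three of the keys above are modeled, and `userMessage` is compared with exact equality as an assumption (llmock's real comparison may differ).

```typescript
// Illustration only: this is not llmock's implementation.
interface FixtureMatch {
  userMessage?: string
  sequenceIndex?: number
  model?: string
}

interface Fixture {
  match: FixtureMatch
  response: { content: string }
}

interface MockRequest {
  userMessage: string
  sequenceIndex: number
  model: string
}

// A fixture matches when every key it specifies agrees with the
// request. Keys the fixture omits act as wildcards.
function fixtureMatches(match: FixtureMatch, req: MockRequest): boolean {
  // Assumption: exact string equality for userMessage.
  if (match.userMessage !== undefined && match.userMessage !== req.userMessage) {
    return false
  }
  if (match.sequenceIndex !== undefined && match.sequenceIndex !== req.sequenceIndex) {
    return false
  }
  if (match.model !== undefined && match.model !== req.model) {
    return false
  }
  return true
}

// Fixtures are checked in order and the first match is returned.
function findFixture(fixtures: Fixture[], req: MockRequest): Fixture | undefined {
  return fixtures.find((fixture) => fixtureMatches(fixture.match, req))
}

const fixtures: Fixture[] = [
  { match: { userMessage: 'recommend a guitar' }, response: { content: 'generic' } },
  {
    match: { userMessage: 'recommend a guitar', model: 'gpt-4o' },
    response: { content: 'model-specific' },
  },
]

const hit = findFixture(fixtures, {
  userMessage: 'recommend a guitar',
  sequenceIndex: 0,
  model: 'gpt-4o',
})
console.log(hit?.response.content) // prints "generic"
```

In this sketch the second entry can never be reached, because the broader first entry already matches every request it would match. Under ordered matching, more specific fixtures need to come before broader ones.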

## 8. Troubleshooting

- **Test times out waiting for response**: Check that `userMessage` in the fixture exactly matches what `sendMessage()` sends
- **Wrong fixture matched**: Make `userMessage` strings more specific or use `sequenceIndex`
- **"Not supported" shows unexpectedly**: Check `src/lib/feature-support.ts`
- **Fixture works for OpenAI but not Anthropic**: Remove `model` from the `match` field
- **Recording doesn't capture the response**: Verify API key env var is set
24 changes: 24 additions & 0 deletions testing/e2e/fixtures/agentic-structured/basic.json
@@ -0,0 +1,24 @@
{
  "fixtures": [
    {
      "match": {
        "userMessage": "[agentic] check inventory and recommend",
        "sequenceIndex": 0
      },
      "response": {
        "toolCalls": [
          {
            "name": "getGuitars",
            "arguments": "{}"
          }
        ]
      }
    },
    {
      "match": { "sequenceIndex": 1 },
      "response": {
        "content": "{\"name\":\"Fender Stratocaster\",\"price\":1299,\"reason\":\"Most affordable and versatile option in stock\",\"rating\":5}"
      }
    }
  ]
}