A GitHub Action that runs Tessl evals against tiles when `SKILL.md` files change in a pull request, and posts the results as a PR comment with per-scenario scoring.
Requires a `TESSL_TOKEN` to authenticate with the Tessl API. The GitHub-provided `GITHUB_TOKEN` is used to post PR comments.
Add this workflow to your repository at `.github/workflows/skill-eval.yml`:

```yaml
name: Tessl Skill Eval

on:
  pull_request:
    paths: ['**/SKILL.md', '**/evals/**']

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          tessl-token: ${{ secrets.TESSL_TOKEN }}
```

Any PR that modifies a `SKILL.md` file in a tile with eval scenarios will trigger an eval run and post results as a PR comment.
| Input | Description | Default |
|---|---|---|
| `enabled` | Enable eval runs. Set to `false` to disable entirely. | `true` |
| `skip-label` | PR label that skips evals even when enabled. Set to `''` to disable. | `skip-eval` |
| `path` | Root path to search for `SKILL.md` files | `.` |
| `comment` | Whether to post results as a PR comment | `true` |
| `eval-workspace` | Tessl workspace name. Optional when tiles set the workspace in `tile.json`. | `''` |
| `eval-agent` | Agent:model pair for evals | `claude:claude-sonnet-4-6` |
| `eval-timeout` | Maximum minutes to wait for each eval run to complete | `45` |
| `eval-fail-on-regression` | Fail the check if any scenario scores worse with context than baseline | `true` |
| `eval-generate-scenarios` | Generate fresh scenarios for tiles without `evals/` | `false` |
| `eval-scenario-count` | Number of scenarios to generate per tile | `3` |
| `eval-commit-scenarios` | Commit generated scenarios back to the PR branch (requires `contents: write`) | `false` |
| `tessl-token` | Tessl API token. Pass via secrets. | (required) |
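
As an illustration of overriding defaults from the inputs table, the step below narrows the search root and keeps reporting scores without failing the check on regressions. `skills` is a hypothetical directory name and `my-workspace` is a placeholder workspace name; substitute your own.

```yaml
- uses: tesslio/skill-eval@main
  with:
    path: skills                    # hypothetical root that contains your tiles
    eval-workspace: my-workspace    # placeholder; omit if tile.json sets the workspace
    eval-fail-on-regression: false  # report regressions without failing the check
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```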
On each run, the action:

- Detects which `SKILL.md` files were changed in the PR
- Installs the Tessl CLI and authenticates with your token
- Finds parent tile directories (containing `tile.json`) with eval scenarios
- Runs `tessl eval run` for each tile and polls for results
- Posts (or updates) an eval comment on the PR with per-scenario scores
Evals run by default. There are two ways to skip them.

Disable in the workflow YAML (applies to all PRs):

```yaml
- uses: tesslio/skill-eval@main
  with:
    enabled: false
```

Skip per PR with a label: add the `skip-eval` label to any PR. To use a custom label name:

```yaml
- uses: tesslio/skill-eval@main
  with:
    skip-label: no-eval
```

Set `skip-label: ''` to disable the label check entirely.
The action posts a single eval comment per PR. On subsequent pushes, it updates the existing comment rather than creating a new one.
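
If you only want the check and not the comment, the `comment` input can be turned off. A minimal sketch; where results are surfaced in that case (for example, the job log) is not described here, so treat that as an assumption to verify.

```yaml
- uses: tesslio/skill-eval@main
  with:
    comment: false   # run evals without posting or updating the PR comment
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```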
Instead of relying on pre-existing scenarios in `evals/`, you can generate fresh scenarios from your tile before running evals:

```yaml
- uses: tesslio/skill-eval@main
  with:
    eval-workspace: my-workspace
    eval-generate-scenarios: true
    eval-scenario-count: 3
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```

When `eval-generate-scenarios` is enabled, the action will:

- Find all tile directories (not just those with existing `evals/`)
- Run `tessl scenario generate` to create fresh scenarios for each tile
- Download the generated scenarios to the tile's `evals/` directory
- Run evals against the newly generated scenarios
This is useful for tiles that don't have checked-in scenarios, or when you want to evaluate against fresh scenarios generated from the current tile state.
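
To keep generated scenarios, `eval-commit-scenarios` pushes them back to the PR branch, which per the inputs table requires `contents: write` instead of `contents: read`. The job below is a sketch under that assumption. It does not address how the action performs the push (for example, whether the PR head branch must be checked out), so check the action before relying on it. Note also that GitHub issues a read-only `GITHUB_TOKEN` for `pull_request` runs from forks.

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    permissions:
      pull-requests: write
      contents: write   # needed by eval-commit-scenarios
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          eval-generate-scenarios: true
          eval-commit-scenarios: true
          tessl-token: ${{ secrets.TESSL_TOKEN }}
```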
When evals are enabled, the action walks up from each changed `SKILL.md` file to find the parent tile directory (a directory containing `tile.json`). The search checks up to 5 parent directories; if your `SKILL.md` is nested deeper than that relative to `tile.json`, the tile won't be detected and a warning is logged. If that tile directory also contains an `evals/` subdirectory with scenario files, the tile is included in the eval run. Unless `eval-generate-scenarios` is enabled, tiles without an `evals/` directory are skipped.
Scenario generation and eval execution each apply `eval-timeout` independently, so with `eval-generate-scenarios` enabled the total wall time can reach twice the timeout value. With the default of 45 minutes, generation could take up to 45 minutes and eval execution another 45, for a possible total of about 90 minutes per tile.
Scenario generation polls every 15 seconds; eval execution polls every 30 seconds. Plan your GitHub Actions job timeout accordingly:

```yaml
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 120 # allow headroom for generation + eval
```

For tiles with pre-existing scenarios (no generation), the wait is bounded by the eval timeout alone.
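
As a worked example of that budget: lowering `eval-timeout` to 30 with generation enabled gives a worst case of 2 × 30 = 60 minutes of waiting per tile. The job timeout below adds 15 minutes of margin for checkout and CLI setup; that margin is an arbitrary illustrative choice, and if one PR touches several tiles you should budget for each of them.

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    # Worst case per tile with generation: 2 x 30 = 60 minutes,
    # plus an illustrative 15-minute margin for setup.
    timeout-minutes: 75
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          eval-timeout: 30
          eval-generate-scenarios: true
          tessl-token: ${{ secrets.TESSL_TOKEN }}
```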
Evals require a Tessl API key. To add it as a GitHub repository secret:

- Go to your repository on GitHub
- Navigate to Settings > Secrets and variables > Actions
- Click New repository secret
- Set the name to `TESSL_TOKEN` and paste your API key as the value
- Click Add secret

Then reference it in your workflow as `${{ secrets.TESSL_TOKEN }}`.
To work on the action locally, install dependencies and run the linter:

```sh
bun install
bun run lint
```

License: MIT