Tessl Skill Eval Action

A GitHub Action that runs Tessl evals against tiles when SKILL.md files change in a pull request, and posts the results as a PR comment with per-scenario scoring.

The action requires a `TESSL_TOKEN` secret to authenticate with the Tessl API. The GitHub-provided `GITHUB_TOKEN` is used to post PR comments.

Usage

Add this workflow to your repository at .github/workflows/skill-eval.yml:

name: Tessl Skill Eval
on:
  pull_request:
    paths: ['**/SKILL.md', '**/evals/**']

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          tessl-token: ${{ secrets.TESSL_TOKEN }}

Any PR that modifies a SKILL.md file in a tile with eval scenarios will trigger an eval run and post results as a PR comment.

Inputs

| Input | Description | Default |
| --- | --- | --- |
| `enabled` | Enable eval runs. Set to `false` to disable entirely. | `true` |
| `skip-label` | PR label that skips eval even when enabled. Set empty to disable. | `skip-eval` |
| `path` | Root path to search for SKILL.md files | `.` |
| `comment` | Whether to post results as a PR comment | `true` |
| `eval-workspace` | Tessl workspace name. Optional when tiles set workspace in `tile.json`. | `''` |
| `eval-agent` | Agent:model pair for evals | `claude:claude-sonnet-4-6` |
| `eval-timeout` | Max minutes to wait for each eval run to complete | `45` |
| `eval-fail-on-regression` | Fail the check if any scenario scores worse with context than baseline | `true` |
| `eval-generate-scenarios` | Generate fresh scenarios for tiles without `evals/` | `false` |
| `eval-scenario-count` | Number of scenarios to generate per tile | `3` |
| `eval-commit-scenarios` | Commit generated scenarios back to the PR branch (requires `contents: write`) | `false` |
| `tessl-token` | Tessl API token. Pass via secrets. | (required) |
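
As a rough sketch of how several of these inputs combine in one step (the specific values are illustrative choices, not recommendations from the action's authors):

```yaml
- uses: tesslio/skill-eval@main
  with:
    # Do not fail the check when a scenario regresses; scores are still
    # posted because `comment` keeps its default of true.
    eval-fail-on-regression: false
    # Wait at most 20 minutes per eval run instead of the default 45.
    eval-timeout: 20
    # Illustrative custom label name; the default is skip-eval.
    skip-label: no-eval
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```

Inputs that are left out keep the defaults listed in the table.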

How it works

1. Detects which `SKILL.md` files were changed in the PR
2. Installs the Tessl CLI and authenticates with your token
3. Finds parent tile directories (containing `tile.json`) with eval scenarios
4. Runs `tessl eval run` for each tile and polls for results
5. Posts (or updates) an eval comment on the PR with per-scenario scores

Skipping evals

Evals run by default. Two ways to skip them:

Disable in workflow YAML (all PRs):

- uses: tesslio/skill-eval@main
  with:
    enabled: false

Skip per-PR with a label: add the skip-eval label to any PR. To use a custom label name:

- uses: tesslio/skill-eval@main
  with:
    skip-label: no-eval

Set skip-label: '' to disable the label check entirely.

Comment behavior

The action posts a single eval comment per PR. On subsequent pushes, it updates the existing comment rather than creating a new one.
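
If a PR comment is not wanted, the `comment` input from the Inputs table turns it off. A minimal sketch, assuming the check result alone is enough signal for your workflow (where scores surface without the comment is not described in this README):

```yaml
- uses: tesslio/skill-eval@main
  with:
    comment: false                  # do not post or update a PR comment
    eval-fail-on-regression: true   # the default, repeated here for emphasis
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```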

Generating scenarios on-the-fly

Instead of relying on pre-existing scenarios in evals/, you can generate fresh scenarios from your tile before running evals:

- uses: tesslio/skill-eval@main
  with:
    eval-workspace: my-workspace
    eval-generate-scenarios: true
    eval-scenario-count: 3
    tessl-token: ${{ secrets.TESSL_TOKEN }}

When eval-generate-scenarios is enabled, the action will:

1. Find all tile directories (not just those with existing `evals/`)
2. Run `tessl scenario generate` to create fresh scenarios for each tile
3. Download the generated scenarios to the tile's `evals/` directory
4. Run evals against the newly generated scenarios

This is useful for tiles that don't have checked-in scenarios, or when you want to evaluate against fresh scenarios generated from the current tile state.
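
To keep the generated scenarios, the Inputs table lists `eval-commit-scenarios`, which needs `contents: write`. A sketch of a job that combines the two follows. The workspace name is a placeholder, and the README does not say whether the checkout step needs extra settings for the action to push to the PR branch, so treat this as a starting point:

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    permissions:
      pull-requests: write
      contents: write   # required by eval-commit-scenarios, per the Inputs table
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          eval-workspace: my-workspace      # placeholder workspace name
          eval-generate-scenarios: true
          eval-commit-scenarios: true
          tessl-token: ${{ secrets.TESSL_TOKEN }}
```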

How eval detection works

When evals are enabled, the action walks up from each changed `SKILL.md` file to find the parent tile directory (a directory containing `tile.json`). The search checks up to 5 parent directories. If your `SKILL.md` is nested deeper than that relative to `tile.json`, the tile is not detected and a warning is logged. If the tile directory also contains an `evals/` subdirectory with scenario files, the tile is included in the eval run. Tiles without an `evals/` directory are skipped unless `eval-generate-scenarios` is enabled, in which case scenarios are generated for them first.
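
To make the detection rules concrete, here is a hypothetical repository layout, written as comments, together with a step that scopes the search with the `path` input. All directory and tile names are invented for this sketch.

```yaml
# Hypothetical layout (names are illustrative only):
#
#   tiles/
#     pdf-tools/
#       tile.json                 <- marks the tile directory
#       evals/                    <- scenario files present: tile is evaluated
#       skills/extract/SKILL.md   <- changed file, a few levels below tile.json
#     drafts/
#       tile.json
#       skills/notes/SKILL.md     <- no evals/ beside tile.json: skipped by default
#
- uses: tesslio/skill-eval@main
  with:
    path: tiles                   # root path to search for SKILL.md files
    tessl-token: ${{ secrets.TESSL_TOKEN }}
```

In this layout a change to `tiles/pdf-tools/skills/extract/SKILL.md` would be expected to trigger an eval of `pdf-tools`, while a change under `tiles/drafts/` would not, since that tile has no `evals/` directory.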

Timeouts and long-running jobs

Scenario generation and eval execution each apply `eval-timeout` independently. With `eval-generate-scenarios` enabled, the total wall time can therefore reach twice the timeout value. For example, with the default of 45 minutes, generation could take up to 45 minutes and eval execution another 45 minutes, for a possible total of about 90 minutes per tile.

Scenario generation polls every 15 seconds; eval execution polls every 30 seconds. Plan your GitHub Actions job timeout accordingly:

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 120  # allow headroom for generation + eval

For tiles with pre-existing scenarios (no generation), the total time is just the eval timeout.
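
The same arithmetic applies to non-default values. A sketch with a shorter `eval-timeout` follows; the 15 minutes of headroom is a guess for checkout and CLI installation, not a measured figure, and the workspace name is a placeholder:

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    # Worst case for one tile with generation enabled:
    #   30 min (scenario generation) + 30 min (eval execution) = 60 min
    # plus an assumed 15 min of headroom.
    timeout-minutes: 75
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-eval@main
        with:
          eval-workspace: my-workspace   # placeholder workspace name
          eval-generate-scenarios: true
          eval-timeout: 30
          tessl-token: ${{ secrets.TESSL_TOKEN }}
```

The 90-minute and 60-minute figures are per tile. This README does not state how multiple tiles are scheduled, so a PR that touches several tiles may need a larger job timeout.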

Setting up the TESSL_TOKEN secret

Evals require a Tessl API key. To add it as a GitHub repository secret:

1. Go to your repository on GitHub
2. Navigate to Settings > Secrets and variables > Actions
3. Click New repository secret
4. Set the name to `TESSL_TOKEN` and paste your API key as the value
5. Click Add secret

Then reference it in your workflow as ${{ secrets.TESSL_TOKEN }}.

Local development

bun install
bun run lint

License

MIT
