Braintrust Multi-Score Online Scorer Demo

Reproduction for Pylon #11423 / Linear BRA-4013: a Python online scorer that returns a list of scores fails when the rule runs automatically, but works on the "Test" page and in SDK offline eval.

The Issue

A Python scorer returns a list of scores (needed when scores must be computed sequentially, where score B depends on score A):

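from braintrust import Score

def compute_from(prior: Score) -> float:
    # Hypothetical placeholder: derive the second score from the first.
    return 0.5 * (prior.score or 0.0)

def handler(input, output, expected, metadata) -> list[Score]:
    score1 = Score(name="has_output", score=1.0 if output else 0.0)
    score2 = Score(name="conciseness", score=compute_from(score1))  # depends on score1
    return [score1, score2]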

| Context | Result |
| --- | --- |
| SDK offline eval (`braintrust.Eval`) | ✅ works |
| Online scorer "Test" page (manual trigger on past logs) | ✅ works |
| Online scorer automatic rule (runs on new logs) | ❌ `cannot log ... as a score` |

Why

The "Test" page calls the scorer function and renders its return value directly, with no score-logging step in between, so a list works fine.

The automatic online rule path calls the scorer, then passes the result to the span logging layer, which expects scores as a flat {name: float} dict. When the result is a list, the logging call fails.

This is a backend feature gap: the online rule execution runtime needs to handle list returns by converting them to a score dict before logging. The test-page path does not have this requirement.
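
The difference between the two paths can be illustrated with a small local simulation. This is only a sketch: `StubScore`, `render_on_test_page`, and `log_scores` are hypothetical stand-ins, not Braintrust APIs, and the error raised here is illustrative rather than the real backend message.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StubScore:
    """Hypothetical stand-in for braintrust.Score (a name plus a numeric score)."""
    name: str
    score: float


def render_on_test_page(result: Any) -> str:
    # Simulated test-page path: display whatever the scorer returned.
    return repr(result)


def log_scores(scores: Any) -> dict[str, float]:
    # Simulated automatic-rule path: the logging layer accepts only a
    # flat {name: float} mapping and rejects anything else.
    if not isinstance(scores, dict):
        raise ValueError(f"expected a dict of scores, got {type(scores).__name__}")
    return {name: float(value) for name, value in scores.items()}


result = [StubScore("has_output", 1.0), StubScore("conciseness", 0.5)]

rendered = render_on_test_page(result)  # succeeds: nothing is logged

try:
    log_scores(result)  # fails: a list is not a {name: float} dict
    failed = False
except ValueError:
    failed = True
```

In this simulation the same list return value renders without complaint but is rejected as soon as it reaches the logging step, which mirrors the behavior in the table above.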

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export BRAINTRUST_API_KEY=your_key_here

Reproduce

# Shows SDK offline eval works with list returns
python test_offline_eval.py

# Shows the test-page path vs automatic rule path behavior
python test_online_scorer.py

Fix (BRA-4013)

The online scorer rule runtime needs to handle list-of-Score returns by converting to a score dict before calling span.log(scores=...):

# Before (broken):
span.log(scores=scorer_result)

# After (fixed):
if isinstance(scorer_result, list):
    scores = {s.name: s.score for s in scorer_result}
else:
    scores = {scorer_result.name: scorer_result.score}
span.log(scores=scores)
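
A slightly more defensive variant of the conversion might also pass through results that are already dicts and reject duplicate score names, which would otherwise silently overwrite each other. The sketch below assumes that behavior is desirable; `StubScore` and `to_score_dict` are hypothetical names used for illustration, not part of the Braintrust SDK or runtime.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StubScore:
    """Hypothetical stand-in for braintrust.Score."""
    name: str
    score: float


def to_score_dict(result: Union[StubScore, list, dict]) -> dict:
    # Already a {name: value} mapping: return a shallow copy.
    if isinstance(result, dict):
        return dict(result)
    # Treat a single score as a one-element list.
    items = result if isinstance(result, list) else [result]
    scores = {}
    for item in items:
        if item.name in scores:
            # Assumption: duplicate names are an error rather than last-wins.
            raise ValueError(f"duplicate score name: {item.name!r}")
        scores[item.name] = item.score
    return scores


converted = to_score_dict([StubScore("has_output", 1.0), StubScore("conciseness", 0.5)])
```

Whether duplicates should raise, or whether the last value should win, is a design choice for the runtime; raising makes a misconfigured scorer visible instead of dropping a score.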
