Hi AutoEvals folks,
We just added a small Assay-side sample around ExactMatch, and I wanted to share the narrow version rather than open with a broader integration ask.
The sample is here:
https://github.com/Rul1an/assay/tree/main/examples/autoevals-exactmatch-evidence
We kept it intentionally small. The probe runs `ExactMatch` on `autoevals==0.2.0`, stores the compared `output` / `expected` values separately as discovery context, and then reduces only the returned `Score` object. In the Python path, the useful public shape we observed was `name`, `score`, an empty `metadata`, and `error=None`.
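For concreteness, here is roughly what the probe does. This is a minimal sketch assuming the import path and call shape we ran against `autoevals==0.2.0`; the commented values are what we observed in our runs, not a documented contract:

```python
# Minimal probe sketch against autoevals==0.2.0 (observed behavior, not spec).
from autoevals import ExactMatch

output, expected = "42", "42"

# The raw compared values are kept only as discovery context,
# deliberately outside the canonical artifact.
discovery_context = {"output": output, "expected": expected}

# Reduce only the returned Score object.
score = ExactMatch()(output=output, expected=expected)

print(score.name)      # "ExactMatch" in our runs
print(score.score)     # integer 1 here (0 on mismatch)
print(score.metadata)  # {} (empty in our runs)
print(score.error)     # None
```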
For the Assay fixture, the canonical artifact keeps only the scorer name from `name`, the integer 0/1 score from `score`, and a `target_kind` recording that this is an output-vs-expected comparison rather than a stable target id. We deliberately leave raw compared values, metadata, error state, scorer config, Braintrust run/experiment wrappers, and broader AutoEvals scorer-family semantics out of the artifact.
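The reduction itself is small enough to sketch inline. The artifact keys and the `target_kind` value here are Assay-side conventions from the sample, not anything AutoEvals defines:

```python
# Sketch of the Assay-side reduction; the field names and the target_kind
# value are our conventions, not part of the AutoEvals surface.
def to_evidence_artifact(score) -> dict:
    """Reduce a returned Score to the minimal canonical artifact.

    Raw compared values, metadata, error state, scorer config, and
    Braintrust run/experiment wrappers are intentionally dropped.
    """
    return {
        "scorer": score.name,                 # scorer name from Score.name
        "score": int(score.score),            # integer 0/1 from Score.score
        "target_kind": "output_vs_expected",  # comparison kind, not a stable target id
    }
```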
The question is: does the returned `Score` object seem like the right minimal public surface for an external evidence consumer, or would you rather point consumers at a different returned/result boundary? If there is a seam that is more stable across Python and TypeScript, we're happy to tighten the sample around that.
Thanks for maintaining AutoEvals. `ExactMatch` is exactly the kind of small deterministic scorer that makes this boundary easy to reason about.