Is the returned ExactMatch Score object the right minimal external-consumer seam? #187

@Rul1an

Hi AutoEvals folks,

We just added a small Assay-side sample around ExactMatch, and I wanted to share this narrow version rather than opening with a broader integration ask.

The sample is here:
https://github.com/Rul1an/assay/tree/main/examples/autoevals-exactmatch-evidence

We kept it intentionally small. The probe runs `ExactMatch` on `autoevals==0.2.0`, stores the compared `output` / `expected` values separately as discovery context, and then reduces only the returned `Score` object. In the Python path, the useful public shape we observed was `name`, `score`, empty `metadata`, and `error=None`.

For the Assay fixture, the canonical artifact keeps only the scorer name from `name`, the integer 0 / 1 score from `score`, and a `target_kind` that records that this is an output-vs-expected comparison rather than a stable target id. We deliberately leave the raw compared values, `metadata`, error state, scorer config, Braintrust run/experiment wrappers, and broader AutoEvals scorer-family semantics out of the artifact.
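For concreteness, here is a minimal sketch of that reduction. The `Score` dataclass below is a stand-in mirroring the shape we observed (not the AutoEvals class itself), and `reduce_to_artifact` and the `target_kind` value are Assay-side names of ours, not AutoEvals API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Score:
    """Stand-in for the Score shape we observed from ExactMatch on autoevals==0.2.0."""
    name: str
    score: float
    metadata: dict = field(default_factory=dict)
    error: Optional[str] = None


def reduce_to_artifact(s: Score) -> dict:
    """Keep only the fields the Assay fixture treats as canonical."""
    return {
        "scorer": s.name,                      # scorer name, from Score.name
        "score": int(s.score),                 # ExactMatch yields 0 or 1
        "target_kind": "output_vs_expected",   # comparison level, not a stable target id
    }


# A matching pair reduces to the three-field artifact:
artifact = reduce_to_artifact(Score(name="ExactMatch", score=1))
# artifact == {"scorer": "ExactMatch", "score": 1, "target_kind": "output_vs_expected"}
```

Everything else on the returned object (`metadata`, `error`, the raw compared values) stays in discovery context only.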

The question is: does the returned `Score` object seem like the right minimal public surface for an external evidence consumer, or would you rather point consumers at a different returned/result boundary? If there is a better Python/TypeScript-stable seam, I'm happy to tighten the sample around that.

Thanks for maintaining AutoEvals. ExactMatch is exactly the kind of small deterministic scorer that makes this boundary easy to reason about.
