feat: add StringCheckGrader support for OpenAI Evals backend by wiliyam · Pull Request #102 · agentevals-dev/agentevals

wiliyam · 2026-04-01T23:19:58Z

Summary

Closes #95

Adds support for OpenAI's string_check grader type alongside the existing text_similarity grader.

Changes

config.py: Added _VALID_STRING_CHECK_OPERATIONS set with all supported operations (eq, ne, like, ilike, contains, not_contains, starts_with, ends_with). Updated _validate_grader to validate string_check configs.
openai_eval_backend.py: Added string_check case in _build_testing_criteria that maps to the OpenAI testing criteria format.
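As a rough illustration of the mapping described for openai_eval_backend.py, the sketch below turns a string_check grader config into a testing-criterion dict. The function name and signature, the template string, and the exact key set are assumptions made for illustration; they are not taken from the PR diff and should be checked against the OpenAI grader reference.

```python
from typing import Any


def build_testing_criteria(grader: dict[str, Any], name: str) -> dict[str, Any]:
    """Hypothetical helper: map a grader config to one testing criterion."""
    grader_type = grader["type"]
    if grader_type == "string_check":
        return {
            "type": "string_check",
            "name": name,
            # Assumed template; actual_response is the item field quoted in the review.
            "input": "{{ item.actual_response }}",
            # string_check compares against a static reference from config.
            "reference": grader["reference"],
            "operation": grader["operation"],
        }
    raise ValueError(f"Unsupported grader type '{grader_type}'")
```

Only the string_check branch is shown; the existing text_similarity branch is left out on purpose.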

Usage

evaluators:
  - name: response_check
    type: openai_eval
    grader:
      type: string_check
      operation: contains
      reference: "expected keyword"

krisztianfekete

Thank you, added some review comments!

krisztianfekete · 2026-04-02T11:10:44Z

src/agentevals/openai_eval_backend.py

This conditional rejects every grader type when expected invocations are missing, but string_check uses a static reference from config and doesn't need them.

Can you gate this on grader_type?
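One way the requested gating could look is sketched below. The helper name and the _STATIC_REFERENCE_GRADERS set are hypothetical and only illustrate the idea of skipping the check for graders that carry their own reference.

```python
from typing import Optional, Sequence

# Assumed set: grader types that compare against a static reference from config.
_STATIC_REFERENCE_GRADERS = frozenset({"string_check"})


def require_expected_invocations(
    grader_type: str, expected_invocations: Optional[Sequence[object]]
) -> None:
    """Raise only when the grader actually needs expected invocations."""
    if grader_type in _STATIC_REFERENCE_GRADERS:
        return  # nothing to compare against, so nothing to require
    if not expected_invocations:
        raise ValueError(f"'{grader_type}' grader requires expected invocations")
```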

krisztianfekete · 2026-04-02T11:12:59Z

src/agentevals/openai_eval_backend.py

        "actual_response": {"type": "string"},
        "expected_response": {"type": "string"},
    },
    "required": ["actual_response", "expected_response"],


expected_response is no longer required, as string_check does not use it. Maybe we should make the schema grader-aware.

The JSONL items contain a field not declared in the schema. Please make this builder grader-aware too.
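A minimal sketch of a grader-aware schema and JSONL builder follows. The schema constant names echo the ones mentioned later in this thread (_ACTUAL_ONLY_SCHEMA, _TEXT_PAIR_SCHEMA), but the function bodies, the tuple input format, and the {"item": ...} wrapper are assumptions for illustration.

```python
import json
from typing import Any, Optional

_TEXT_PAIR_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "actual_response": {"type": "string"},
        "expected_response": {"type": "string"},
    },
    "required": ["actual_response", "expected_response"],
}

_ACTUAL_ONLY_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"actual_response": {"type": "string"}},
    "required": ["actual_response"],
}


def get_item_schema(grader_type: str) -> dict[str, Any]:
    """Pick the item schema that matches what the grader reads."""
    return _ACTUAL_ONLY_SCHEMA if grader_type == "string_check" else _TEXT_PAIR_SCHEMA


def build_jsonl_items(
    grader_type: str, responses: list[tuple[str, Optional[str]]]
) -> list[str]:
    """Emit one JSON line per (actual, expected) pair, with only declared fields."""
    wants_expected = "expected_response" in get_item_schema(grader_type)["properties"]
    lines = []
    for actual, expected in responses:
        item = {"actual_response": actual}
        if wants_expected:
            item["expected_response"] = expected or ""
        # The {"item": ...} wrapper is an assumed upload shape.
        lines.append(json.dumps({"item": item}))
    return lines
```

Deriving the builder's fields from the schema keeps the two from drifting apart, which is the mismatch the comment above points out.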

krisztianfekete · 2026-04-02T11:18:45Z

src/agentevals/openai_eval_backend.py

This will return None for string_check graders. Please make this conditional, or include grader-relevant keys, e.g. operation instead.
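The suggestion could be sketched as below: report the key that is meaningful for each grader type rather than always reading the similarity metric. The helper name and the shape of the details dict are hypothetical.

```python
from typing import Any


def grader_details(grader: dict[str, Any]) -> dict[str, Any]:
    """Collect grader-relevant keys for result details (illustrative shape)."""
    grader_type = grader.get("type")
    details: dict[str, Any] = {"grader_type": grader_type}
    if grader_type == "string_check":
        details["operation"] = grader.get("operation")
    elif grader_type == "text_similarity":
        details["evaluation_metric"] = grader.get("evaluation_metric")
    return details
```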

krisztianfekete · 2026-04-02T11:20:03Z

src/agentevals/config.py

+                raise ValueError("'operation' is required for string_check grader")
+            if operation not in _VALID_STRING_CHECK_OPERATIONS:
+                raise ValueError(f"Unknown operation '{operation}'. Valid: {sorted(_VALID_STRING_CHECK_OPERATIONS)}")
+            if "reference" not in v:


Can we use the same pattern here as in the other branch, i.e. if not metric?

Still relevant.

krisztianfekete · 2026-04-02T11:20:47Z

src/agentevals/config.py

+            if "reference" not in v:
+                raise ValueError("'reference' is required for string_check grader")
+        else:
+            supported = "'text_similarity', 'string_check'"


Can we use something like _SUPPORTED_GRADER_TYPES constant for all supported graders?
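A small sketch of what such a constant might look like, with the error message derived from it so the two cannot disagree. The tuple contents mirror the hard-coded string in the diff above; the helper function is hypothetical.

```python
# Single source of truth for supported graders (contents mirror the diff above).
_SUPPORTED_GRADER_TYPES = ("text_similarity", "string_check")


def unsupported_grader_error(grader_type: str) -> ValueError:
    """Build the unsupported-type error from the constant (hypothetical helper)."""
    supported = ", ".join(f"'{t}'" for t in _SUPPORTED_GRADER_TYPES)
    return ValueError(f"Unsupported grader type '{grader_type}'. Supported: {supported}")
```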

wiliyam · 2026-04-02T12:35:36Z

Thanks for the detailed review @krisztianfekete! Addressed all 5 points:

Grader type check — moved the grader_type not in _SUPPORTED_GRADER_TYPES check to the top, so unsupported types are rejected immediately regardless of other conditions
Grader-aware schema — added _ACTUAL_ONLY_SCHEMA for graders that don't need expected_response (like string_check), and _get_item_schema(grader_type) helper to select the right schema
expected_invocations gating — now only required for non-string_check graders since string_check uses a static reference from config
operation in error context — the string_check testing criteria now correctly uses operation from config
_SUPPORTED_GRADER_TYPES constant — added, used in both the validator and the unsupported-type error message

krisztianfekete

Can you please take a closer look? EvalRunConfig most definitely shouldn't have been deleted, and much of the review feedback hasn't been addressed. Also keep our guidelines in mind when contributing: https://github.com/agentevals-dev/agentevals/blob/main/CONTRIBUTING.md#responsible-ai-usage

krisztianfekete · 2026-04-02T13:23:18Z

src/agentevals/config.py

    BuiltinMetricDef | CodeEvaluatorDef | RemoteEvaluatorDef | OpenAIEvalDef,
    Field(discriminator="type"),
 ]
-


This has to be reverted.

wiliyam · 2026-04-02T13:45:36Z

Apologies for the sloppy rewrite @krisztianfekete — I accidentally deleted EvalRunConfig when rewriting config.py. Fixed in this push:

✅ EvalRunConfig restored — exactly as it was in upstream
✅ Validator order reverted — type-specific checks first, unsupported type raises at the bottom (original pattern)
✅ if not metric style — matches other branch
✅ Grader-relevant keys in details — operation for string_check, evaluation_metric for text_similarity instead of always returning None
✅ _SUPPORTED_GRADER_TYPES constant — kept, used in the final else raise
✅ Grader-aware schema — _ACTUAL_ONLY_SCHEMA for string_check, _TEXT_PAIR_SCHEMA for text_similarity
✅ expected_invocations gating — only required for non-string_check graders

Sorry again for the noise!

wiliyam · 2026-04-03T00:05:10Z

Addressed latest comments @krisztianfekete:

✅ JSONL builder grader-aware — _build_jsonl_items now accepts grader_type and only includes expected_response for non-string_check graders — matching the item schema exactly
✅ if not v.get("reference") — changed from if "reference" not in v to match the if not metric pattern used in the text_similarity branch
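The difference between the two checks shows up when the key is present but empty. The snippet below is a standalone illustration; the function names are made up for the comparison and v stands for the grader config dict.

```python
from typing import Any


def missing_by_key(v: dict[str, Any]) -> bool:
    """Original check: only catches an absent key."""
    return "reference" not in v


def missing_by_truthiness(v: dict[str, Any]) -> bool:
    """Revised check: also catches None and the empty string."""
    return not v.get("reference")


empty_reference = {"type": "string_check", "operation": "eq", "reference": ""}
```

With empty_reference, the key-based check lets the config through while the truthiness check rejects it.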

krisztianfekete · 2026-04-06T15:13:37Z

src/agentevals/config.py

    }
 )

+_VALID_STRING_CHECK_OPERATIONS = frozenset(


Not all of these are valid operations; please fix the list.

bd6cc7b · feat: add StringCheckGrader support for OpenAI Evals backend (agentevals-dev#95)

krisztianfekete requested changes Apr 2, 2026

a502315 · fix: address review feedback - grader-aware schema, gate expected_invocations on grader type, use _SUPPORTED_GRADER_TYPES constant

krisztianfekete requested changes Apr 2, 2026

95fdbc0 · fix: restore EvalRunConfig, revert validator order, include grader-relevant keys in details

b100aea · fix: make JSONL builder grader-aware, use if-not pattern for reference check

krisztianfekete requested changes Apr 6, 2026
