feat: add StringCheckGrader support for OpenAI Evals backend #102
wiliyam wants to merge 4 commits into agentevals-dev:main from
Conversation
krisztianfekete left a comment
Thank you, added some review comments!
This conditional will reject all grader types, but `string_check` uses a static reference from config and doesn't need these fields.
Can you gate this on grader_type?
```python
        "actual_response": {"type": "string"},
        "expected_response": {"type": "string"},
    },
    "required": ["actual_response", "expected_response"],
```
`expected_response` is no longer always required, as `string_check` does not use it. Maybe we should make the schema grader-aware.
The JSONL items contain a field not declared in the schema. Please make this builder grader-aware too
This will return `None` for `string_check` graders. Please make this conditional, or include grader-relevant keys, e.g. `operation` instead.
src/agentevals/config.py
Outdated
```python
        raise ValueError("'operation' is required for string_check grader")
    if operation not in _VALID_STRING_CHECK_OPERATIONS:
        raise ValueError(f"Unknown operation '{operation}'. Valid: {sorted(_VALID_STRING_CHECK_OPERATIONS)}")
    if "reference" not in v:
```
Can we do what we do for the other branch here as well, with `if not metric`?
src/agentevals/config.py
Outdated
```python
    if "reference" not in v:
        raise ValueError("'reference' is required for string_check grader")
else:
    supported = "'text_similarity', 'string_check'"
```
Can we use something like a `_SUPPORTED_GRADER_TYPES` constant for all supported graders?
…ocations on grader type, use _SUPPORTED_GRADER_TYPES constant
Thanks for the detailed review @krisztianfekete! Addressed all 5 points:
krisztianfekete left a comment
Can you please take a closer look? `EvalRunConfig` most definitely shouldn't have been deleted, and much of the review feedback hasn't been addressed. Also keep our guidelines in mind when contributing: https://github.com/agentevals-dev/agentevals/blob/main/CONTRIBUTING.md#responsible-ai-usage
```python
    BuiltinMetricDef | CodeEvaluatorDef | RemoteEvaluatorDef | OpenAIEvalDef,
    Field(discriminator="type"),
]
```
This has to be reverted.
…levant keys in details
Apologies for the sloppy rewrite @krisztianfekete — I accidentally deleted
Sorry again for the noise!
Addressed latest comments @krisztianfekete:
```python
    }
)

_VALID_STRING_CHECK_OPERATIONS = frozenset(
```
These are not all valid, please fix it.
Summary

Closes #95

Adds support for OpenAI's `string_check` grader type alongside the existing `text_similarity` grader.

Changes

- `config.py`: Added `_VALID_STRING_CHECK_OPERATIONS` set with all supported operations (`eq`, `ne`, `like`, `ilike`, `contains`, `not_contains`, `starts_with`, `ends_with`). Updated `_validate_grader` to validate `string_check` configs.
- `openai_eval_backend.py`: Added a `string_check` case in `_build_testing_criteria` that maps to the OpenAI testing criteria format.

Usage
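A hypothetical usage sketch; the exact config shape and key names are assumptions, not the PR's documented interface:

```python
# Illustrative grader config for the new string_check type; keys
# mirror the fields validated in config.py (operation, reference).
grader_config = {
    "type": "string_check",
    "operation": "eq",        # comparison applied to the model output
    "reference": "Paris",     # static reference string from config
}
```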