Description
Some scenarios require the agent to compare two items, and its answer might not explicitly provide the expected key. We need a clear rule for when to score such an answer as an abstention and when to treat it as a clarification or partial response.
Why
Current evaluation is ambiguous when an answer is based on a comparison but does not use the exact expected key format.
Goal
Define a consistent policy for comparison-only scenarios so scoring is stable and predictable.