[Bug]: Azure GPT-4.1 content filter false positives silently zero out AppWorld tasks #60

@Sergey-Zeltyn

Description

Several AppWorld benchmark tasks trigger Azure OpenAI's responsible-AI content filter mid-run (category "sexual", severity "medium"), causing LiteLLM to raise ContentPolicyViolationError. The agent loop aborts and the task scores 0.0, which the report cannot distinguish from a real agent failure.

The filter is stochastic: the same task, with the same agent and the same model, passes on some runs and fails on others.
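Because the filter is stochastic, one possible mitigation is to retry a filtered run and, if every attempt is filtered, record the task as "content_filtered" instead of scoring it 0.0. The sketch below is illustrative only and is not the project's code: `run_with_filter_retry`, `TaskOutcome`, and the stand-in `ContentFilterBlocked` exception are hypothetical names. In a real harness the `filter_error` argument would presumably be `litellm.ContentPolicyViolationError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Type


class ContentFilterBlocked(Exception):
    """Hypothetical stand-in for litellm.ContentPolicyViolationError."""


@dataclass
class TaskOutcome:
    status: str             # "scored" or "content_filtered"
    score: Optional[float]  # None when every attempt was filtered
    attempts: int


def run_with_filter_retry(
    run_task: Callable[[], float],
    filter_error: Type[Exception] = ContentFilterBlocked,
    max_attempts: int = 3,
) -> TaskOutcome:
    """Retry a task when the content filter fires; never turn a filter hit into 0.0."""
    for attempt in range(1, max_attempts + 1):
        try:
            return TaskOutcome("scored", run_task(), attempt)
        except filter_error:
            # Filter hit: try again. Any other exception propagates unchanged.
            continue
    return TaskOutcome("content_filtered", None, max_attempts)


# Demo: a task that is filtered twice and then succeeds.
calls = {"n": 0}


def flaky_task() -> float:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ContentFilterBlocked("filtered")
    return 1.0


def always_blocked() -> float:
    raise ContentFilterBlocked("filtered")


outcome = run_with_filter_retry(flaky_task)
blocked = run_with_filter_retry(always_blocked)
print(outcome)
print(blocked)
```

Keeping `score` as `None` for filtered tasks lets the report count them separately rather than averaging them in as failures.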

Affected tasks observed (all false positives; none has sexually explicit intent):

  • 2e9b91e_1 "Request Denise for Venmo money for Amazon cart items"
  • 98d2608_1 "Email the driving license found in my file system to my partner"
  • a3ba388_1 "Schedule resignation.pdf to be sent to my manager"
  • b3bdcc1_1 "Buy an air purifier on Amazon using my Visa card"

Example:

Error: Error code: 400 - {'error': {'message': "litellm.BadRequestError: litellm.ContentPolicyViolationError: The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766\nmodel=Azure/gpt-4.1. content_policy_fallback=None. fallbacks=None.\n\nSet 'content_policy_fallback' - https://docs.litellm.ai/docs/routing#fallbacks. Received Model Group=Azure/gpt-4.1\nAvailable Model Group Fallbacks=None", 'type': 'invalid_request_error', 'param': None, 'code': '400', 'provider_specific_fields': {'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'low'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, 'inner_error': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'low'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, 'azure_error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'low'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}}}
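To make these cases visible in a report, the filter verdict can be pulled out of the error payload. The following is a minimal sketch, assuming the payload has already been obtained as a Python dict shaped like the one above; `find_filter_result` and `tripped_categories` are hypothetical helper names, and the sample dict is a trimmed copy of the logged error.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_filter_result(payload: Any) -> Optional[Dict[str, Any]]:
    """Depth-first search for the first 'content_filter_result' dict."""
    if isinstance(payload, dict):
        result = payload.get("content_filter_result")
        if isinstance(result, dict):
            return result
        for value in payload.values():
            found = find_filter_result(value)
            if found is not None:
                return found
    return None


def tripped_categories(payload: Any) -> List[Tuple[str, str]]:
    """Return (category, severity) pairs for every category marked filtered."""
    result = find_filter_result(payload) or {}
    return [
        (name, verdict.get("severity", "unknown"))
        for name, verdict in result.items()
        if isinstance(verdict, dict) and verdict.get("filtered")
    ]


# Trimmed copy of the payload from the error above.
sample = {
    "error": {
        "code": "400",
        "provider_specific_fields": {
            "innererror": {
                "code": "ResponsibleAIPolicyViolation",
                "content_filter_result": {
                    "hate": {"filtered": False, "severity": "safe"},
                    "self_harm": {"filtered": False, "severity": "low"},
                    "sexual": {"filtered": True, "severity": "medium"},
                    "violence": {"filtered": False, "severity": "safe"},
                },
            }
        },
    }
}

print(tripped_categories(sample))  # [('sexual', 'medium')]
```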

Labels: bug
Status: Backlog