Fail closed when Qwen3Guard output is unparseable #64

Open
fallintoplace wants to merge 1 commit into NVIDIA:main from fallintoplace:fix/qwen3guard-fail-closed

@fallintoplace

Summary

  • treat missing Safety: labels as blocked instead of letting label.lower() crash
  • fail closed in is_safe() when Qwen3Guard cannot complete the safety check
  • add regression tests for malformed moderation output and runtime exceptions without loading the full model stack

Why

Qwen3Guard currently fails open when the moderation output drifts away from the expected Safety: ... format. A missing label makes label.lower() raise an exception, is_safe() catches it and returns True, and the prompt passes even though the guardrail never completed its check.
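
The fail-open path described here can be reproduced in a few lines. This is a reconstruction from the description, not the repository's code: `parse_label` and `is_safe_fail_open` are hypothetical names, and the regex is an assumed stand-in for the real parsing.

```python
import re


def parse_label(output: str) -> str:
    # re.search returns None when the "Safety:" line is missing.
    match = re.search(r"Safety:\s*(\w+)", output)
    label = match.group(1) if match else None
    # With label set to None, .lower() raises AttributeError.
    return label.lower()


def is_safe_fail_open(output: str) -> bool:
    try:
        return parse_label(output) == "safe"
    except Exception:
        # Fail-open: a check that could not finish is reported as safe.
        return True


print(is_safe_fail_open("Safety: Unsafe"))       # False
print(is_safe_fail_open("unexpected free text"))  # True, the prompt slips through
```

Changing the `except` branch to return `False` is what turns this into fail-closed behavior.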

Validation

  • PYTHONPATH=/Users/hoangvu/Code/OSS/cosmos-framework uvx pytest --noconftest /Users/hoangvu/Code/OSS/cosmos-framework/cosmos_framework/auxiliary/guardrail/qwen3guard/qwen3guard_test.py -o addopts=
  • uvx ruff check /Users/hoangvu/Code/OSS/cosmos-framework/cosmos_framework/auxiliary/guardrail/qwen3guard/qwen3guard.py /Users/hoangvu/Code/OSS/cosmos-framework/cosmos_framework/auxiliary/guardrail/qwen3guard/qwen3guard_test.py
  • uvx ruff format --check /Users/hoangvu/Code/OSS/cosmos-framework/cosmos_framework/auxiliary/guardrail/qwen3guard/qwen3guard.py /Users/hoangvu/Code/OSS/cosmos-framework/cosmos_framework/auxiliary/guardrail/qwen3guard/qwen3guard_test.py
