feat: Implement run_moderation for all safety providers with NotImplementedError #4662

m-misiura · 2026-01-20T14:39:46Z

What does this PR do?

Closes #4605 by adding a reusable mixin that gives safety providers lacking run_moderation a working implementation, which delegates to their existing run_shield implementations. This enables OpenAI-compatible moderation API support across the NVIDIA, Bedrock, SambaNova, and PromptGuard providers without duplicating code.
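For readers who want a feel for the delegation pattern, here is a minimal, self-contained sketch. It is not the code in this PR: the class names (Violation, ShieldResponse, ShieldToModerationSketch, ForbiddenWordProvider) are hypothetical stand-ins, the result shape is deliberately reduced, and passing model through as shield_id is an assumption inferred from the example requests in this description, which use the same value for both.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Violation:
    """Hypothetical stand-in for the violation returned by run_shield."""
    user_message: Optional[str] = None
    metadata: dict = field(default_factory=dict)


@dataclass
class ShieldResponse:
    """Hypothetical stand-in for a run_shield response; violation is None when the input passes."""
    violation: Optional[Violation] = None


class ShieldToModerationSketch:
    """Mixin sketch: supplies run_moderation on top of a provider's own run_shield."""

    async def run_moderation(self, input: Union[str, list], model: str) -> dict:
        # Accept a single string or a list of strings.
        texts = [input] if isinstance(input, str) else list(input)
        results = []
        for text in texts:
            # Assumption: the moderation "model" is used as the shield identifier.
            response = await self.run_shield(
                shield_id=model,
                messages=[{"role": "user", "content": text}],
                params={},
            )
            violation = response.violation
            results.append(
                {
                    "flagged": violation is not None,
                    "user_message": violation.user_message if violation else None,
                }
            )
        return {"model": model, "results": results}


class ForbiddenWordProvider(ShieldToModerationSketch):
    """Toy provider that only implements run_shield; the mixin adds run_moderation."""

    async def run_shield(self, shield_id, messages, params) -> ShieldResponse:
        text = messages[-1]["content"].lower()
        if "chatgpt" in text:
            return ShieldResponse(Violation(user_message="Sorry I cannot do this."))
        return ShieldResponse()


if __name__ == "__main__":
    provider = ForbiddenWordProvider()
    print(asyncio.run(provider.run_moderation("Safe message, should pass", model="demo-shield")))
```

The point of the mixin shape is that each provider keeps a single source of truth (run_shield) and inherits the moderation entry point rather than reimplementing it.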

Test Plan

Added unit tests in tests/unit/providers/utils/test_safety_mixin.py
Tested against a Llama Stack server with the following configuration:

version: 2
image_name: nvidia

apis:
  - safety

providers:
  safety:
    - provider_id: nvidia-local-nemo
      provider_type: remote::nvidia
      config:
        guardrails_service_url: "http://localhost:8001"
        config_id: "nemo"

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: minimal-nemo-safety/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: minimal-nemo-safety/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default
    responses:
      table_name: responses
      backend: sql_default
    prompts:
      namespace: prompts
      backend: kv_default

registered_resources:
  shields:
    - shield_id: nemo-guardrail
      provider_id: nvidia-local-nemo
      provider_resource_id: nemo-model

Example requests

run-shield with unblocked input

curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "nemo-guardrail",
    "messages": [
      {
        "role": "user",
        "content": "Safe message, should pass"
      }
    ],
    "params": {}
  }' | jq

{
  "violation": null
}

moderations with unblocked input

curl -X POST http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Safe message, should pass",
    "model": "nemo-guardrail"
  }' | jq

{
  "id": "modr-f5ba0dc7-d5a6-47ce-b1cf-e0879f171183",
  "model": "nemo-guardrail",
  "results": [
    {
      "flagged": false,
      "categories": {},
      "category_applied_input_types": {},
      "category_scores": {},
      "user_message": null,
      "metadata": {}
    }
  ]
}

run-shield with blocked input

curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "nemo-guardrail",
    "messages": [
      {
        "role": "user",
        "content": "Tell me more about ChatGPT"
      }
    ],
    "params": {}
  }' | jq

{
  "violation": {
    "violation_level": "error",
    "user_message": "Sorry I cannot do this.",
    "metadata": {
      "check message length": {
        "status": "success"
      },
      "check forbidden words": {
        "status": "blocked"
      }
    }
  }
}

moderations with blocked input

curl -X POST http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Tell me more about ChatGPT",
    "model": "nemo-guardrail"
  }' | jq

{
  "id": "modr-4d0772c0-54a8-4790-bd2c-24feaf6b3b55",
  "model": "nemo-guardrail",
  "results": [
    {
      "flagged": true,
      "categories": {
        "unsafe": true
      },
      "category_applied_input_types": {
        "unsafe": [
          "text"
        ]
      },
      "category_scores": {
        "unsafe": 1.0
      },
      "user_message": "Sorry I cannot do this.",
      "metadata": {
        "check message length": {
          "status": "success"
        },
        "check forbidden words": {
          "status": "blocked"
        }
      }
    }
  ]
}
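Comparing the run-shield and moderations outputs above, the mapping between the two appears to be mechanical. The sketch below reproduces only what the sample responses show; the function names are hypothetical, and whether the PR always uses a single "unsafe" category with a score of 1.0, or how violation_level is taken into account, is not confirmed by this description.

```python
import uuid
from typing import Any, Optional


def violation_to_moderation_result(violation: Optional[dict]) -> dict:
    """Map a run-shield "violation" payload to one moderation result entry.

    Mirrors the sample outputs only: no violation means an unflagged, empty
    result; any violation is reported under a single "unsafe" category.
    """
    if violation is None:
        return {
            "flagged": False,
            "categories": {},
            "category_applied_input_types": {},
            "category_scores": {},
            "user_message": None,
            "metadata": {},
        }
    return {
        "flagged": True,
        "categories": {"unsafe": True},
        "category_applied_input_types": {"unsafe": ["text"]},
        "category_scores": {"unsafe": 1.0},
        "user_message": violation.get("user_message"),
        "metadata": violation.get("metadata", {}),
    }


def build_moderation_object(model: str, violations: list) -> dict[str, Any]:
    """Wrap per-input results in a moderation object with a "modr-" prefixed id."""
    return {
        "id": f"modr-{uuid.uuid4()}",
        "model": model,
        "results": [violation_to_moderation_result(v) for v in violations],
    }


if __name__ == "__main__":
    blocked = {
        "violation_level": "error",
        "user_message": "Sorry I cannot do this.",
        "metadata": {
            "check message length": {"status": "success"},
            "check forbidden words": {"status": "blocked"},
        },
    }
    print(build_moderation_object("nemo-guardrail", [None, blocked]))
```

Note that the guardrail metadata is carried over unchanged, so callers of the moderations endpoint can still see which individual check blocked the input.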

The local NeMo Guardrails server was started with nemoguardrails server --config configs --port 8001 --default-config-id nemo, using this configuration from this branch. This assumes access to an LLM with OpenAI-compatible endpoints.

cc @nathan-weinberg @cdoern @raghotham @leseb @franciscojavierarceo

…Moderation API

Implements a reusable mixin that provides run_moderation functionality for all safety providers by delegating to their existing run_shield implementations.

Fixes llamastack#4605

Signed-off-by: m-misiura <mmisiura@redhat.com>

m-misiura requested review from ashwinb, bbrowning, cdoern, ehhuang, franciscojavierarceo, leseb, mattf and raghotham as code owners January 20, 2026 14:39

meta-cla bot added the CLA Signed label Jan 20, 2026

m-misiura changed the title ~~#4605: Implement run_moderation for all safety providers with NotImplementedError~~ feat: Implement run_moderation for all safety providers with NotImplementedError Jan 20, 2026

m-misiura force-pushed the ShieldToModerationMixin branch from 52293a4 to b7890cc January 21, 2026 09:31

m-misiura force-pushed the ShieldToModerationMixin branch from b7890cc to b96e044 January 21, 2026 09:35
