
@m-misiura commented on Jan 20, 2026

What does this PR do?

Closes #4605 by implementing a reusable mixin that gives safety providers without their own run_moderation an implementation that delegates to their existing run_shield. This enables OpenAI-compatible moderation API support across the NVIDIA, Bedrock, SambaNova, and PromptGuard providers without duplicating code.
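A minimal sketch of the delegation pattern, assuming a provider exposes an async run_shield(shield_id, messages, params) and that the moderation model field carries the registered shield_id. The dataclasses are simplified hypothetical stand-ins, not the actual llama-stack types, and ForbiddenWordShield is a toy provider used only for illustration. The generic "unsafe" category, its 1.0 score and the "text" input type mirror the moderation responses in the test plan below; the real mixin may differ in detail.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


# Hypothetical stand-ins for the llama-stack safety types (names and fields assumed).
@dataclass
class SafetyViolation:
    user_message: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RunShieldResponse:
    violation: Optional[SafetyViolation] = None


@dataclass
class ModerationResult:
    flagged: bool
    categories: Dict[str, bool] = field(default_factory=dict)
    category_applied_input_types: Dict[str, List[str]] = field(default_factory=dict)
    category_scores: Dict[str, float] = field(default_factory=dict)
    user_message: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ModerationObject:
    id: str
    model: str
    results: List[ModerationResult]


class ShieldToModerationMixin:
    """Adds run_moderation to any class that already defines an async run_shield."""

    async def run_moderation(self, input: Union[str, List[str]], model: str) -> ModerationObject:
        texts = [input] if isinstance(input, str) else list(input)
        results = []
        for text in texts:
            # Assumption: the moderation `model` is the registered shield_id.
            # run_shield is supplied by the concrete provider class.
            response = await self.run_shield(
                shield_id=model,
                messages=[{"role": "user", "content": text}],
                params={},
            )
            results.append(self._to_result(response.violation))
        return ModerationObject(id=f"modr-{uuid.uuid4()}", model=model, results=results)

    @staticmethod
    def _to_result(violation: Optional[SafetyViolation]) -> ModerationResult:
        if violation is None:
            return ModerationResult(flagged=False)
        # A shield gives a binary verdict, so report one generic "unsafe" category.
        return ModerationResult(
            flagged=True,
            categories={"unsafe": True},
            category_applied_input_types={"unsafe": ["text"]},
            category_scores={"unsafe": 1.0},
            user_message=violation.user_message,
            metadata=violation.metadata,
        )


class ForbiddenWordShield(ShieldToModerationMixin):
    """Hypothetical provider that only implements run_shield."""

    async def run_shield(
        self, shield_id: str, messages: List[Dict[str, str]], params: Dict[str, Any]
    ) -> RunShieldResponse:
        if "ChatGPT" in messages[-1]["content"]:
            return RunShieldResponse(
                SafetyViolation("Sorry I cannot do this.", {"check forbidden words": {"status": "blocked"}})
            )
        return RunShieldResponse()
```

Usage: asyncio.run(ForbiddenWordShield().run_moderation("Tell me more about ChatGPT", "nemo-guardrail")) returns a ModerationObject with a single flagged result. Because the provider only has to supply run_shield, every provider that inherits the mixin gets run_moderation without copying the conversion logic.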

Test Plan

  1. Added unit tests in tests/unit/providers/utils/test_safety_mixin.py

  2. Tested against a Llama Stack server with the following configuration:

```yaml
version: 2
image_name: nvidia

apis:
  - safety

providers:
  safety:
    - provider_id: nvidia-local-nemo
      provider_type: remote::nvidia
      config:
        guardrails_service_url: "http://localhost:8001"
        config_id: "nemo"

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: minimal-nemo-safety/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: minimal-nemo-safety/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default
    responses:
      table_name: responses
      backend: sql_default
    prompts:
      namespace: prompts
      backend: kv_default

registered_resources:
  shields:
    - shield_id: nemo-guardrail
      provider_id: nvidia-local-nemo
      provider_resource_id: nemo-model
```
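The example requests below pass the registered shield_id (nemo-guardrail) as the moderation model. A minimal sketch of that lookup, assuming the server resolves model against registered_resources.shields; ShieldRegistration, REGISTERED_SHIELDS and resolve_moderation_model are hypothetical names, not llama-stack APIs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ShieldRegistration:
    """Hypothetical mirror of one registered_resources.shields entry."""

    shield_id: str
    provider_id: str
    provider_resource_id: str


# Hypothetical in-memory registry built from the config above.
REGISTERED_SHIELDS: Dict[str, ShieldRegistration] = {
    "nemo-guardrail": ShieldRegistration(
        shield_id="nemo-guardrail",
        provider_id="nvidia-local-nemo",
        provider_resource_id="nemo-model",
    ),
}


def resolve_moderation_model(model: str) -> ShieldRegistration:
    """Map the /v1/moderations `model` field to a registered shield (assumed behavior)."""
    try:
        return REGISTERED_SHIELDS[model]
    except KeyError:
        raise ValueError(f"no shield registered with shield_id={model!r}") from None
```

Keying the lookup on shield_id means a moderation request needs no extra configuration: any shield that is already registered can be addressed by name.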

Example requests

  • run-shield with unblocked input
```bash
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "nemo-guardrail",
    "messages": [
      {
        "role": "user",
        "content": "Safe message, should pass"
      }
    ],
    "params": {}
  }' | jq
```
```json
{
  "violation": null
}
```
  • moderations with unblocked input
```bash
curl -X POST http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Safe message, should pass",
    "model": "nemo-guardrail"
  }' | jq
```
```json
{
  "id": "modr-f5ba0dc7-d5a6-47ce-b1cf-e0879f171183",
  "model": "nemo-guardrail",
  "results": [
    {
      "flagged": false,
      "categories": {},
      "category_applied_input_types": {},
      "category_scores": {},
      "user_message": null,
      "metadata": {}
    }
  ]
}
```
  • run-shield with blocked input
```bash
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "nemo-guardrail",
    "messages": [
      {
        "role": "user",
        "content": "Tell me more about ChatGPT"
      }
    ],
    "params": {}
  }' | jq
```
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Sorry I cannot do this.",
    "metadata": {
      "check message length": {
        "status": "success"
      },
      "check forbidden words": {
        "status": "blocked"
      }
    }
  }
}
```
  • moderations with blocked input
```bash
curl -X POST http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Tell me more about ChatGPT",
    "model": "nemo-guardrail"
  }' | jq
```
```json
{
  "id": "modr-4d0772c0-54a8-4790-bd2c-24feaf6b3b55",
  "model": "nemo-guardrail",
  "results": [
    {
      "flagged": true,
      "categories": {
        "unsafe": true
      },
      "category_applied_input_types": {
        "unsafe": [
          "text"
        ]
      },
      "category_scores": {
        "unsafe": 1.0
      },
      "user_message": "Sorry I cannot do this.",
      "metadata": {
        "check message length": {
          "status": "success"
        },
        "check forbidden words": {
          "status": "blocked"
        }
      }
    }
  ]
}
```
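The paired examples above can be automated as a cross-check: send the same text to both endpoints and confirm the verdicts agree. This is a standard-library sketch, not part of the PR; the helper names are hypothetical, and the consistency rules (flagged matches the presence of a violation; user_message and metadata are carried over unchanged) are inferred from the example outputs above.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8321"  # server address used in the examples above


def _post_json(path: str, payload: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request like the curl calls above."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_shield_request(shield_id: str, text: str) -> urllib.request.Request:
    payload = {"shield_id": shield_id, "messages": [{"role": "user", "content": text}], "params": {}}
    return _post_json("/v1/safety/run-shield", payload)


def moderation_request(model: str, text: str) -> urllib.request.Request:
    return _post_json("/v1/moderations", {"input": text, "model": model})


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def check_consistent(shield_resp: Dict[str, Any], moderation_resp: Dict[str, Any]) -> None:
    """Raise AssertionError if the two endpoints disagree about one input."""
    violation = shield_resp["violation"]
    (result,) = moderation_resp["results"]  # exactly one result for one input string
    if result["flagged"] != (violation is not None):
        raise AssertionError("flagged does not match presence of a violation")
    if violation is not None:
        # The blocked example carries user_message and metadata over unchanged.
        if result["user_message"] != violation["user_message"]:
            raise AssertionError("user_message differs")
        if result["metadata"] != violation["metadata"]:
            raise AssertionError("metadata differs")


def cross_check(shield_id: str, text: str) -> None:
    """Send the same text to both endpoints and compare the verdicts."""
    check_consistent(send(run_shield_request(shield_id, text)), send(moderation_request(shield_id, text)))
```

Against a running server, cross_check("nemo-guardrail", "Tell me more about ChatGPT") should return silently if both endpoints agree. Building requests separately from sending them keeps the payload logic testable without a server.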

The local NeMo Guardrails server was started with `nemoguardrails server --config configs --port 8001 --default-config-id nemo`, using this configuration from this branch. This assumes access to an LLM with OpenAI-compatible endpoints.

cc @nathan-weinberg @cdoern @raghotham @leseb @franciscojavierarceo

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 20, 2026
@m-misiura changed the title from "#4605: Implement run_moderation for all safety providers with NotImplementedError" to "feat: Implement run_moderation for all safety providers with NotImplementedError" on Jan 20, 2026
@m-misiura force-pushed the ShieldToModerationMixin branch from 52293a4 to b7890cc on January 21, 2026 09:31
…Moderation API

Implements a reusable mixin that provides run_moderation functionality for all
safety providers by delegating to their existing run_shield implementations.

Fixes llamastack#4605

Signed-off-by: m-misiura <mmisiura@redhat.com>
@m-misiura force-pushed the ShieldToModerationMixin branch from b7890cc to b96e044 on January 21, 2026 09:35
Development

Successfully merging this pull request may close these issues.

Implement run_moderation for all safety providers with NotImplementedError