`examples/front_ends/responses_api_endpoint/README.md` (new file: 208 additions, 0 deletions)
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# OpenAI Responses API Endpoint

**Complexity:** 🟢 Beginner

This example demonstrates how to configure a NeMo Agent toolkit FastAPI frontend to accept requests in the [OpenAI Responses API format](https://platform.openai.com/docs/api-reference/responses).

## Overview

The OpenAI Responses API uses a different request format than the Chat Completions API:

| Feature | Chat Completions API | Responses API |
|---------|---------------------|---------------|
| Input field | `messages` (array) | `input` (string or array) |
| System prompt | In messages array | `instructions` field |
| Response object | `chat.completion` | `response` |
| Streaming events | `chat.completion.chunk` | `response.created`, `response.output_text.delta`, etc. |

This example configures the `/v1/responses` endpoint to accept the Responses API format while the standard `/generate` and `/chat` endpoints continue using Chat Completions format.
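To make the field mapping in the table concrete, the sketch below builds the same request in both formats from the shell. `json_escape`, `chat_completions_body`, and `responses_body` are hypothetical helpers written for this illustration (they are not part of NeMo Agent toolkit), and the escaping only handles backslashes and double quotes in single-line strings.

```bash
# Hypothetical helpers for illustration; not part of NeMo Agent toolkit.

# Minimal JSON string escaping: backslashes first, then double quotes.
# Assumes single-line input (newlines and control characters are not escaped).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Chat Completions shape: the system prompt travels inside the `messages` array.
chat_completions_body() {  # $1 = model, $2 = system prompt, $3 = user message
  printf '{"model": "%s", "messages": [{"role": "system", "content": "%s"}, {"role": "user", "content": "%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Responses API shape: the system prompt moves to `instructions`, the user text to `input`.
responses_body() {  # $1 = model, $2 = instructions, $3 = input
  printf '{"model": "%s", "instructions": "%s", "input": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

For example, `curl -X POST http://localhost:8088/v1/responses -H 'Content-Type: application/json' -d "$(responses_body gpt-4o-mini 'Always be concise.' 'What time is it?')"` sends a Responses API request built this way.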

> **⚠️ Important**: The Responses API format is provided for pass-through compatibility with managed services that support stateful backends (such as OpenAI and Azure OpenAI). NeMo Agent toolkit workflows do not inherently support stateful backends, so fields such as `previous_response_id` are accepted but ignored.

## Prerequisites

1. **Install the LangChain integration** (required for the `tool_calling_agent` workflow):

   ```bash
   uv pip install -e '.[langchain]'
   ```

2. **Set the NVIDIA API key**:

   ```bash
   export NVIDIA_API_KEY=<YOUR_API_KEY>
   ```
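If you script the server start-up, a small guard can fail fast when the key is missing instead of surfacing an authentication error on the first request. `require_env` is a hypothetical helper for illustration, not part of NeMo Agent toolkit; it relies on bash indirect expansion (`${!name}`).

```bash
# Hypothetical helper; not part of NeMo Agent toolkit.
# Returns non-zero (with a message on stderr) if the named variable is unset or empty.
require_env() {
  local name="$1"
  # ${!name:-} expands to the value of the variable whose name is stored in $name,
  # or to an empty string if that variable is unset.
  if [ -z "${!name:-}" ]; then
    echo "error: $name is not set" >&2
    return 1
  fi
}
```

Usage: `require_env NVIDIA_API_KEY && nat serve --config_file examples/front_ends/responses_api_endpoint/configs/config.yml`.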

## Start the Server

```bash
nat serve --config_file examples/front_ends/responses_api_endpoint/configs/config.yml
```

The server starts on port 8088 and exposes the following endpoints:

| Endpoint | Format | Description |
|----------|--------|-------------|
| `/generate` | NeMo Agent toolkit default | Standard workflow endpoint |
| `/chat` | Chat Completions | OpenAI Chat Completions format |
| `/chat/stream` | Chat Completions | Streaming Chat Completions |
| `/v1/responses` | Responses API | OpenAI Responses API format |

## Test with curl

### Responses API Format (Non-Streaming)

```bash
curl -X POST http://localhost:8088/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What time is it?"
  }'
```

**Expected Response:**

```json
{
  "id": "resp_abc123...",
  "object": "response",
  "status": "completed",
  "model": "gpt-4o-mini",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The current time is..."
        }
      ]
    }
  ]
}
```
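To pull the assistant text out of a non-streaming reply from the shell, a regex sketch like the following can work for simple cases. `extract_output_text` is a hypothetical helper, not part of NeMo Agent toolkit; it assumes each `output_text` part lists `"type"` before `"text"` (as in the expected response above) and that the text contains no escaped double quotes. A JSON-aware tool such as `jq` is the safer choice for real use.

```bash
# Hypothetical helper; not part of NeMo Agent toolkit.
# Reads a Responses API JSON body on stdin and prints the text of each
# "output_text" content part on its own line.
extract_output_text() {
  # Join lines so pretty-printed and compact JSON are handled the same way,
  # keep only `"type": "output_text", "text": "..."` fragments,
  # then strip everything except the text value.
  tr -d '\n' \
    | grep -o '"type": *"output_text", *"text": *"[^"]*"' \
    | sed 's/.*"text": *"\([^"]*\)"$/\1/'
}
```

Usage: pipe the non-streaming `curl -s ...` request shown above into `extract_output_text`.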

### Responses API Format (Streaming)

```bash
curl -X POST http://localhost:8088/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What time is it?",
    "stream": true
  }'
```

**Expected SSE Events:**

```text
event: response.created
data: {"type": "response.created", "response": {"id": "resp_...", "status": "in_progress"}}

event: response.output_item.added
data: {"type": "response.output_item.added", ...}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": "The current"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": " time is..."}

event: response.done
data: {"type": "response.done", "response": {"status": "completed", ...}}
```
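Clients typically rebuild the full answer by concatenating the `delta` field of every `response.output_text.delta` event. The sketch below does that for the event layout shown above. `collect_deltas` is a hypothetical helper, not part of NeMo Agent toolkit, and it assumes one JSON object per `data:` line and no escaped double quotes inside the deltas.

```bash
# Hypothetical helper; not part of NeMo Agent toolkit.
# Reads SSE lines on stdin and prints the concatenated text deltas.
collect_deltas() {
  # Keep only data lines for text-delta events, extract the "delta" value,
  # and join the pieces without separators.
  grep '^data: .*"type": *"response\.output_text\.delta"' \
    | sed 's/.*"delta": *"\([^"]*\)".*/\1/' \
    | tr -d '\n'
  echo
}
```

Usage: add `-N` (no buffering) to the streaming `curl` request above and pipe its output into `collect_deltas`.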

### With System Instructions

```bash
curl -X POST http://localhost:8088/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What time is it?",
    "instructions": "You are a helpful assistant. Always be concise."
  }'
```

### With Tools

```bash
curl -X POST http://localhost:8088/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What time is it?",
    "tools": [
      {
        "type": "function",
        "name": "current_datetime",
        "description": "Get the current date and time"
      }
    ]
  }'
```

### Chat Completions Format (Still Works)

The `/chat` endpoint continues to use the Chat Completions format:

```bash
curl -X POST http://localhost:8088/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "What time is it?"}]
  }'
```

## Configuration Options

### Using Explicit Format Override

To serve the Responses API format on a custom path that does not contain `responses`, set `openai_api_v1_format` explicitly:

```yaml
general:
  front_end:
    _type: fastapi
    workflow:
      openai_api_v1_path: /v1/custom/endpoint
      openai_api_v1_format: responses  # Force Responses API format
```

Available format options:

- `auto` (default): detects the format from the path pattern (a path containing `responses` selects the Responses API format)
- `chat_completions`: always use the Chat Completions API format
- `responses`: always use the Responses API format
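The selection these options imply can be sketched as a small shell function. `resolve_format` is hypothetical and only mirrors the behavior described in this README; in particular, the rule that `auto` picks the Responses API format when the path contains `responses` is inferred from the paragraph above, not taken from the NeMo Agent toolkit source.

```bash
# Hypothetical, illustrative only: not NeMo Agent toolkit's implementation.
# Prints the request format an endpoint would use for a given
# openai_api_v1_path ($1) and openai_api_v1_format ($2, default "auto").
resolve_format() {
  local path="$1" format="${2:-auto}"
  case "$format" in
    chat_completions|responses)
      # An explicit setting always wins, regardless of the path.
      echo "$format" ;;
    auto)
      # Inferred rule: paths containing "responses" get the Responses API format.
      case "$path" in
        *responses*) echo responses ;;
        *) echo chat_completions ;;
      esac ;;
    *)
      echo "unknown format: $format" >&2
      return 1 ;;
  esac
}
```

For example, `resolve_format /v1/custom/endpoint responses` prints `responses`, while `resolve_format /v1/custom/endpoint` falls back to `chat_completions`.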

## Limitations

- **No Stateful Backend**: `previous_response_id` is accepted but ignored
- **No Built-in Tools**: OpenAI built-in tools like `code_interpreter` are not executed by NeMo Agent toolkit; use the `responses_api_agent` workflow type for that functionality
- **Tool Format Conversion**: Responses API tool definitions are converted to Chat Completions format internally
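The shape change behind the tool format conversion looks roughly like this: the Responses API places `name`, `description`, and `parameters` at the top level of a function tool, while Chat Completions nests them under a `function` key. `responses_tool_to_chat_tool` is a hypothetical, regex-based helper for illustration only; it is not how NeMo Agent toolkit performs the conversion, and it assumes a single-line JSON object whose first key is `"type": "function"`.

```bash
# Hypothetical helper; not NeMo Agent toolkit's converter.
# Reads one single-line Responses API function tool on stdin and prints the
# Chat Completions shape, with every remaining field nested under "function".
responses_tool_to_chat_tool() {
  sed 's/^{"type": *"function", *\(.*\)}$/{"type": "function", "function": {\1}}/'
}
```

Usage: `echo '{"type": "function", "name": "current_datetime", "description": "Get the current date and time"}' | responses_tool_to_chat_tool`.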

## Related Examples

- [Tool Calling Agent with Responses API](../../agents/tool_calling/README.md#using-tool-calling-with-the-openai-responses-api) - For using OpenAI's Responses API directly with built-in tools
- [Simple Auth](../simple_auth/README.md) - Authentication example
- [Custom Routes](../simple_calculator_custom_routes/README.md) - Custom endpoint routes

`examples/front_ends/responses_api_endpoint/configs/config.yml` (new file: 77 additions, 0 deletions)
# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Example configuration for testing the OpenAI Responses API endpoint format.
#
# This config demonstrates using the FastAPI frontend's /v1/responses endpoint
# which accepts the OpenAI Responses API format (with 'input' field) and routes
# requests through a standard tool_calling_agent workflow.
#
# NOTE: The Responses API format is provided for pass-through compatibility.
# NAT agents do not inherently support stateful backends; features like
# 'previous_response_id' will be ignored.
#
# Prerequisites:
# uv pip install -e '.[langchain]' # Required for tool_calling_agent
# export NVIDIA_API_KEY=<YOUR_KEY>
#
# Usage:
# nat serve --config_file examples/front_ends/responses_api_endpoint/configs/config.yml
#
# Test with curl (Responses API format):
# curl -X POST http://localhost:8088/v1/responses \
# -H 'Content-Type: application/json' \
# -d '{"model": "gpt-4o-mini", "input": "What time is it?"}'
#
# Test with curl (Streaming):
# curl -X POST http://localhost:8088/v1/responses \
# -H 'Content-Type: application/json' \
# -d '{"model": "gpt-4o-mini", "input": "What time is it?", "stream": true}'
#
# The /generate and /chat endpoints still use the Chat Completions format:
# curl -X POST http://localhost:8088/generate \
# -H 'Content-Type: application/json' \
# -d '{"messages": [{"role": "user", "content": "What time is it?"}]}'

general:
  use_uvloop: true
  front_end:
    _type: fastapi
    port: 8088
    workflow:
      method: POST
      path: /generate
      openai_api_path: /chat
      openai_api_v1_path: /v1/responses
      description: "Test OpenAI Responses API endpoint format"

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    max_tokens: 250

functions:
  current_datetime:
    _type: current_datetime

workflow:
  _type: tool_calling_agent
  llm_name: nim_llm
  tool_names: [current_datetime]
  verbose: true
  handle_tool_errors: true
