test: add Codex observability contract coverage by yczhang-nv · Pull Request #234 · NVIDIA/NeMo-Relay

yczhang-nv · 2026-06-05T22:43:19Z

Overview

Adds Codex observability contract coverage across raw ATOF, ATIF, and OpenInference outputs.

I confirm this contribution is my own work, or I have the right to submit it under this project's license.
I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Adds a Codex ATOF test that verifies Stop closes the turn scope, preserves tool start/end pairing, and emits expected UUID/parent UUID linkage and metadata.
Adds a Codex ATIF test that verifies per-turn snapshots are written on Stop without requiring a sessionEnd hook.
Adds a Codex OpenInference test with a local OTLP collector to assert AGENT, LLM, and TOOL span contract fields.

Validation:

cargo fmt --check
cargo test -p nemo-relay-cli codex -- --test-threads=1
cargo test -p nemo-relay-cli -- --test-threads=1

Where should the reviewer start?

Start with crates/cli/tests/coverage/server_tests.rs for the ATOF contract, then crates/cli/tests/coverage/session_tests.rs for the ATIF and OpenInference contract tests.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes #

Summary by CodeRabbit

Tests
- Enhanced test infrastructure with new helpers for observability event validation and subscriber cleanup.
- Added integration tests validating Codex hook sequences and observability event generation (ATOF/ATIF/OpenInference).
- Added coverage tests verifying observability span contracts and trajectory snapshots for Codex operations.

RELAY-206: Ensure Codex ATOF output consistency Add Codex fixtures and assertions so raw ATOF output matches the shared contract. Covers Codex Stop turn closure, tool scope pairing, UUID/parent UUID linkage, and ATOF 0.1 event metadata. Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

RELAY-207: Ensure Codex ATIF output consistency Add Codex fixtures and assertions so ATIF trajectory output matches the shared contract, including per-turn snapshot behavior. Covers Stop-triggered Codex turn snapshots without requiring a sessionEnd hook. Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

RELAY-205: Ensure Codex OpenInference output consistency Add Codex fixtures and assertions so OpenInference spans match the shared contract. Covers AGENT, LLM, and TOOL span export fields, UUID/parent UUID linkage, tool-call attributes, and Stop-based Codex turn closure. Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

coderabbitai · 2026-06-05T22:43:31Z

Wondering what really moved? Review this PR in Change Stack to inspect semantic changes, definitions, and references.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 36bdb2b1-b594-4269-898f-7cc228d819b0

📥 Commits

Reviewing files that changed from the base of the PR and between 605ad75 and f1ecfa9.

📒 Files selected for processing (1)

crates/cli/tests/coverage/session_tests.rs

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Check / Run

🧰 Additional context used

📓 Path-based instructions (10)

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Use cargo fmt for Rust code formatting
Run cargo clippy -- -D warnings to lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...

Files:

crates/cli/tests/coverage/session_tests.rs

{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

crates/cli/tests/coverage/session_tests.rs

**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

crates/cli/tests/coverage/session_tests.rs

**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

crates/cli/tests/coverage/session_tests.rs

**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Follow binding naming conventions: Rust and Python use snake_case, C FFI exports prefixed nemo_relay_, Go uses PascalCase for public APIs, Node.js uses camelCase.

Files:

crates/cli/tests/coverage/session_tests.rs

crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
Use Json = serde_json::Value in Rust-facing runtime APIs for JSON payload handling.

Files:

crates/cli/tests/coverage/session_tests.rs

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/cli/tests/coverage/session_tests.rs

**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

Scope stacks decide where work belongs and which scope-local behavior is visible.

Middleware registries decide what guardrails and intercepts run around managed calls.

Plugins install reusable runtime behavior from configuration.

Events record runtime behavior in ATOF form.

Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.
crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

crates/cli/tests/coverage/session_tests.rs

🔇 Additional comments (2)

crates/cli/tests/coverage/session_tests.rs (2)

1715-1749: Span assertions still mask duplicate spans and lineage regressions.

Line 1715 still collapses spans into a HashMap keyed by span name, so duplicate codex-turn / openai.responses / Read spans overwrite each other. The test also only checks parent UUID presence, not equality to the turn UUID, so broken parent-child linkage can pass.

1-3: Run the required Rust validation commands for this Rust change.

Please verify this change set with:

cargo fmt --all

cargo clippy --workspace --all-targets -- -D warnings

just test-rust

As per coding guidelines, "If any Rust code changed, always run just test-rust; also run cargo fmt --all and cargo clippy --workspace --all-targets -- -D warnings."

Source: Coding guidelines

Walkthrough

PR adds observability test infrastructure: RAII subscriber cleanup and scope event diagnostics helpers, an ATOF server integration test validating Codex hook recording with detailed scope assertions, Codex orchestration test helpers for driving prompt/tool/stop workflows, and two session-level tests validating ATIF trajectory snapshots and OpenInference span exports with contract assertions.

Changes

Observability Test Coverage for Codex ATOF, ATIF, and OpenInference Exports

Layer / File(s)	Summary
ATOF test helpers and server integration test `crates/cli/tests/coverage/server_tests.rs`	Adds `SubscriberCleanup` RAII guard and `find_scope_event` diagnostic helper; implements `serve_listener_records_codex_stop_atof_contract` integration test validating Codex hook recording to ATOF event files with scope UUID/parent and metadata assertions; refactors Claude startup-probe to use RAII cleanup.
Codex test orchestration helpers `crates/cli/tests/coverage/session_tests.rs`	Adds `apply_codex_payload`, `run_codex_prompt_turn`, `run_codex_tool_activity`, and `stop_codex_turn` helpers to orchestrate Codex workflows in tests via `SessionManager` hook payloads.
ATIF trajectory snapshot validation `crates/cli/tests/coverage/session_tests.rs`	`codex_stop_snapshots_atif_without_session_end` installs ATIF plugin, runs prompt+tool sequence, validates ATIF trajectory contents including observed events (excluding `sessionEnd`), step ancestry/lineage, and tool output.
OpenInference span export validation `crates/cli/tests/coverage/session_tests.rs`	`codex_openinference_spans_match_shared_contract` registers OpenInference subscriber, runs Codex prompt+tool+Stop, asserts span kind/contract metadata for turn, LLM, and tool spans, and verifies no span attributes contain `sessionEnd`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NeMo-Relay#235: Adds OpenInference-focused in-memory subscriber and span assertion patterns in session_tests.rs for Hermes routed-provider observability validation.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows Conventional Commits format with 'test' type and concise imperative summary under 72 characters.
Description check	✅ Passed	The description includes all required sections: Overview with confirmed checklist items, Details with specific changes, Where should the reviewer start, and Related Issues.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/cli/tests/coverage/session_tests.rs`:
- Around line 1582-1607: The test uses String::from_utf8_lossy to treat the
protobuf body as a string for a contract smoke test (checking that ASCII span
identifiers like "AGENT", "LLM", "TOOL", etc. are present); add a short inline
comment above the block that references String::from_utf8_lossy and the contains
assertions to explain this is intentional because ASCII identifiers survive the
wire format and note that if stricter validation is needed later we should add
opentelemetry-proto as a dev-dependency and deserialize the protobuf for precise
field checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d651aaf9-0ac6-4d0e-9c14-a534aaaa50dd

📥 Commits

Reviewing files that changed from the base of the PR and between 02e4daa and bc36694.

📒 Files selected for processing (2)

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Check / Run
GitHub Check: Apply PR labels

🧰 Additional context used

📓 Path-based instructions (10)

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Use cargo fmt for Rust code formatting
Run cargo clippy -- -D warnings to lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Follow binding naming conventions: Rust and Python use snake_case, C FFI exports prefixed nemo_relay_, Go uses PascalCase for public APIs, Node.js uses camelCase.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
Use Json = serde_json::Value in Rust-facing runtime APIs for JSON payload handling.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

Scope stacks decide where work belongs and which scope-local behavior is visible.

Middleware registries decide what guardrails and intercepts run around managed calls.

Plugins install reusable runtime behavior from configuration.

Events record runtime behavior in ATOF form.

Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.
crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

🧠 Learnings (8)

📓 Common learnings

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Applies to crates/core/src/observability/{atif,otel,openinference}.rs : When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in `crates/core/src/observability/atif.rs`, `crates/core/src/observability/otel.rs`, and `crates/core/src/observability/openinference.rs` in sync

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: When event fields change, run Rust crate tests and execute `just test-rust` to validate the affected observability modules

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Run tests for every language affected by a change. If you touch the Rust core runtime, middleware semantics, event shape, scope behavior, typed codecs, plugins, or observability, expect to validate every affected binding because the bindings share the same runtime contract.

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Exporters can transform runtime events to ATIF trajectories, OpenTelemetry traces, or OpenInference-compatible output. Root scope identity is used to isolate concurrent agents.

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Verify that mark events, start/end events, and orphan cases are still handled correctly when modifying observability surfaces

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Events use ATOF `0.1` as the canonical event format. Scope events use start/end pairs; mark events record runtime checkpoints.

📚 Learning: 2026-05-21T22:48:57.484Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Applies to crates/core/src/observability/{atif,otel,openinference}.rs : When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in `crates/core/src/observability/atif.rs`, `crates/core/src/observability/otel.rs`, and `crates/core/src/observability/openinference.rs` in sync

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:48:57.484Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: When event fields change, run Rust crate tests and execute `just test-rust` to validate the affected observability modules

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-29T21:25:49.977Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Run tests for every language affected by a change. If you touch the Rust core runtime, middleware semantics, event shape, scope behavior, typed codecs, plugins, or observability, expect to validate every affected binding because the bindings share the same runtime contract.

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-29T21:25:49.977Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Events use ATOF `0.1` as the canonical event format. Scope events use start/end pairs; mark events record runtime checkpoints.

Applied to files:

crates/cli/tests/coverage/server_tests.rs

📚 Learning: 2026-05-21T22:51:27.841Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/test-rust-core/SKILL.md:0-0
Timestamp: 2026-05-21T22:51:27.841Z
Learning: Use narrower crate tests as a local debug loop, not as the final validation story for Rust changes

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:49:09.811Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-optimizer/SKILL.md:0-0
Timestamp: 2026-05-21T22:49:09.811Z
Learning: Applies to {crates/adaptive/**,python/nemo_relay/plugin.py,go/nemo_relay/**,**/node/**,**/wasm/**} : Maintain consistent plugin lifecycle across all language bindings (Python, Go, Node/WebAssembly, and Rust)

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:51:04.073Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/test-go-binding/SKILL.md:0-0
Timestamp: 2026-05-21T22:51:04.073Z
Learning: Applies to crates/ffi/**/*.rs : If the change touched `crates/ffi`, also use `test-ffi-surface` for validation

Applied to files:

crates/cli/tests/coverage/session_tests.rs

🔇 Additional comments (7)

crates/cli/tests/coverage/server_tests.rs (2)

108-128: LGTM!

344-476: LGTM!

crates/cli/tests/coverage/session_tests.rs (5)

9-14: LGTM!

45-68: LGTM!

70-133: LGTM!

135-222: LGTM!

1476-1566: LGTM!

…ex-observability # Conflicts: # crates/cli/tests/coverage/server_tests.rs

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/cli/tests/coverage/server_tests.rs`:
- Around line 1750-1837: The test registers a global subscriber with
register_subscriber and currently calls deregister_subscriber only at the end,
which leaves the subscriber registered if the test panics; make cleanup
panic-safe by creating an RAII guard that deregisters in Drop (e.g., implement a
small SubscriberGuard that calls deregister_subscriber(subscriber_name) in its
Drop) and use it immediately after register_subscriber (or wrap the register
call in a helper that returns the guard); reference register_subscriber,
deregister_subscriber and the new SubscriberGuard (or similar) so the subscriber
is always removed even on panic.
- Around line 15-19: The test
gateway_forwards_claude_startup_probe_without_llm_observability currently
registers a global subscriber via register_subscriber(...) but only calls
deregister_subscriber(...) at the end, which can leak the subscriber on panic;
introduce a small RAII guard (e.g., a struct SubscriberGuard that stores the
subscriber_name and calls deregister_subscriber(subscriber_name) in Drop) and
instantiate it immediately after register_subscriber(...) so deregistration
always runs even if the test panics; after making this change, run just
test-rust, cargo fmt --all, and cargo clippy --workspace --all-targets -- -D
warnings to validate formatting and lints.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: fe2fa3fa-5537-4f40-9985-330e964e93ad

📥 Commits

Reviewing files that changed from the base of the PR and between bc36694 and 2e6b588.

📒 Files selected for processing (2)

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Check / Run

🧰 Additional context used

📓 Path-based instructions (10)

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Use cargo fmt for Rust code formatting
Run cargo clippy -- -D warnings to lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Follow binding naming conventions: Rust and Python use snake_case, C FFI exports prefixed nemo_relay_, Go uses PascalCase for public APIs, Node.js uses camelCase.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
Use Json = serde_json::Value in Rust-facing runtime APIs for JSON payload handling.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

Scope stacks decide where work belongs and which scope-local behavior is visible.

Middleware registries decide what guardrails and intercepts run around managed calls.

Plugins install reusable runtime behavior from configuration.

Events record runtime behavior in ATOF form.

Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.
crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

🧠 Learnings (9)

📓 Common learnings

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Applies to crates/core/src/observability/{atif,otel,openinference}.rs : When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in `crates/core/src/observability/atif.rs`, `crates/core/src/observability/otel.rs`, and `crates/core/src/observability/openinference.rs` in sync

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Verify that mark events, start/end events, and orphan cases are still handled correctly when modifying observability surfaces

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Exporters can transform runtime events to ATIF trajectories, OpenTelemetry traces, or OpenInference-compatible output. Root scope identity is used to isolate concurrent agents.

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Ensure that span or trajectory fields are still derived from the intended event data when making observability changes

📚 Learning: 2026-05-21T22:48:57.484Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: Applies to crates/core/src/observability/{atif,otel,openinference}.rs : When changing event fields in ATIF, OpenTelemetry, or OpenInference observability surfaces, keep the core event model in `crates/core/src/observability/atif.rs`, `crates/core/src/observability/otel.rs`, and `crates/core/src/observability/openinference.rs` in sync

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:48:57.484Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/maintain-observability/SKILL.md:0-0
Timestamp: 2026-05-21T22:48:57.484Z
Learning: When event fields change, run Rust crate tests and execute `just test-rust` to validate the affected observability modules

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-29T21:25:49.977Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Events use ATOF `0.1` as the canonical event format. Scope events use start/end pairs; mark events record runtime checkpoints.

Applied to files:

crates/cli/tests/coverage/server_tests.rs

📚 Learning: 2026-05-29T21:25:49.977Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-29T21:25:49.977Z
Learning: Run tests for every language affected by a change. If you touch the Rust core runtime, middleware semantics, event shape, scope behavior, typed codecs, plugins, or observability, expect to validate every affected binding because the bindings share the same runtime contract.

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:51:27.841Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/test-rust-core/SKILL.md:0-0
Timestamp: 2026-05-21T22:51:27.841Z
Learning: Use narrower crate tests as a local debug loop, not as the final validation story for Rust changes

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:51:04.073Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/test-go-binding/SKILL.md:0-0
Timestamp: 2026-05-21T22:51:04.073Z
Learning: Applies to crates/ffi/**/*.rs : If the change touched `crates/ffi`, also use `test-ffi-surface` for validation

Applied to files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📚 Learning: 2026-05-21T22:47:58.898Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/add-middleware/SKILL.md:0-0
Timestamp: 2026-05-21T22:47:58.898Z
Learning: Applies to crates/core/src/api/runtime/state.rs : Add chain execution helpers to `NemoRelayContextState` following the pattern of existing methods like `tool_sanitize_request_chain` or `tool_request_intercepts_chain`

Applied to files:

crates/cli/tests/coverage/server_tests.rs

📚 Learning: 2026-05-21T22:47:33.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Relay PR: 0
File: .agents/skills/add-binding-feature/SKILL.md:0-0
Timestamp: 2026-05-21T22:47:33.109Z
Learning: Applies to crates/ffi/src/api/**/*.rs : Use `nemo_relay_` prefix for C FFI function names (e.g., `nemo_relay_tool_call`)

Applied to files:

crates/cli/tests/coverage/server_tests.rs

🔇 Additional comments (4)

crates/cli/tests/coverage/session_tests.rs (2)
1775-1776: Run the repo-required Rust validation suite before merge.

The PR context only shows targeted cargo test -p nemo-relay-cli ... runs plus cargo fmt --check, but this repo requires the full Rust validation path for .rs changes: cargo fmt --all, cargo clippy --workspace --all-targets -- -D warnings, just test-rust, and uv run pre-commit run --all-files. As per coding guidelines, Rust changes must run those commands; based on learnings, narrower crate tests are a local debug loop rather than the final validation story.

Sources: Coding guidelines, Learnings

1849-1850: clear_plugin_configuration() already flushes subscribers before tearing down plugin registrations; no ATIF-teardown race
clear_plugin_configuration() calls crate::api::runtime::flush_subscribers() before it removes the active plugin configuration / runs rollback_registrations (deregister). read_atif_for_session() also flushes again, so the final ATIF output shouldn’t depend on teardown ordering.
Also applies to: 1953-1954, 2064-2065, 2426-2427, 2812-2813
			> Likely an incorrect or invalid review comment.
crates/cli/tests/coverage/server_tests.rs (2)

346-1098: LGTM!

1957-1968: LGTM!

Also applies to: 1980-1982

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/cli/tests/coverage/server_tests.rs`:
- Around line 1750-1837: The test registers a global subscriber with
register_subscriber and currently calls deregister_subscriber only at the end,
which leaves the subscriber registered if the test panics; make cleanup
panic-safe by creating an RAII guard that deregisters in Drop (e.g., implement a
small SubscriberGuard that calls deregister_subscriber(subscriber_name) in its
Drop) and use it immediately after register_subscriber (or wrap the register
call in a helper that returns the guard); reference register_subscriber,
deregister_subscriber and the new SubscriberGuard (or similar) so the subscriber
is always removed even on panic.
- Around line 15-19: The test
gateway_forwards_claude_startup_probe_without_llm_observability currently
registers a global subscriber via register_subscriber(...) but only calls
deregister_subscriber(...) at the end, which can leak the subscriber on panic;
introduce a small RAII guard (e.g., a struct SubscriberGuard that stores the
subscriber_name and calls deregister_subscriber(subscriber_name) in Drop) and
instantiate it immediately after register_subscriber(...) so deregistration
always runs even if the test panics; after making this change, run just
test-rust, cargo fmt --all, and cargo clippy --workspace --all-targets -- -D
warnings to validate formatting and lints.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: fe2fa3fa-5537-4f40-9985-330e964e93ad

📥 Commits

Reviewing files that changed from the base of the PR and between bc36694 and 2e6b588.

📒 Files selected for processing (2)

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📜 Review details

🔇 Additional comments (4)

crates/cli/tests/coverage/session_tests.rs (2)
1775-1776: Run the repo-required Rust validation suite before merge.

The PR context only shows targeted cargo test -p nemo-relay-cli ... runs plus cargo fmt --check, but this repo requires the full Rust validation path for .rs changes: cargo fmt --all, cargo clippy --workspace --all-targets -- -D warnings, just test-rust, and uv run pre-commit run --all-files. As per coding guidelines, Rust changes must run those commands; based on learnings, narrower crate tests are a local debug loop rather than the final validation story.

Sources: Coding guidelines, Learnings

1849-1850: clear_plugin_configuration() already flushes subscribers before tearing down plugin registrations; no ATIF-teardown race
clear_plugin_configuration() calls crate::api::runtime::flush_subscribers() before it removes the active plugin configuration / runs rollback_registrations (deregister). read_atif_for_session() also flushes again, so the final ATIF output shouldn’t depend on teardown ordering.
Also applies to: 1953-1954, 2064-2065, 2426-2427, 2812-2813
			> Likely an incorrect or invalid review comment.
crates/cli/tests/coverage/server_tests.rs (2)

346-1098: LGTM!

1957-1968: LGTM!

Also applies to: 1980-1982

🛑 Comments failed to post (2)

crates/cli/tests/coverage/server_tests.rs (2)

15-19: ⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

just test-rust
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings

Repository: NVIDIA/NeMo-Relay

Length of output: 147

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/cli/tests/coverage/server_tests.rs"

# Quick sanity
echo "== File: $FILE =="
wc -l "$FILE"

echo
echo "== Matching register/deregister subscriber calls =="
rg -n "register_subscriber|deregister_subscriber|flush_subscribers" "$FILE"

echo
echo "== Locate gateway_forwards_claude_startup_probe_without_llm_observability =="
rg -n "gateway_forwards_claude_startup_probe_without_llm_observability" "$FILE"

# Show a window around the function body (use the first match line)
START_LINE="$(rg -n "gateway_forwards_claude_startup_probe_without_llm_observability" "$FILE" | head -n1 | cut -d: -f1)"
# Print ~120 lines around it, but clamp to file bounds
MIN=$((START_LINE-80))
MAX=$((START_LINE+200))

echo
echo "== Context around function (lines $MIN-$MAX) =="
sed -n "${MIN},${MAX}p" "$FILE" | nl -ba -w3 -s':'

echo
echo "== Check for any similar patterns in nearby tests (optional small scan) =="
# Also scan for the subscriber deregistration call near the function
rg -n "deregister_subscriber" "$FILE" | awk -F: '{print $1 ":" $2}' | head -n 20

Repository: NVIDIA/NeMo-Relay

Length of output: 771

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/cli/tests/coverage/server_tests.rs"

echo "== Context around gateway_forwards_claude_startup_probe_without_llm_observability =="
START_LINE="$(rg -n "gateway_forwards_claude_startup_probe_without_llm_observability" "$FILE" | head -n1 | cut -d: -f1)"
MIN=$((START_LINE-60))
MAX=$((START_LINE+130))

# Print with line numbers using awk (avoid nl)
awk -v min="$MIN" -v max="$MAX" 'NR>=min && NR<=max { printf "%5d:%s\n", NR, $0 }' "$FILE"

echo
echo "== Exact deregister_subscriber invocations in this file (snippets) =="
rg -n "deregister_subscriber" "$FILE" | while read -r line; do
  LNO="$(echo "$line" | cut -d: -f1)"
  echo "--- around deregister at line $LNO ---"
  awk -v min="$LNO-8" -v max="$LNO+8" 'NR>=min && NR<=max { printf "%5d:%s\n", NR, $0 }' "$FILE"
done

Repository: NVIDIA/NeMo-Relay

Length of output: 8757

Fix global subscriber cleanup to be panic-safe in gateway_forwards_claude_startup_probe_without_llm_observability

register_subscriber(...) is cleaned up only via deregister_subscriber(subscriber_name) at the end of the test; any panic/failed assert_*/unwrap() before that point can leak the global subscriber into later tests—wrap deregistration in a Drop/RAII guard so it always runs (even on panic).
Ensure this Rust-file change ran the required Rust validation suite: just test-rust, cargo fmt --all, and cargo clippy --workspace --all-targets -- -D warnings.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/cli/tests/coverage/server_tests.rs` around lines 15 - 19, The test
gateway_forwards_claude_startup_probe_without_llm_observability currently
registers a global subscriber via register_subscriber(...) but only calls
deregister_subscriber(...) at the end, which can leak the subscriber on panic;
introduce a small RAII guard (e.g., a struct SubscriberGuard that stores the
subscriber_name and calls deregister_subscriber(subscriber_name) in Drop) and
instantiate it immediately after register_subscriber(...) so deregistration
always runs even if the test panics; after making this change, run just
test-rust, cargo fmt --all, and cargo clippy --workspace --all-targets -- -D
warnings to validate formatting and lints.

Sources: Coding guidelines, Learnings

1750-1837: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make subscriber cleanup panic-safe to avoid cross-test contamination.

register_subscriber mutates global state; if any assertion fails before Line 1836, the subscriber remains registered and can taint other concurrently running tests in this binary.

Suggested fix

+struct SubscriberCleanup(&'static str);
+
+impl Drop for SubscriberCleanup {
+    fn drop(&mut self) {
+        let _ = deregister_subscriber(self.0);
+    }
+}
+
 #[tokio::test]
 async fn gateway_forwards_claude_startup_probe_without_llm_observability() {
     let subscriber_name = "server-claude-startup-probe-no-llm-test";
     let _ = deregister_subscriber(subscriber_name);
@@
     register_subscriber(
         subscriber_name,
         Arc::new(move |event| {
@@
         }),
     )
     .unwrap();
+    let _subscriber_cleanup = SubscriberCleanup(subscriber_name);
@@
-    deregister_subscriber(subscriber_name).unwrap();
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/cli/tests/coverage/server_tests.rs` around lines 1750 - 1837, The test
registers a global subscriber with register_subscriber and currently calls
deregister_subscriber only at the end, which leaves the subscriber registered if
the test panics; make cleanup panic-safe by creating an RAII guard that
deregisters in Drop (e.g., implement a small SubscriberGuard that calls
deregister_subscriber(subscriber_name) in its Drop) and use it immediately after
register_subscriber (or wrap the register call in a helper that returns the
guard); reference register_subscriber, deregister_subscriber and the new
SubscriberGuard (or similar) so the subscriber is always removed even on panic.

willkill07

Conditionally approving

mnajafian-nv

Conditional approval upon addressing the two contract-test concerns

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/cli/tests/coverage/session_tests.rs`:
- Around line 1627-1662: The test currently collapses spans by name into
attributes_by_span using HashMap which hides duplicates and doesn't verify
parent-child lineage; update the test that uses decode_otlp_spans to collect
spans as a Vec (or group by name into Vecs) instead of a single HashMap, assert
there is exactly one "codex-turn", one "openai.responses" and one "Read" span,
then read the "nemo_relay.uuid" from the codex-turn span and assert that each
child span's "nemo_relay.parent_uuid" equals that UUID while also asserting each
child has its own "nemo_relay.uuid"; reference decode_otlp_spans,
attributes_by_span (replace usage), and the span name keys "codex-turn",
"openai.responses", "Read" plus fields "nemo_relay.uuid" and
"nemo_relay.parent_uuid" when making these checks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 18379409-75bd-4d16-b620-5e9dcd4d4fd1

📥 Commits

Reviewing files that changed from the base of the PR and between 2e6b588 and 605ad75.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

crates/cli/Cargo.toml
crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Check / Run
GitHub Check: Preview docs

🧰 Additional context used

📓 Path-based instructions (15)

**/{Cargo.toml,**/*.rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Maintain consistency between Rust package names in Cargo.toml and their actual usage across the codebase

Files:

crates/cli/Cargo.toml
crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,toml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Rust crate names and module prefixes during coordinated rename operations

Files:

crates/cli/Cargo.toml
crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{py,txt,toml,cfg,yaml,yml}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update Python package names and top-level module imports during coordinated rename operations

Files:

crates/cli/Cargo.toml

**/Cargo.toml

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update WebAssembly crate names and generated package names during coordinated rename operations

Confirm or infer the target release version from upstream/main:Cargo.toml. Derive the release branch as release/<major>.<minor>.

Files:

crates/cli/Cargo.toml

{docs/**,README.md,**/Cargo.toml,**/package.json,**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Ensure renamed public surfaces are reflected consistently in manifests and docs for large or public-facing changes

Files:

crates/cli/Cargo.toml

**/*.{md,mdx,py,sh,yaml,yml,toml,json}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current

Files:

crates/cli/Cargo.toml

**/*.toml

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license header in TOML configuration files using hash comment syntax

Files:

crates/cli/Cargo.toml

**/*.{rs,py,js,ts,tsx,jsx,go,sh,toml,yaml,yml,md}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

crates/cli/Cargo.toml
crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**

⚙️ CodeRabbit configuration file

**:

AGENTS.md

This file provides guidance to agents, including Claude Code and OpenAI Codex, when working in this repository.

Project Overview

NeMo Relay is a multi-language agent runtime framework for execution scopes, lifecycle events, middleware, plugins, and observability around tool and LLM calls. The core runtime is Rust. Primary supported bindings are Rust, Python, and Node.js. Go, WebAssembly, and the raw C FFI are experimental and source-first.

The shared runtime model is:

Scope stacks decide where work belongs and which scope-local behavior is visible.

Middleware registries decide what guardrails and intercepts run around managed calls.

Plugins install reusable runtime behavior from configuration.

Events record runtime behavior in ATOF form.

Subscribers and exporters consume events in-process or export them to ATIF, OpenTelemetry, OpenInference, or other backends.

Repository Structure

The repository layout separates the Rust runtime, language bindings, documentation,
integration patches, and agent-facing skills.
crates/
  core/       # Rust core runtime crate, published as nemo-relay
  adaptive/   # Adaptive runtime primitives and plugin components
  python/     # PyO3 native extension for the Python package
  ffi/        # Raw C ABI layer used by downstream bindings such as Go
  node/       # NAPI Node.js binding and JavaScript/TypeScript entry points
  wasm/       # wasm-bindgen WebAssembly binding and JS wrappers
python/
  nemo_relay/  # Python wrapper package: scopes, tools, LLM, middleware, typed helpers, plugins, adaptive helpers
  tests/      # Python tests
go/
  nemo_relay/  # Experimental Go CGo binding and tests
fern/         # Fern documentation site
scripts/      # Stable wrappers and helper scripts; build/test/docs entry points live in justfile
third_party/  # P...

Files:

crates/cli/Cargo.toml
crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/add-binding-feature/SKILL.md)

Use snake_case naming convention for Rust identifiers (e.g., nemo_relay_tool_call)

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for all FFI work since it is Rust work
Run just test-rust to validate FFI changes
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting on FFI work

When Rust files changed as part of Go work, also run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all when Rust files are changed as part of Node work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files are changed as part of Node work
Run just test-rust when Rust files are changed as part of Node work

**/*.rs: Run cargo fmt --all to format all Rust code
Run cargo clippy --workspace --all-targets -- -D warnings to enforce all clippy lints as errors

**/*.rs: Run cargo fmt --all when Rust files changed as part of WebAssembly work
Run cargo clippy --workspace --all-targets -- -D warnings when Rust files changed as part of WebAssembly work

**/*.rs: If any Rust code changed, always run just test-rust
If any Rust code changed, also run cargo fmt --all
If any Rust code changed, also run cargo clippy --workspace --all-targets -- -D warnings
Run Rust formatting with cargo fmt --all
Run Rust linting with cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Use cargo fmt for Rust code formatting
Run cargo clippy -- -D warnings to lint Rust code and treat all warnings as errors
Use Rust snake_case naming convention for Rust identifiers
Include SPDX license header in all Rust source files using double-slash comment syntax
Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny aud...

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/adaptive/**/*.rs,**/*test*.{rs,py,go,ts,js},**/*adaptive*test*.{rs,py,go,ts,js},docs/plugins/adaptive/**}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Maintain documented and tested validation and report behavior for adaptive surfaces

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{h,hpp,c,cpp,rs}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Ensure FFI header and library naming follows consistent conventions across platform-specific builds

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Follow binding naming conventions: Rust and Python use snake_case, C FFI exports prefixed nemo_relay_, Go uses PascalCase for public APIs, Node.js uses camelCase.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

crates/**/*.rs: Keep async behavior on the existing tokio-based model. Bindings should preserve callback and future lifetimes rather than blocking or hiding async work unexpectedly.
Use Json = serde_json::Value in Rust-facing runtime APIs for JSON payload handling.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_relay/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/cli/tests/coverage/server_tests.rs
crates/cli/tests/coverage/session_tests.rs

🔇 Additional comments (5)

crates/cli/tests/coverage/server_tests.rs (4)

46-52: LGTM!

118-138: LGTM!

1109-1239: LGTM!

1800-1800: LGTM!

crates/cli/Cargo.toml (1)

53-54: Run the repo-required Rust validation suite before merge.

The PR summary only mentions cargo fmt --check and targeted cargo test -p nemo-relay-cli ... runs. For a Rust change in this repo, please also run the canonical just test-rust and cargo clippy --workspace --all-targets -- -D warnings; the repo guidance also prefers uv run pre-commit run --all-files for final validation so the new OTLP dev-dependencies are exercised under the same checks CI expects.

As per coding guidelines, "Any Rust change must run just test-rust", "Any Rust change must run cargo fmt --all", "Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings", and "Validate Rust code with uv run pre-commit run --all-files to enforce cargo fmt formatting check, cargo clippy lints, and cargo deny auditing."

Source: Coding guidelines

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

willkill07 · 2026-06-06T01:18:35Z

/merge

yczhang-nv added 3 commits June 5, 2026 15:37

yczhang-nv self-assigned this Jun 5, 2026

yczhang-nv requested a review from a team as a code owner June 5, 2026 22:43

yczhang-nv added the Improvement improvement to existing functionality label Jun 5, 2026

yczhang-nv changed the title ~~Codex/relay 177 codex observability~~ test: add Codex observability contract coverage Jun 5, 2026

github-actions Bot added size:M PR is medium Test Test related lang:rust PR changes/introduces Rust code and removed Improvement improvement to existing functionality labels Jun 5, 2026

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/session_tests.rs Outdated

Merge remote-tracking branch 'upstream/main' into codex/relay-177-cod…

2e6b588

…ex-observability # Conflicts: # crates/cli/tests/coverage/server_tests.rs

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

willkill07 previously approved these changes Jun 5, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/server_tests.rs Outdated

mnajafian-nv reviewed Jun 5, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/session_tests.rs Outdated

mnajafian-nv reviewed Jun 5, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/server_tests.rs Outdated

mnajafian-nv reviewed Jun 5, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/session_tests.rs Outdated

mnajafian-nv previously approved these changes Jun 5, 2026

View reviewed changes

willkill07 added this to the 0.4 milestone Jun 5, 2026

test(cli): address observability review feedback

605ad75

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

yczhang-nv dismissed stale reviews from mnajafian-nv and willkill07 via 605ad75 June 6, 2026 00:16

yczhang-nv requested a review from a team as a code owner June 6, 2026 00:16

github-actions Bot added size:L PR is large and removed size:M PR is medium labels Jun 6, 2026

copy-pr-bot Bot had a problem deploying to fern June 6, 2026 00:16 Failure

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread crates/cli/tests/coverage/session_tests.rs Outdated

Merge upstream/main into codex/relay-177-codex-observability

f1ecfa9

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

github-actions Bot added size:M PR is medium and removed size:L PR is large labels Jun 6, 2026

willkill07 approved these changes Jun 6, 2026

View reviewed changes

rapids-bot Bot merged commit fc9eb81 into NVIDIA:main Jun 6, 2026
31 checks passed

willkill07 removed the request for review from a team June 6, 2026 01:18

rapids-bot Bot temporarily deployed to fern June 6, 2026 01:18 Inactive

yczhang-nv deleted the codex/relay-177-codex-observability branch June 6, 2026 01:36

Conversation

yczhang-nv commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Details

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AGENTS.md

Project Overview

Repository Structure

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

AGENTS.md

Project Overview

Repository Structure

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

AGENTS.md

Project Overview

Repository Structure

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

willkill07 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mnajafian-nv left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

AGENTS.md

Project Overview

Repository Structure

Uh oh!

Uh oh!

willkill07 commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yczhang-nv commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading