OpenTelemetry + SigNoz for Rig Agents (Gemini): Practical tutorial from first run to production rigor

This guide is written for teams starting with observability in LLM systems.
We move from ideas to implementation at an accessible pace, then layer in the operational rigor needed for production.

It is based on the working examples in this repo:

examples/otel.rs
examples/gemini_rig_basic.rs
examples/gemini_rig_tools.rs
examples/gemini_multi_agent.rs

You will learn how to:

Think clearly about what to measure in LLM workflows.
Design spans that answer real engineering questions.
Run examples and verify traces in SigNoz.
Iterate when observability is incomplete.

0) The mindset shift: logs are not enough

In LLM systems, logs tell you what was written; tracing tells you why it happened in order.

Example:

You see a log saying "tool returned: 100".
Without a trace, you do not know:
- Did model planning time 5x normal?
- Was the tool invoked from planner or from writer?
- Which user request this belongs to under load?

Observability starts with a trace tree.

1) What problem this solves (in plain language)

For each user request, you want to know:

Which steps were taken.
Which step took the longest.
Which model and tool was used where.
Where errors entered the chain.
Which request this run belongs to.

If your output is flat (all spans at one level), you are blind to the real control flow.

1.1) Fast start for this tutorial

What you need before you begin

Rust and Cargo (rustup, cargo; Rust 1.85+ recommended).
Docker installed and running (docker command available).
nc (netcat) available.
A valid GEMINI_API_KEY for Gemini examples.
OTLP destination (SigNoz Cloud endpoint + ingestion key, or a local collector on OTLP ports).
Network access for provider calls.

Where to look first

From the rust-llm-observability-guide folder:

Core guide: README.md
Telemetry setup: examples/otel.rs
Smoke example: examples/otel_smoke.rs
Gemini examples:
examples/gemini_rig_basic.rs, examples/gemini_rig_tools.rs, examples/gemini_multi_agent.rs
Automation scripts:
scripts/run-otel-smoke-check.sh, scripts/run-otel-rig-examples.sh

Recommended first run sequence

Read setup and sample shape examples in this order:
- examples/otel.rs
- examples/otel_smoke.rs
- examples/gemini_rig_basic.rs
Run the smoke check:

cd rust-llm-observability-guide
./scripts/run-otel-smoke-check.sh

Run all runnable examples:

cd rust-llm-observability-guide
./scripts/run-otel-rig-examples.sh

If an example is skipped, the script prints a clear skip reason and continues.

Practical checks to expect

otel_smoke_probe: found in collector output
collector_receives: true
marker_match: true

License and support

This tutorial is under MIT License and is fully open source.
If something is unclear or not working, you can ask and I will try to clarify. Community help is welcome.

2) OpenTelemetry vocabulary for practical start-up (important)

2.1 Trace

A Trace is one complete journey (for one request).
Think of it as one “story” from start to finish.

2.2 Span

A Span is one chapter in the story.
Each chapter has:

name
start/end time
parent/child relationship
status (success/error)

2.3 Context

Each span has IDs:

trace_id: story identifier
span_id: chapter identifier
parent_id: who started this chapter

2.4 Resource

A Resource describes the application that emits telemetry (for example, service name).

2.5 Attributes and events

Attributes = searchable metadata (low-cardinality, stable values).
Events = timeline notes and explanatory lines inside a span.

3) Trace language map for this project

We will reuse these concrete names:

Root request: rig_gemini_basic_prompt / rig_gemini_with_tool / rig_gemini_multi_agent
Tool child span: tool.add_numbers
Planner/writer phases: agent_orchestrator, agent_writer

When you read a trace, the question is:
“Do these names map to my intended flow diagram?”

3.1 Vendor choice note: this stack is a pattern, not a lock-in

This tutorial uses Gemini and SigNoz as examples, but the architecture is provider/backend agnostic.

You can swap rig::providers::gemini for another Rig provider without redesigning your tracing.
You can swap model IDs (gemini-2.5-flash, gemini-2.5-pro) for your chosen provider.
You can keep the same span strategy and OTLP pipeline while changing the backend.
You can point OTEL_EXPORTER_OTLP_ENDPOINT to any OpenTelemetry-compatible collector.

Practical alternatives:

LLM providers: OpenAI, Anthropic, Azure OpenAI, or any provider with Rig integration.
OTel backends: SigNoz, Jaeger, Tempo, Zipkin, New Relic, Datadog (via OpenTelemetry endpoints).

Only two parts usually change during migration:

provider/model configuration at agent construction
OTLP environment variables

Everything else—trace structure, span hierarchy, and debugging workflow—remains the same.

4) Before coding: make a prediction (this is the most useful first-step habit)

Before running anything, write down expected spans.

Example: `gemini_multi_agent` expected shape

rig_gemini_multi_agent (request)
└─ agent_orchestrator
   ├─ planner.prompt  (conceptual planning step)
   └─ agent_writer
      └─ writer.prompt (rewrite step)

If your observed trace does not match this shape, do not optimize latency yet. Fix instrumentation first.

5) Dependencies

Use the same versions used by the working examples:

[dependencies]
anyhow = "1"
opentelemetry = { version = "0.30.0", features = ["trace"] }
opentelemetry_sdk = { version = "0.30.0", features = ["trace", "rt-tokio"] }
opentelemetry-otlp = { version = "0.30.0", features = ["grpc-tonic", "trace", "tls-roots"] }
rig = { package = "rig-core", version = "0.31.0" }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tracing = "0.1"
tracing-opentelemetry = "0.31.0"
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
tokio = { version = "1", features = ["full"] }
tokio-util = "0.7"
tonic = { version = "0.12" }

6) Initialize telemetry once (copy this pattern first)

examples/otel.rs centralizes everything:

use anyhow::Context;
use opentelemetry::global;
use opentelemetry::KeyValue;
use opentelemetry_sdk::trace::SdkTracerProvider;
use opentelemetry_sdk::Resource;
use opentelemetry::trace::TracerProvider as TracerProviderTrait;
use tracing_subscriber::{EnvFilter, fmt, layer::SubscriberExt, util::SubscriberInitExt};

pub fn init_telemetry(service_name: &str) -> anyhow::Result<SdkTracerProvider> {
    let endpoint = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT")
        .unwrap_or_else(|_| "http://localhost:4317".to_string());

    let exporter = opentelemetry_otlp::SpanExporter::builder()
        .with_tonic()
        .with_endpoint(endpoint)
        .build()
        .context("Failed to create OTLP span exporter")?;

    let tracer_provider = SdkTracerProvider::builder()
        .with_batch_exporter(exporter)
        .with_resource(
            Resource::builder()
                .with_service_name(service_name.to_owned())
                .with_attribute(KeyValue::new("telemetry.sdk.language", "rust"))
                .build(),
        )
        .build();

    global::set_tracer_provider(tracer_provider.clone());

    let tracer = TracerProviderTrait::tracer(&tracer_provider, "rig-gemini-tracer");
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
    let filter_layer = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info"));

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(fmt::layer().with_target(false))
        .with(otel_layer)
        .init();

    Ok(tracer_provider)
}

pub fn has_gemini_api_key() -> bool {
    std::env::var("GEMINI_API_KEY").is_ok()
}

Why this design is reliable for first-pass implementations

One setup function means no confusion.
Same environment variable flow for local and cloud setups.
Global provider means all examples share one model.

In production, this saves hours of “why one span appears and another is missing”.

Call this in each main before work begins.

7) Example A: one request, one model (`gemini_rig_basic.rs`)

use rig::telemetry::SpanCombinator;

#[tracing::instrument(name = "rig_gemini_basic_prompt")]
async fn run_prompt() -> anyhow::Result<String> {
    let client = rig::providers::gemini::Client::from_env();
    let prompt_text = "Explain OpenTelemetry in exactly 3 bullets for a Rust backend engineer.";
    let prompt_span = tracing::info_span!(
        "agent.prompt",
        model = "gemini-2.5-flash",
        stage = "planner"
    );
    let _prompt_guard = prompt_span.enter();
    prompt_span.record_model_input(&serde_json::json!({
        "prompt": prompt_text,
    }));

    let agent = client
        .agent("gemini-2.5-flash")
        .preamble("You are a concise technical assistant. Answer clearly and with short bullets.")
        .temperature(0.2)
        .build();

    tracing::info!(model = "gemini-2.5-flash", "Sending prompt to Gemini");

    let answer = agent
        .prompt(prompt_text)
        .await
        .context("Gemini prompt failed")?;

    prompt_span.record_model_output(&serde_json::json!({
        "response_len": answer.len(),
        "response_preview": answer.chars().take(120).collect::<String>(),
    }));

    tracing::info!(response_len = answer.len(), "Received response");
    Ok(answer)
}

What to observe

Root span = request
SpanCombinator records model input/output for the planner span
output includes response length and short preview
Good first pattern to confirm your pipeline works

8) Example B: add a tool call (`gemini_rig_tools.rs`)

use rig::telemetry::SpanCombinator;

#[tracing::instrument(name = "rig_gemini_with_tool")]
async fn run_tool_agent() -> anyhow::Result<String> {
    let client = rig::providers::gemini::Client::from_env();
    let prompt = "Use the add_numbers tool to compute 42 + 58";
    let tool_span = tracing::info_span!(
        "agent.planner",
        model = "gemini-2.5-flash",
        role = "planner"
    );
    let _tool_guard = tool_span.enter();
    tool_span.record_model_input(&serde_json::json!({
        "task": "Use add_numbers tool for arithmetic",
        "prompt": prompt,
    }));

    let agent = client
        .agent("gemini-2.5-flash")
        .preamble(
            "You are a calculator assistant. Use the `add_numbers` tool whenever the user asks for arithmetic.",
        )
        .tool(AddTool)
        .build();

    let answer = agent
        .prompt(prompt)
        .await
        .context("Gemini tool-enabled prompt failed")?;

    tool_span.record_model_output(&serde_json::json!({
        "response_len": answer.len(),
        "response_preview": answer.chars().take(120).collect::<String>(),
    }));

    Ok(answer)
}

Tool call span inside AddTool::call:

let span = tracing::info_span!("tool.add_numbers", x = args.x, y = args.y);
span.record_model_input(&args);
let _guard = span.enter();
tracing::info!("Executing math tool");
let result = args.x + args.y;
span.record_model_output(&serde_json::json!({
    "result": result,
}));
Ok(result)

Why this matters in the first instrumentation pass

Without this explicit child span, tool work disappears behind a model span and you cannot answer:

Did tool execution or model thinking dominate latency?
Was tool error correctly linked to the original request?

9) Example C: two-stage orchestration (`gemini_multi_agent.rs`)

use rig::telemetry::SpanCombinator;

#[tracing::instrument(name = "rig_gemini_multi_agent")]
async fn run_orchestration(topic: &str) -> anyhow::Result<String> {
    let orchestrator = tracing::info_span!("agent_orchestrator", task = topic);
    let _orchestrator_guard = orchestrator.enter();
    orchestrator.record_model_input(&serde_json::json!({
        "topic": topic,
        "workflow": "planner_then_writer",
    }));

    let client = rig::providers::gemini::Client::from_env();

    let planner = client
        .agent("gemini-2.5-pro")
        .preamble("You are a planning assistant. Produce a structured plan first, then a 1-line summary.")
        .temperature(0.2)
        .build();

    let planner_prompt = format!("Create a practical rollout plan for this topic: {topic}");
    let planner_span = tracing::info_span!(
        "agent.planner",
        model = "gemini-2.5-pro",
        agent_role = "planner",
        task = topic
    );
    let _planner_guard = planner_span.enter();
    planner_span.record_model_input(&serde_json::json!({
        "prompt": planner_prompt,
    }));

    tracing::info!(agent = "planner", "Running planner step");
    let plan = planner
        .prompt(planner_prompt)
        .await
        .context("Planner step failed")?;
    planner_span.record_model_output(&serde_json::json!({
        "plan_len": plan.len(),
        "plan_preview": plan.chars().take(180).collect::<String>(),
    }));

    let writer = client
        .agent("gemini-2.5-flash")
        .preamble("You are a concise writer. Return a short executive version of the plan.")
        .max_tokens(700)
        .build();

    let writer_span = tracing::info_span!("agent_writer");
    let _writer_guard = writer_span.enter();
    let writer_prompt = format!("Summarize this plan into 5 short bullet points:\n\n{plan}");
    writer_span.record_model_input(&serde_json::json!({
        "model": "gemini-2.5-flash",
        "prompt": writer_prompt,
    }));

    tracing::info!(agent = "writer", "Running rewrite step");
    let summary = writer
        .prompt(writer_prompt)
        .await
        .context("Writer step failed")?;
    writer_span.record_model_output(&serde_json::json!({
        "response_len": summary.len(),
        "response_preview": summary.chars().take(180).collect::<String>(),
    }));

    orchestrator.record_model_output(&serde_json::json!({
        "plan_len": plan.len(),
        "summary_len": summary.len(),
    }));

    Ok(format!("Plan:\n{plan}\n\nExecutive summary:\n{summary}"))
}

Early-stage learning checkpoint

This example proves that one user request can have two model contexts and still stay in one trace tree.

10) Runbook (copy and run)

Configure environment

Cloud (direct SigNoz)

export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your-region>.ingest.signoz.cloud:443"
export OTEL_EXPORTER_OTLP_HEADERS='signoz-ingestion-key='$SIGNOZ_INGESTION_KEY
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip
export GEMINI_API_KEY="your_gemini_key"

Local collector

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
unset OTEL_EXPORTER_OTLP_HEADERS
export GEMINI_API_KEY="your_gemini_key"

Execute examples

cargo run --example gemini_rig_basic
cargo run --example gemini_rig_tools
cargo run --example gemini_multi_agent

If GEMINI_API_KEY is missing, examples exit with clear guidance without making network calls.

11) Read the trace like a first-pass review

For each run:

Open the trace for the run.
Confirm exactly one root span.
Expand the tree:
- Are planner/writer/tool nodes where expected?
Compare durations:
- Which node is the slowest?
Check status:
- Any non-OK error status? Expand details.

If any step fails this check, do not tune sampling yet. Fix instrumentation gaps first.

11.1 Automated collector smoke check

Use this script for a repeatable collector verification:

cd rust-llm-observability-guide
./scripts/run-otel-smoke-check.sh

The script:

starts a temporary OpenTelemetry Collector on safe local ports,
emits a tiny known telemetry span from the Rust path,
forces shutdown so span batches flush,
validates that collector output contains otel_smoke_probe for that run,
prints collector output, the example output, and a clear pass/fail summary.
helps learners inspect exactly what reached the collector and confirm span shape.

Useful overrides:

OTEL_SMOKE_MARKER=my-ci-run ./scripts/run-otel-smoke-check.sh
OTEL_SMOKE_GRPC_PORT=14417 ./scripts/run-otel-smoke-check.sh
OTEL_SMOKE_SERVICE=rig-smoke ./scripts/run-otel-smoke-check.sh

What you should see when successful:

collector_receives: true
otel_smoke_probe: found in collector output
a ResourceSpans block in collector logs containing otel_smoke_probe
an attribute line where marker equals your OTEL_SMOKE_MARKER

If the collector endpoint is not set, the script skips smoke and prints a clear skip reason.

11.2 Run all runnable examples (scripted)

Run the tutorial examples from one command:

cd rust-llm-observability-guide
./scripts/run-otel-rig-examples.sh

Script behavior:

always runs otel_smoke so telemetry collection is exercised end-to-end,
runs each Gemini example and prints compact run output,
skips Gemini examples automatically if GEMINI_API_KEY is missing (with a clear reason),
prints a final Summary line with PASS, FAIL, and SKIP counts.

Use it after the smoke test script to quickly confirm both code paths:

telemetry path (otel_smoke)
agent behavior (gemini_rig_basic, gemini_rig_tools, gemini_multi_agent)

Recent script status in this repository:

Summary: PASS=4  FAIL=0  SKIP=0
All runnable examples completed.

If you do not have a collector endpoint set, you should observe:

Summary: PASS=3  FAIL=0  SKIP=1

12) Common first-pass mistakes and corrections

Mistake 1: Flat traces only

Cause: no span hierarchy around phase changes.
Fix: place #[tracing::instrument] and explicit child spans around planner/tool paths.

Mistake 2: Full prompts as span names

Cause: every request creates a unique span name.
Fix: keep names stable and use events for narrative.

Mistake 3: Recreating telemetry providers per request

Cause: inconsistent context, missing parent links.
Fix: initialize once at startup.

Mistake 4: Missing `service.name`

Cause: dashboards cannot group by binary/service correctly.
Fix: set resource consistently in init_telemetry.

Mistake 5: Instrumenting too much without purpose

Cause: noisy signals, expensive storage.
Fix: only span what helps debugging and latency decomposition.

13) Practical exercises (do these first)

Exercise 1

Predict the span shape for gemini_rig_tools on paper, then run it.

Exercise 2

Add one extra tracing::info_span! around prompt construction. Validate it appears as a child span.

Exercise 3

Intentionally break OTEL_EXPORTER_OTLP_ENDPOINT then fix it. Observe missing traces and recovery behavior.

Keep one notebook with:

what changed
what trace shape changed
what you learned

14) Concept + syntax + design-pattern deepening

This section connects your current examples to reusable architecture patterns so you can build the next workflow faster.

14.1 Pattern: make control-flow visible first

An LLM request often fails where most people look first: around orchestration.

Trace = one request.
Spans = steps.
Parent-child tree = order + causality.

If you can draw the tree on paper, you can usually find missing instrumentation quickly.

14.2 Pattern: telemetry bootstrap module (singleton + lifetime anchor)

Your init_telemetry(...) is the same idea:

Setup once at startup.
Keep provider alive for app lifetime.
Flush on shutdown.

Why it matters for beginners:

global provider lets Rig and your own spans use one tracing plane.
missing this step causes half the request to run without telemetry.

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = otel::init_telemetry("rig-gemini")?;
    // ... run app ...
    provider.shutdown()?;
    Ok(())
}

14.3 Pattern: layered subscriber composition

You compose behavior instead of hardcoding one output:

EnvFilter = runtime filtering.
fmt layer = local text logs.
tracing_opentelemetry layer = export to OTel.

That composition is what lets you add new outputs (JSON logs, metrics, test subscribers) without rewriting your spans.

14.4 Pattern: workflow span taxonomy (names vs attributes vs events)

For agent systems, this is the highest leverage rule:

Span names = stable operation names (agent.planner, tool.add_numbers).
Attributes = searchable facts (model, agent_role, tool_name).
Events = explanatory notes (prompt selected, tool returned 100).

Rule for readability:

If you need to group many traces by it, use attributes.
If it is narrative, keep it in events.

14.5 Pattern: context continuity (async-safe parentage)

In async Rust, you can accidentally break context and get floating spans.

Use explicit span scoping when needed:

let span = tracing::info_span!("agent.planner", model = "gemini-2.5-pro");
let span_guard = span.enter();
// run planner work with that current span context

Your current examples rely on the library-provided context for most operations; this pattern becomes important when you create manually spawned tasks.

14.6 Pattern: export topology choices

You currently can export in two common ways:

App -> SigNoz Cloud OTLP directly.
App -> OpenTelemetry Collector -> SigNoz backend.

OTLP ports you should remember:

4317 for gRPC
4318 for HTTP

Use environment variables to switch destinations without code changes.

14.7 Pattern: resource identity vs request identity

Use:

service/resource attributes for service-level identity (service.name, environment metadata),
span attributes/events for request-level details.

14.8 Pattern: telemetry hygiene and prompt safety

Never use unbounded prompt text in high-cardinality fields. Prefer lengths/hashes/redacted snippets unless you have a strict policy.

Why this matters:

better dashboard performance,
lower storage cost,
less risk of leaking sensitive content.

14.9 High-signal target trace shape (what to aim for)

For one multi-agent request, a practical ideal is:

rig.request
├─ agent.planner (custom)
│  └─ gen_ai.chat (provider/model call)
├─ tool.add_numbers (custom)
└─ agent.writer (custom)
   └─ gen_ai.chat (provider/model call)

That shape is useful because every major step becomes queryable and comparable.

Level-up goal:

Keep your orchestration spans (agent.planner, agent.writer) in stable names.
Keep model spans aligned to GenAI semantic fields where possible (model/provider/operation).
Enforce prompt/PII hygiene in one place (prefer collector processors when possible).

14.10 Using `SpanCombinator` for custom spans

rig::telemetry::SpanCombinator is useful when you want your own spans to carry the same telemetry language as Rig’s internal model spans.

In this tutorial repo, we now use it in:

examples/gemini_rig_basic.rs (agent.prompt span records model input/output)
examples/gemini_rig_tools.rs (tool execution span records add/sub input/output equivalents and planner span captures call context)
examples/gemini_multi_agent.rs (planner/writer stages record request and response context)

This gives you a parent-child structure where:

Rig’s provider call still writes the standard LLM fields (gen_ai.input.messages, gen_ai.output.messages, gen_ai.usage.*) when available.
Your orchestration spans add stable business semantics (agent.planner, tool.add_numbers, agent.writer) that are easy to filter in SigNoz.

Quick usage pattern:

use rig::telemetry::SpanCombinator;

let span = tracing::info_span!("agent.planner", model = "gemini-2.5-flash");
let _guard = span.enter();

span.record_model_input(&serde_json::json!({
    "prompt": "Create a rollout plan for tracing in an agent pipeline.",
}));

let response = "planner result ...";
span.record_model_output(&serde_json::json!({
    "response_len": response.len(),
    "response_preview": response.chars().take(120).collect::<String>(),
}));

You can optionally add:

span.record_token_usage(&usage) when you have a response object with token metadata (GetTokenUsage),
span.record_response_metadata(&response) when you have provider response metadata (ProviderResponseExt).

For this reason, this pattern is most helpful in your orchestration stages and tool calls while leaving model-provider call semantics to Rig’s internal instrumentation.

14.11 Staged path: onboarding depth vs production rigor (same architecture)

Use this comparison to avoid overengineering when learning, then harden safely.

Focus	Onboarding profile	Production rigor
Initialization	`init_telemetry(...)` in each example, run once at startup	Shared initializer used by every binary/service with env-based overrides
Span strategy	One root span + one or two child spans	Full phase taxonomy: request, planner, tool, writer, parse, render
Attributes	`model`, `agent`, `topic`, `tool_name`	Add standardized keys (`gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`) and request metadata
Error handling	Print/log error and stop	Capture status/error in spans; include retry counters and provider status
Data safety	Avoid high-cardinality names	Redact prompts in app or Collector processor; include only lengths/hashes unless explicitly allowed
Export	One OTLP endpoint, default batching	Tuned batching, compression, timeouts, and collector-based buffering/retry
Shutdown	Optional or manual	Graceful shutdown hook always calls provider `shutdown()`
Verification	“Do I see a trace?”	“Does the trace shape match expected flow and latency budget?”

Starter template (smallest guaranteed working shape)

#[tracing::instrument(name = "request")]
async fn run_request() -> anyhow::Result<String> {
    let client = rig::providers::gemini::Client::from_env();
    let agent = client
        .agent("gemini-2.5-flash")
        .preamble("Be concise and technical.")
        .build();

    tracing::info!(model = "gemini-2.5-flash", "starting request");
    let answer = agent.prompt("Explain this in 3 bullets.").await?;
    tracing::info!(response_len = answer.len(), "received response");
    Ok(answer)
}

Production template (same flow, production-grade durability)

#[tracing::instrument(
    name = "rig.request",
    fields(
        gen_ai_operation_name = "chat.completion",
        gen_ai_provider_name = "gemini",
        gen_ai_request_model = "gemini-2.5-flash",
        topic = %topic,
        request_id = %uuid::Uuid::new_v4(),
    )
)]
async fn run_request(topic: &str) -> anyhow::Result<String> {
    let span = tracing::info_span!(
        "agent.planner",
        attempt = 0u32,
        gen_ai_operation_name = "chat.completion",
        model = "gemini-2.5-flash",
    );
    let _guard = span.enter();

    tracing::info!(event = "planner_started", step = "generate_plan");
    let client = rig::providers::gemini::Client::from_env();
    let planner = client
        .agent("gemini-2.5-pro")
        .preamble("You are a planner. Return a structured plan.")
        .build();

    let plan = planner
        .prompt(format!("Plan for topic: {topic}"))
        .await
        .context("planner failed")?;

    tracing::info!(event = "planner_completed", plan_len = plan.len());

    let writer_span = tracing::info_span!("agent.writer", attempt = 0u32);
    let _writer_guard = writer_span.enter();
    let writer = client
        .agent("gemini-2.5-flash")
        .preamble("You are a concise writer.")
        .build();

    let final_text = writer
        .prompt(format!("Summarize plan: {plan}"))
        .await
        .context("writer failed")?;

    tracing::info!(event = "request_completed", output_len = final_text.len());
    Ok(final_text)
}

What to switch first when moving from early implementation to production

Add request IDs in span fields.
Add one stable span per phase (agent.planner, agent.writer, tool.*).
Add structured error fields (error_type, error_stage, retry counters).
Move secret/prompt policies to collector configuration.
Add graceful shutdown and validate shutdown() is always called.
Review trace shape in SigNoz after each change.

15) Quick reference checklist

one stable root span per request
one span per meaningful phase
stable span names, low-cardinality attributes
events for detail, not names
explicit startup/shutdown lifecycle
validate by drawing and matching trace tree

16) External reading list (official and practical)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
examples		examples
scripts		scripts
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation