feat(ENGHLP-1235): add custom OpenTelemetry spans for ingest and NetBox client#546

Open
jajeffries wants to merge 2 commits into develop from feat/ENGHLP-1235-custom-tracing-spans

@jajeffries

Contributor

Summary

Adds the upstream custom spans required by diode-pro#318 for ENGHLP-1235 incident debugging: async ingest stream handling, validation-phase work, and explicit rate-limiter wait time before NetBox HTTP calls.

What changed

  • diode-server/telemetry/tracing.go: StartSpan/End helpers and span name constants aligned with diode-pro/docs/observability/tracing.md.
  • reconciler/ingestion_processor.go
    • ingestion.handle_stream_message — parent span with stream_lag, request_id, entity_count
    • ingestion.create_ingestion_logs — validation/hashing phase before bulk DB insert
  • netboxdiodeplugin/client.go: rate_limiter.wait span via waitForRateLimit() before BulkPlan, BulkPlanApply, and GetDefaultBranch calls

How tested

  • make fix-lint
  • go test ./reconciler/... ./netboxdiodeplugin/... ./telemetry/...

Follow-up

After merge, bump github.com/netboxlabs/diode/diode-server in diode-pro/server/go.mod (companion PR: https://github.com/netboxlabs/diode-pro/pull/318).

Linear

ENGHLP-1235 — https://linear.app/netboxlabs/issue/ENGHLP-1235/add-tracing-to-help-understand-request-waits

Made with Cursor

…ox client

Instrument Redis stream message handling with queue-lag attributes, wrap
CreateIngestionLogs validation in a parent span, and trace rate-limiter
waits before NetBox bulk-plan HTTP calls.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented Jun 23, 2026

Vulnerability Scan: Passed — diode-reconciler

Image: diode-reconciler:scan

No vulnerabilities found.

Commit: 4c59aeb

@github-actions

github-actions Bot commented Jun 23, 2026

Vulnerability Scan: Passed — diode-auth

Image: diode-auth:scan

| Source | Library | CVE | Severity | Installed | Fixed | Title |
|---|---|---|---|---|---|---|
| usr/bin/hydra | github.com/docker/docker | CVE-2026-34040 | 🟠 HIGH | v28.3.3+incompatible | 29.3.1 | Moby: Moby: Authorization bypass vulnerability |
| usr/bin/hydra | github.com/docker/docker | CVE-2026-33997 | 🟡 MEDIUM | v28.3.3+incompatible | 29.3.1 | moby: docker: github.com/moby/moby: Moby: Privilege validation bypass during plu |
| usr/bin/hydra | github.com/go-jose/go-jose/v3 | CVE-2026-34986 | 🟠 HIGH | v3.0.4 | 3.0.5 | github.com/go-jose/go-jose/v3: github.com/go-jose/go-jose/v4: Go JOSE: Denial of |
| usr/bin/hydra | github.com/jackc/pgx/v5 | CVE-2026-33816 | 🔴 CRITICAL | v5.7.5 | 5.9.0 | github.com/jackc/pgx/v5: github.com/jackc/pgx: Memory-safety vulnerability |
| usr/bin/hydra | github.com/jackc/pgx/v5 | CVE-2026-41889 | ⚪ LOW | v5.7.5 | 5.9.2 | github.com/jackc/pgx: golang: pgx: SQL injection via specific SQL query conditio |
| usr/bin/hydra | go.opentelemetry.io/otel | CVE-2026-29181 | 🟠 HIGH | v1.40.0 | 1.41.0 | github.com/open-telemetry/opentelemetry-go: OpenTelemetry-Go: Denial of Service |
| usr/bin/hydra | go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp | CVE-2026-39882 | 🟡 MEDIUM | v1.37.0 | 1.43.0 | OpenTelemetry-Go is the Go implementation of OpenTelemetry. Prior to 1 ... |
| usr/bin/hydra | go.opentelemetry.io/otel/sdk | CVE-2026-39883 | 🟠 HIGH | v1.40.0 | 1.43.0 | github.com/open-telemetry/opentelemetry-go: OpenTelemetry-Go: Arbitrary code exe |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39827 | 🟠 HIGH | v0.46.0 | 0.52.0 | An authenticated SSH client that repeatedly opened channels which were ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39828 | 🟠 HIGH | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh: golang.org/x/crypto/ssh: Unauthorized command execution |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39829 | 🟠 HIGH | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh: golang.org/x/crypto/ssh: Denial of Service via crafted |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39830 | 🟠 HIGH | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh: golang.org/x/crypto/ssh: Denial of Service via resource |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39835 | 🟠 HIGH | v0.46.0 | 0.52.0 | SSH servers which use CertChecker as a public key callback without set ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-42508 | 🟠 HIGH | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh/knownhosts: golang: golang.org/x/crypto/ssh/knownhosts: |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-46595 | 🟠 HIGH | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh: golang.org/x/crypto/ssh: Authorization bypass due to sk |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-46597 | 🟠 HIGH | v0.46.0 | 0.52.0 | An incorrectly placed cast from bytes to int allowed for server-side p ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39831 | 🟡 MEDIUM | v0.46.0 | 0.52.0 | The Verify() method for FIDO/U2F security key types (sk-ecdsa-sha2-nis ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39832 | 🟡 MEDIUM | v0.46.0 | 0.52.0 | When adding a key to a remote agent constraint extensions such as rest ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39833 | 🟡 MEDIUM | v0.46.0 | 0.52.0 | The in-memory keyring returned by NewKeyring() silently accepted keys ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-39834 | 🟡 MEDIUM | v0.46.0 | 0.52.0 | When writing data larger than 4GB in a single Write call on an SSH cha ... |
| usr/bin/hydra | golang.org/x/crypto | CVE-2026-46598 | 🟡 MEDIUM | v0.46.0 | 0.52.0 | golang.org/x/crypto/ssh/agent: golang: golang.org/x/crypto/ssh/agent: Denial of |
| usr/bin/hydra | golang.org/x/net | CVE-2026-25680 | 🟠 HIGH | v0.48.0 | 0.55.0 | Parsing arbitrary HTML can consume excessive CPU time, possibly leadin ... |
| usr/bin/hydra | golang.org/x/net | CVE-2026-25681 | 🟠 HIGH | v0.48.0 | 0.55.0 | Parsing arbitrary HTML which is then rendered using Render can result ... |
| usr/bin/hydra | golang.org/x/net | CVE-2026-27136 | 🟠 HIGH | v0.48.0 | 0.55.0 | Parsing arbitrary HTML which is then rendered using Render can result ... |
| usr/bin/hydra | golang.org/x/net | CVE-2026-33814 | 🟠 HIGH | v0.48.0 | 0.53.0 | When processing HTTP/2 SETTINGS frames, transport will enter an infini ... |
| usr/bin/hydra | golang.org/x/net | CVE-2026-39821 | 🟠 HIGH | v0.48.0 | 0.55.0 | golang.org/x/net/idna: golang: golang.org/x/net/idna: Privilege escalation via i |
| usr/bin/hydra | golang.org/x/net | CVE-2026-42502 | 🟠 HIGH | v0.48.0 | 0.55.0 | Parsing arbitrary HTML which is then rendered using Render can result ... |
| usr/bin/hydra | golang.org/x/net | CVE-2026-42506 | 🟠 HIGH | v0.48.0 | 0.55.0 | Parsing arbitrary HTML which is then rendered using Render can result ... |
| usr/bin/hydra | stdlib | CVE-2026-25679 | 🟠 HIGH | v1.26.0 | 1.25.8, 1.26.1 | net/url: Incorrect parsing of IPv6 host literals in net/url |
| usr/bin/hydra | stdlib | CVE-2026-27137 | 🟠 HIGH | v1.26.0 | 1.26.1 | crypto/x509: Incorrect enforcement of email constraints in crypto/x509 |
| usr/bin/hydra | stdlib | CVE-2026-27145 | 🟠 HIGH | v1.26.0 | 1.25.11, 1.26.4 | *x509.Certificate).VerifyHostname previously called matchHostnames in ... |
| usr/bin/hydra | stdlib | CVE-2026-32280 | 🟠 HIGH | v1.26.0 | 1.25.9, 1.26.2 | crypto/x509: crypto/tls: golang: Go: Denial of Service vulnerability in certific |
| usr/bin/hydra | stdlib | CVE-2026-32281 | 🟠 HIGH | v1.26.0 | 1.25.9, 1.26.2 | crypto/x509: golang: Go crypto/x509: Denial of Service via inefficient certifica |
| usr/bin/hydra | stdlib | CVE-2026-32283 | 🟠 HIGH | v1.26.0 | 1.25.9, 1.26.2 | crypto/tls: golang: Go crypto/tls: Denial of Service via multiple TLS 1.3 key up |
| usr/bin/hydra | stdlib | CVE-2026-33810 | 🟠 HIGH | v1.26.0 | 1.26.2 | crypto/x509: golang: Go crypto/x509: Certificate validation bypass due to incorr |
| usr/bin/hydra | stdlib | CVE-2026-33811 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | net: golang: Go net package: Denial of Service via long CNAME response in Lookup |
| usr/bin/hydra | stdlib | CVE-2026-33814 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | When processing HTTP/2 SETTINGS frames, transport will enter an infini ... |
| usr/bin/hydra | stdlib | CVE-2026-39820 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | Well-crafted inputs reaching ParseAddress, ParseAddressList, and Parse ... |
| usr/bin/hydra | stdlib | CVE-2026-39823 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | CVE-2026-27142 fixed a vulnerability in which URLs were not correctly ... |
| usr/bin/hydra | stdlib | CVE-2026-39825 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | ReverseProxy can forward queries containing parameters not visible to ... |
| usr/bin/hydra | stdlib | CVE-2026-39836 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | ELSA-2026-22112: go-toolset:ol8 security update (IMPORTANT) |
| usr/bin/hydra | stdlib | CVE-2026-42499 | 🟠 HIGH | v1.26.0 | 1.25.10, 1.26.3 | Pathological inputs could cause DoS through consumePhrase when parsing ... |
| usr/bin/hydra | stdlib | CVE-2026-42504 | 🟠 HIGH | v1.26.0 | 1.25.11, 1.26.4 | Decoding a maliciously-crafted MIME header containing many invalid enc ... |
| usr/bin/hydra | stdlib | CVE-2026-27142 | 🟡 MEDIUM | v1.26.0 | 1.25.8, 1.26.1 | html/template: URLs in meta content attribute actions are not escaped in html/te |
| usr/bin/hydra | stdlib | CVE-2026-32282 | 🟡 MEDIUM | v1.26.0 | 1.25.9, 1.26.2 | golang: internal/syscall/unix: Root.Chmod can follow symlinks out of the root |
| usr/bin/hydra | stdlib | CVE-2026-32288 | 🟡 MEDIUM | v1.26.0 | 1.25.9, 1.26.2 | archive/tar: golang: Go's archive/tar package: Denial of Service via maliciously |
| usr/bin/hydra | stdlib | CVE-2026-32289 | 🟡 MEDIUM | v1.26.0 | 1.25.9, 1.26.2 | html/template: golang: html/template: Cross-Site Scripting (XSS) via improper co |
| usr/bin/hydra | stdlib | CVE-2026-39826 | 🟡 MEDIUM | v1.26.0 | 1.25.10, 1.26.3 | html/template: golang: html/template: Cross-site scripting due to incorrect scri |
| usr/bin/hydra | stdlib | CVE-2026-42507 | 🟡 MEDIUM | v1.26.0 | 1.25.11, 1.26.4 | net/textproto: golang: Golang net/textproto: Misleading error messages via input |
| usr/bin/hydra | stdlib | CVE-2026-27138 | ⚪ LOW | v1.26.0 | 1.26.1 | crypto/x509: Panic in name constraint checking for malformed certificates in cry |
| usr/bin/hydra | stdlib | CVE-2026-27139 | ⚪ LOW | v1.26.0 | 1.25.8, 1.26.1 | os: FileInfo can escape from a Root in golang os module |

Commit: 4c59aeb

@github-actions

github-actions Bot commented Jun 23, 2026

Vulnerability Scan: Passed — diode-ingester

Image: diode-ingester:scan

No vulnerabilities found.

Commit: 4c59aeb

@github-actions

github-actions Bot commented Jun 23, 2026

Go test coverage

| STATUS | ELAPSED | PACKAGE | COVER | PASS | FAIL | SKIP |
|---|---|---|---|---|---|---|
| 🟢 PASS | 1.52s | github.com/netboxlabs/diode/diode-server/auth | 44.7% | 42 | 0 | 0 |
| 🟢 PASS | 1.03s | github.com/netboxlabs/diode/diode-server/auth/cli | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/diode/diode-server/authutil | 82.8% | 5 | 0 | 0 |
| 🟢 PASS | 0.15s | github.com/netboxlabs/diode/diode-server/dbstore/postgres | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.12s | github.com/netboxlabs/diode/diode-server/entityhash | 79.2% | 13 | 0 | 0 |
| 🟢 PASS | 1.11s | github.com/netboxlabs/diode/diode-server/entitymatcher | 82.8% | 97 | 0 | 0 |
| 🟢 PASS | 0.09s | github.com/netboxlabs/diode/diode-server/errors | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.18s | github.com/netboxlabs/diode/diode-server/graph | 52.0% | 81 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/diode/diode-server/grpckeepalive | 100.0% | 1 | 0 | 0 |
| 🟢 PASS | 1.46s | github.com/netboxlabs/diode/diode-server/ingester | 85.4% | 66 | 0 | 0 |
| 🟢 PASS | 1.11s | github.com/netboxlabs/diode/diode-server/matching | 94.1% | 66 | 0 | 0 |
| 🟢 PASS | 1.07s | github.com/netboxlabs/diode/diode-server/migrator | 70.4% | 4 | 0 | 0 |
| 🟢 PASS | 3.13s | github.com/netboxlabs/diode/diode-server/netboxdiodeplugin | 46.5% | 23 | 0 | 0 |
| 🟢 PASS | 0.17s | github.com/netboxlabs/diode/diode-server/pprof | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 5.08s | github.com/netboxlabs/diode/diode-server/reconciler | 63.0% | 94 | 0 | 0 |
| 🟢 PASS | 0.11s | github.com/netboxlabs/diode/diode-server/reconciler/changeset | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.06s | github.com/netboxlabs/diode/diode-server/reconciler/differ | 49.3% | 23 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/diode/diode-server/server | 85.7% | 14 | 0 | 0 |
| 🟢 PASS | 1.01s | github.com/netboxlabs/diode/diode-server/strcase | 100.0% | 24 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/diode/diode-server/telemetry | 26.2% | 26 | 0 | 0 |
| 🟢 PASS | 1.02s | github.com/netboxlabs/diode/diode-server/telemetry/otel | 90.2% | 25 | 0 | 0 |
| 🟢 PASS | 0.09s | github.com/netboxlabs/diode/diode-server/tls | 0.0% | 0 | 0 | 0 |
| 🟢 PASS | 1.01s | github.com/netboxlabs/diode/diode-server/version | 100.0% | 2 | 0 | 0 |

Total coverage: 52.8%

Upgrade x/net (and transitive x/crypto, x/sys, x/sync, x/text) to
resolve HIGH severity CVEs flagged by the container vulnerability scan.

Co-authored-by: Cursor <cursoragent@cursor.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af16a525bb

	if err != nil {
		errs = append(errs, fmt.Errorf("failed to convert ingestion timestamp: %v", err))
	} else {
		streamLag := time.Now().Unix() - int64(ingestionTs)

[P2] Use nanoseconds when calculating stream lag

In production the ingester writes ingestion_ts as time.Now().UnixNano() (diode-server/ingester/component.go:184), so this calculation mixes seconds from time.Now().Unix() with a nanosecond timestamp. Every normally produced message will report a huge negative stream_lag value instead of queue wait time, making the new span attribute unusable for incident debugging; convert the stored value with time.Unix(0, int64(ingestionTs)) or compare nanoseconds consistently.

func (c *Client) waitForRateLimit(ctx context.Context) (err error) {
	ctx, span := telemetry.StartSpan(ctx, telemetry.SpanRateLimiterWait)
	defer telemetry.End(span, err)

[P2] Defer rate-limit span errors with a closure

Because arguments to a deferred call are evaluated immediately, err is still nil when this defer is registered. If c.limiter.Wait(ctx) returns an error because the request context is canceled or times out while waiting, the client correctly returns that error but the new rate_limiter.wait span is ended as successful, hiding exactly the wait failures this span is meant to expose; wrap the call in defer func() { telemetry.End(span, err) }() as done in the ingestion spans.
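
The difference between the two defer forms can be reproduced in isolation. In this sketch, `endSpan` is a hypothetical stand-in for `telemetry.End` that just records the error it receives, and `errWait` stands in for whatever the limiter would return on cancellation; neither is taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errWait is an illustrative error standing in for a failed limiter wait.
var errWait = errors.New("context canceled while waiting")

// endSpan is a hypothetical stand-in for telemetry.End: it records the
// error the span was ended with.
func endSpan(recorded *error, err error) { *recorded = err }

// deferDirect evaluates err when the defer statement runs, so endSpan
// always receives nil.
func deferDirect(recorded *error) (err error) {
	defer endSpan(recorded, err)
	return errWait
}

// deferClosure reads the named result when the closure executes, after
// the return statement has assigned it.
func deferClosure(recorded *error) (err error) {
	defer func() { endSpan(recorded, err) }()
	return errWait
}

func main() {
	var direct, closure error
	_ = deferDirect(&direct)
	_ = deferClosure(&closure)
	fmt.Println("direct defer recorded: ", direct)  // <nil>
	fmt.Println("closure defer recorded:", closure) // context canceled while waiting
}
```

Both functions return the same error to the caller; only the closure form passes it to the span-ending helper.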
