feat: OpenTelemetry instrumentation (closes #34) #36
Open
SokratisVidros wants to merge 20 commits into
Records the v1 design decisions for adding OTel tracing support: a first-party plugin shipped from pg-workflows with an optional peer dep on @opentelemetry/api, a new wrap hook on WorkflowPlugin, per-execution span lifetime, and cache-hit suppression for replayed steps. Metrics, cross-execution context propagation, and DLQ spans are explicitly deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step TDD plan for the design committed in be52240. 15 bite-sized tasks covering: package wiring, plugin interface extension, engine wrap chain, OTel plugin (workflow.run + step.* spans, cache-hit suppression, error path), tests with InMemorySpanExporter, README and AGENTS docs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build a wrap chain from each plugin's optional wrap field in reverse registration order so that the first-registered plugin is outermost. Add a TDD test asserting the exact before/after call order.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
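The ordering rule in this commit can be illustrated with a minimal sketch. The `PluginSketch` type, the `buildWrapChain` helper, and the `{ runId }` context shape are hypothetical stand-ins for illustration, not the actual pg-workflows types or engine code.

```typescript
// Hypothetical shapes; the real WorkflowPlugin and context types may differ.
type Next = () => Promise<unknown>;
type ContextSketch = { runId: string };

interface PluginSketch {
  name: string;
  // Optional middleware hook: runs code around the rest of the chain.
  wrap?: (context: ContextSketch, next: Next) => Promise<unknown>;
}

// Walk the plugins from last to first so that the first-registered
// plugin ends up as the outermost wrapper around the handler.
function buildWrapChain(
  plugins: PluginSketch[],
  context: ContextSketch,
  handler: Next,
): Next {
  let chain: Next = handler;
  for (const plugin of [...plugins].reverse()) {
    const wrap = plugin.wrap;
    if (!wrap) continue; // plugins without a wrap hook are skipped
    const inner = chain;
    chain = () => wrap(context, inner);
  }
  return chain;
}

// Records the call order so it can be inspected afterwards.
const calls: string[] = [];
const tracer = (name: string): PluginSketch => ({
  name,
  wrap: async (_context, next) => {
    calls.push(`${name}:before`);
    try {
      return await next();
    } finally {
      calls.push(`${name}:after`);
    }
  },
});

const run = buildWrapChain(
  [tracer("first"), { name: "no-wrap" }, tracer("second")],
  { runId: "run-1" },
  async () => {
    calls.push("handler");
    return "done";
  },
);
```

Awaiting `run()` fills `calls` with `first:before`, `second:before`, `handler`, `second:after`, `first:after`, which is the before/after order the commit's test is described as asserting.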
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Capture startTime before awaiting step.run so spans reflect actual step execution time instead of near-zero post-completion duration.
- Save originalErr and re-throw it (not the coerced Error), matching the wrap hook pattern and preserving non-Error throw values.
- Add test asserting step.run span duration >= 30ms for a 50ms handler.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
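A rough sketch of the two fixes in this commit, using a hypothetical in-memory `SpanRecord` instead of a real OpenTelemetry span so the example stays self-contained. The `traced` helper and its field names are assumptions, not the plugin's code.

```typescript
// Hypothetical in-memory span record standing in for a real OTel span.
interface SpanRecord {
  name: string;
  startTime: number;
  endTime: number;
  status: "OK" | "ERROR";
  exception?: Error;
}

const finished: SpanRecord[] = [];

async function traced<T>(name: string, fn: () => Promise<T>): Promise<T> {
  // Capture the start time BEFORE awaiting the step, so the span
  // covers the handler's real execution time.
  const startTime = Date.now();
  try {
    const result = await fn();
    finished.push({ name, startTime, endTime: Date.now(), status: "OK" });
    return result;
  } catch (originalErr) {
    // Coerce to Error only for recording purposes.
    const recorded =
      originalErr instanceof Error ? originalErr : new Error(String(originalErr));
    finished.push({
      name,
      startTime,
      endTime: Date.now(),
      status: "ERROR",
      exception: recorded,
    });
    // Re-throw the value that was actually thrown, not the coerced Error,
    // so non-Error throw values reach the caller unchanged.
    throw originalErr;
  }
}
```

With this shape, a handler that throws the string `"boom"` is recorded as an `Error` on the span record, while the caller still catches the original string.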
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a span for step.invokeChildWorkflow in the OTel plugin, emitting exactly one span per invocation (first execution only) by detecting both the cached-output case and the binding-key-only case (parent paused but child not yet complete) as cache hits on resume.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
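The two cache-hit cases described here could be expressed as a small predicate. This is only a sketch: the `StepEntrySketch` shape and its `output` and `childBindingKey` fields are invented for illustration, and the real timeline layout and `isCachedHit` signature in the plugin may look different.

```typescript
// Hypothetical persisted entry for a step; field names are assumptions.
interface StepEntrySketch {
  output?: unknown; // set once the step (or child workflow) has completed
  childBindingKey?: string; // set once the parent is bound to a child run
}

// Treat both "output already cached" and "child already bound but not yet
// complete" as cache hits, so a resumed parent does not emit a second span.
function isCachedHitSketch(entry: StepEntrySketch | undefined): boolean {
  if (entry === undefined) return false; // first execution: emit a span
  return entry.output !== undefined || entry.childBindingKey !== undefined;
}

const firstExecution = isCachedHitSketch(undefined);
const pausedOnChild = isCachedHitSketch({ childBindingKey: "child-1" });
const completed = isCachedHitSketch({ output: { ok: true }, childBindingKey: "child-1" });
```

Only the first execution returns `false`, which matches the "exactly one span per invocation" goal stated in the commit.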
…tion
step.sleep was not wrapped by the OTel plugin because spreading baseStep copies the getter's value, not the getter itself, so sleep pointed to the unwrapped delay. Added sleep to the methods return object, reusing the 'delay' kind for semantic consistency. Added a unit test that verifies step.sleep emits a pg_workflows.step.delay span. Also corrected all snake_case span names in the OTel design spec (wait_for, wait_until, invoke_child_workflow) to camelCase (waitFor, waitUntil, invokeChildWorkflow) to match the implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
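The getter pitfall behind this fix is plain JavaScript behaviour and can be reproduced in isolation. The `baseStep` object below is a simplified stand-in, assuming `sleep` is exposed as a getter that aliases `delay`; the real step object is larger.

```typescript
type Delay = (ms: number) => string;

const originalDelay: Delay = (ms) => `delay ${ms}ms`;

// Simplified stand-in for the engine's step object: sleep aliases delay
// through a getter.
const baseStep = {
  delay: originalDelay,
  get sleep(): Delay {
    return this.delay;
  },
};

// Wrapped version of delay, standing in for the span-emitting wrapper.
const tracedDelay: Delay = (ms) => `span(${originalDelay(ms)})`;

// Spread evaluates the getter once and copies its VALUE (originalDelay)
// as a plain data property, so overriding delay does not affect sleep.
const broken = { ...baseStep, delay: tracedDelay };

// The fix described above: list sleep explicitly next to delay.
const fixed = { ...baseStep, delay: tracedDelay, sleep: tracedDelay };

const brokenResult = broken.sleep(5); // "delay 5ms": bypasses the wrapper
const fixedResult = fixed.sleep(5); // "span(delay 5ms)"

// After the spread, sleep is no longer an accessor property.
const descriptor = Object.getOwnPropertyDescriptor(broken, "sleep");
const stillGetter = descriptor?.get !== undefined; // false
```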
Move the OTel design and plan files out of the repo: they were development-process metadata, not user-facing docs. Their concrete output lives in src/plugins/otel.ts and is exercised by the test suite. Add a public docs/observability.md page covering span hierarchy, attributes, cache-hit semantics, plugin composition, options, error semantics, and explicit v1 deferrals. Wire the page into the README documentation index and fix the design-doc link in the Observability section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
126bbdd to b3b3244
Summary
Adds a first-party `otelPlugin` that emits OpenTelemetry spans for workflow and step execution. `@opentelemetry/api` is an optional peer dependency, so users who don't import the plugin pay zero runtime cost.

- `pg_workflows.workflow.run` span per worker execution, with child spans per step kind (`step.run`, `step.waitFor`, `step.delay`, `step.waitUntil`, `step.pause`, `step.poll`, `step.invokeChildWorkflow`; `step.sleep` is aliased to delay).
- Cache-hit suppression for replayed steps (`step.poll` exception called out below).
- Errors are recorded with `recordException` + ERROR status; the original error is re-thrown so engine retry/DLQ behaviour is unchanged.
- `wrap?(context, next)` hook on `WorkflowPlugin` lets any plugin compose middleware around the workflow handler; the engine builds the chain so that the first-registered plugin is outermost.
- `WorkflowContext` gains `resourceId` (optional) and `attempt` (required) so plugins can read them without a DB round-trip.

Design and plan

- docs/superpowers/specs/2026-05-21-otel-instrumentation-design.md
- docs/superpowers/plans/2026-05-21-otel-instrumentation.md

Out of scope for v1 (explicitly deferred)

- `step.poll` cache-hit suppression: every poll execution emits a span (the test helper `fastForwardWorkflow` pre-writes output, so a naive cache-hit guard would suppress legitimate spans). Trade-off documented in the design doc.

Test Plan

- `npm run test:unit`: 138 passed, 1 skipped, 1 todo (16 new OTel tests across happy paths, error paths, cache-hit replay, span duration, plugin composition, and the `isCachedHit` predicate)
- `npm run build`: clean
- `npm run lint`: clean
- With `otelPlugin`, register a `NodeSDK`, run a workflow with a `step.run` + `step.waitFor` + `step.invokeChildWorkflow`, and inspect traces in your collector of choice
- Path where `otelPlugin` is not imported (no `@opentelemetry/api` resolution required)

🤖 Generated with Claude Code