-
Notifications
You must be signed in to change notification settings - Fork 9
feat: implement rich error handling with retry policies and E2E testing framework #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This adds a comprehensive error system for component SDK: Error Types (Transient - Retryable): - NetworkError: DNS failures, connection timeouts - RateLimitError: HTTP 429, API quota exceeded - ServiceError: HTTP 5xx, service unavailable - TimeoutError: Operation timeouts - ResourceUnavailableError: Connection pool/queue exhaustion Error Types (Permanent - Non-Retryable): - AuthenticationError: HTTP 401/403, invalid credentials - NotFoundError: HTTP 404, missing resources - ValidationError: Invalid input, missing fields - ConfigurationError: Invalid settings - PermissionError: Access denied - ContainerError: Docker image issues Helpers: - fromHttpResponse(): Auto-classify HTTP errors - NetworkError.from(): Wrap Node.js errors - RateLimitError.fromHeaders(): Parse rate limit headers - isRetryableError(), getRetryDelayMs(), wrapError() Also adds ComponentRetryPolicy interface for per-component retry configuration with Temporal integration support. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Tier 1 component migration to use standardized error types: notification.slack: - ConfigurationError for missing webhookUrl/slackToken - AuthenticationError for invalid_auth/token_revoked - fromHttpResponse() for webhook HTTP failures - Added retry policy: 5 attempts, 2s initial, 60s max core.http-request: - TimeoutError for request timeouts (AbortError) - NetworkError for connection failures - fromHttpResponse() for HTTP error responses - Added retry policy: 3 attempts, 1s initial, 30s max Both components now throw semantically typed errors that enable intelligent retry decisions by the Temporal activity handler. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- ConfigurationError when storage service is not available - Added retry policy: 3 attempts, 1s initial, 10s max - Quick retries for transient I/O issues Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Tier 2 security component migration to standardized errors: security.virustotal: - ValidationError for missing indicator - ConfigurationError for missing API key - fromHttpResponse() for API failures - Retry policy: 4 attempts, 2s initial, 120s max security.dnsx: - ContainerError for non-docker runner check - Retry policy: 3 attempts, 1s initial, 10s max security.subfinder: - ContainerError for non-docker runner check - Retry policy: 2 attempts, 5s initial, 30s max (conservative for expensive ops) security.nuclei: - ValidationError for template validation failures - ServiceError for scan execution failures - Retry policy: 2 attempts, 10s initial (expensive scans) All security components now throw typed errors with appropriate retry policies based on their operational characteristics. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Tier 3 AI component migration to standardized errors: core.provider.openai: - ConfigurationError for missing API key - Retry policy: 1 attempt (no retry for config validation) core.provider.gemini: - ConfigurationError for missing API key - Retry policy: 1 attempt (no retry for config validation) core.provider.openrouter: - ConfigurationError for missing API key - Retry policy: 1 attempt (no retry for config validation) core.ai.generate-text: - ConfigurationError for missing API key - Retry policy: 3 attempts, 2s initial, 30s max (handles transient API errors with exponential backoff) Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Replace generic Error throws with typed errors: - ConfigurationError for missing API key - ValidationError for input/schema issues with full context - fromHttpResponse for MCP request failures - Add retry policy (3 attempts, 2s-30s exponential backoff) - Preserve original error context and AI response snippets for debugging Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- manual-selection: ValidationError for empty options - okta-user-offboard: NotFoundError, ServiceError, ConfigurationError - google-workspace-user-delete: NotFoundError, ServiceError, ConfigurationError - github-remove-org-membership: full error taxonomy including AuthenticationError, PermissionError, NetworkError, TimeoutError, NotFoundError, and fromHttpResponse helper Added retry policies to all components with appropriate backoff settings and non-retryable error types configured per component requirements. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Migrated components: - trufflehog: ContainerError, ValidationError, retry policy (2 attempts) - shuffledns-massdns: ContainerError, retry policy (2 attempts) - httpx: ServiceError for exit codes, retry policy (2 attempts) - notify: ConfigurationError for missing provider config - atlassian-offboarding: ValidationError, ConfigurationError, NetworkError - nuclei: Complete typed error migration for template validation All components now use the standardized error taxonomy with appropriate retry policies and non-retryable error types. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Security components: - amass: ServiceError, ValidationError, retry policy - supabase-scanner: ValidationError, retry policy - prowler-scan: ConfigurationError, ValidationError, ServiceError, retry policy - nuclei: ValidationError for template validation Core components: - array-pick: ValidationError for out-of-bounds - secret-fetch: ConfigurationError, NotFoundError, ValidationError - file-writer: ValidationError for missing content Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- entry-point: ValidationError for missing required inputs - artifact-writer: ConfigurationError for missing artifact service - text-splitter: ValidationError for missing file content - text-joiner: ValidationError for missing file content Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- ContainerError for Docker runner fallback scenario - ValidationError for failed JSON parsing of script results Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Error categorization improvements: - TimeoutError: Container timeouts with timeout duration in ms - ContainerError: Docker spawn failures, non-zero exit codes, PTY failures - ValidationError: Failed to serialize input params to JSON - ConfigurationError: Unsupported runner type All errors include contextual details like: - Docker command arguments (formatted for readability) - Exit codes and stderr output - Original cause errors for debugging Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Core execution pipeline: - run-component.activity.ts: NotFoundError for missing components, ValidationError for missing inputs - workflow-runner.ts: NotFoundError for missing actions/components - human-input.activity.ts: ConfigurationError for missing database - run-dispatcher.activity.ts: ConfigurationError for missing env, ServiceError for API failures Adapters: - secrets.adapter.ts: ServiceError for decryption failures - artifact.adapter.ts: NotFoundError for missing artifacts/files - file-storage.adapter.ts: NotFoundError for missing files - kafka-*.adapter.ts: ConfigurationError for missing brokers - loki-log.adapter.ts: ServiceError for push failures Utilities: - isolated-volume.ts: ValidationError for invalid inputs, ConfigurationError for state errors, ContainerError for Docker failures Destinations: - registry.ts: ConfigurationError for duplicate adapters, NotFoundError for missing adapters - s3.ts: ConfigurationError for missing bucket/credentials Component SDK: - registry.ts: ConfigurationError for duplicate components - contracts.ts: ConfigurationError for duplicate contracts - ports.ts: ValidationError for invalid credential names Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Remaining worker files: - destinations/adapters/artifact.ts: ConfigurationError for missing artifact service - temporal/workers/dev.worker.ts: ConfigurationError for missing env vars - components/test/docker-echo.ts: ContainerError for inline fallback - temporal/workflows/index.ts: ApplicationFailure with error types for Temporal sandbox Notes: - Temporal workflow code uses ApplicationFailure instead of component-sdk errors due to sandbox import restrictions - All error types now include contextual details for debugging Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Propagate detailed error information from worker to frontend: - Updated TraceEvent in component-sdk and shared packages to support structured error objects - Enriched NODE_FAILED events with error types (ValidationError, NotFoundError, etc.), stacks, and details - Migrated backend workflow_traces.error column to jsonb for flexible event storage - Implemented ExecutionErrorView on frontend for beautiful, category-aware error rendering - Integrated structured error display into WorkflowNode and EventInspector components Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add E2E test suite using Bun's built-in test runner - Create tsconfig.e2e.json for proper TypeScript configuration - Implement error-handling.test.ts with 4 test scenarios: - Permanent Service Error (validates max retry policy) - Retryable Success (validates exponential backoff) - Validation Error (validates non-retryable behavior) - Timeout Error (validates timeout error structure) - Add justfile with test, cleanup, and helper commands - Add cleanup.ts to remove test workflows with "Test:" prefix - Add comprehensive README.md with test documentation - Update test-error-generator component with alwaysFail parameter - Fix WorkflowNode error badge to show expandable arrow - Register test-error-generator in worker component index E2E tests require local dev environment (backend + worker running). Tests run via "just test" or "bun test --config tsconfig.e2e.json". Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| error: { | ||
| message: errorMsg, | ||
| type: (error as any)?.type || (error instanceof Error ? error.name : 'UnknownError'), | ||
| stack: error instanceof Error ? error.stack : undefined, | ||
| details: (error as any)?.details, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve ValidationError field errors in trace payloads
The trace event only serializes details from the thrown error (details: (error as any)?.details), but ValidationError stores field-level information in error.fieldErrors (e.g., core.workflow.entrypoint throws a ValidationError with only fieldErrors). As a result, those field errors are dropped from trace records and the new ExecutionErrorView cannot show them. Consider merging fieldErrors into the trace payload (e.g., details: { ...error.details, fieldErrors: error.fieldErrors }) or adding a dedicated field on TraceError so ValidationError context isn’t lost.
Useful? React with 👍 / 👎.
| id: 'core.http.request', | ||
| label: 'HTTP Request', | ||
| category: 'transform', | ||
| runner: { kind: 'inline' }, | ||
| retryPolicy: httpRequestRetryPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wire component retryPolicy into activity scheduling
Components now declare retryPolicy (e.g., HTTP Request sets maxAttempts/backoff), but the workflow/worker path never reads ComponentDefinition.retryPolicy when scheduling activities (proxyActivities only sets timeouts, and runComponentActivity ignores it). This means per-component retry limits and nonRetryableErrorTypes have no effect despite being configured. Consider applying the component’s retry policy when invoking runComponentActivity or translating it into Temporal activity options so the configuration is honored.
Useful? React with 👍 / 👎.
- Add RUN_E2E environment variable check in e2e test file - Configure bunfig.toml to exclude e2e-tests from coverage - Add test:e2e script to package.json for running e2e tests explicitly - Fix http-request and atlassian-offboarding test assertions to match actual error messages Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
269d699 to
f3a6179
Compare
- Add dedicated fieldErrors field to TraceErrorSchema and TraceEvent interface - Extract fieldErrors from ValidationError instances without using 'any' types - Update frontend to read from dedicated fieldErrors field instead of details.fieldErrors - Update backend trace service and OpenAPI schema to handle fieldErrors - Ensure consistency across activity and workflow-runner error handling paths Fixes the issue where ValidationError field-level information was being dropped from trace records, preventing ExecutionErrorView from displaying field-specific validation errors. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add synchronous service availability check before tests are defined - Use test.skip conditionally based on service availability - Tests now officially skip (not pass) when RUN_E2E is set but services are down - Fix TypeScript errors with timeout option in test.skip wrapper - Include required x-internal-token header in curl health check Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Summary
Comprehensive refactor of the error handling system with rich, structured error representation, retry policies, and a new E2E testing framework using Bun's test runner.
Changes
Error Handling Refactor
Rich Error Types (
feat/sdk)):ServiceError,NetworkError,RateLimitError,TimeoutError,AuthenticationError,NotFoundError,ValidationError,ConfigurationErrordetailsobject for context-specific dataValidationError(e.g.,fieldErrors.api_key: [...])Retry Policies:
maxAttempts: 3, exponential backoff (coefficient 2.0)ValidationError,AuthenticationError,NotFoundError,ConfigurationErrorretryPolicyin component definitionerrorTypePoliciesmap for fine-grained controlTyped Error Migration (75 files, ~3000 insertions):
logic-script,entry-point,file-writer, etc.subfinder,nuclei,httpx,virustotal, etc.ai-agent, openai/gemini/openrouter providers, LLM componentsokta-user-offboard,google-workspace-license-unassignrun-component,run-dispatcher,human-inputs3,artifactstorage adaptersE2E Testing Framework
Bun Test Runner Integration (
feat(e2e)):tsconfig.e2e.json- Separate TypeScript config for E2E testse2e-tests/error-handling.test.ts- 4 test scenarios:e2e-tests/justfile- Commands:test,cleanup,list,check-services,logse2e-tests/cleanup.ts- Removes workflows with "Test:" prefixe2e-tests/README.md- Comprehensive documentation with examplesTest Component:
test.error.generatorcomponent for E2E testingmode,errorType,errorMessage,errorDetails,failUntilAttempt,alwaysFailUI Improvements
Error Display:
ExecutionErrorViewTesting
Breaking Changes
None - Error types are backwards compatible and components return structured errors.
Checklist
Notes
E2E tests require local dev environment running:
pm2 start pm2.config.cjs just testSee
e2e-tests/README.mdfor full documentation.