
feat(exceptions): Layer-1 structured exception hierarchy with NR-* error codes #31

Merged: maltsev-dev merged 1 commit into master from feat/layer-1-exception-hierarchy on Jun 24, 2026

@maltsev-dev (Member)

Summary

Introduces a Layer-1 structured exception hierarchy: every public
SDK exception now inherits from NullRunError and carries four
actionable fields (error_code, user_action, retryable, docs_url)
plus an optional chained cause. Users get a stable, grep-able code
(e.g. NR-A001, NR-B002, NR-R001) and a short imperative
next-step hint instead of a free-form message string.

What changes

New base class NullRunError(BreakerError) with the four structured
fields. Its __init__ uses a per-instance override pattern so that
subclass defaults don't leak across siblings.
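
A minimal sketch of the per-instance override pattern described above, not the SDK's actual code. BreakerError is a stand-in here (assumed to derive from Exception), and the keyword-only constructor signature, the placeholder code NR-U000, and the user_action wording are all hypothetical.

```python
from typing import Optional


class BreakerError(Exception):
    """Stand-in for the SDK's existing base class (assumed to derive from Exception)."""


class NullRunError(BreakerError):
    # Class-level defaults; each subclass overrides only what differs.
    error_code: str = "NR-U000"  # hypothetical placeholder code
    user_action: str = "Check the docs page for this error code."
    retryable: bool = False
    docs_url: Optional[str] = None

    def __init__(
        self,
        message: str = "",
        *,
        error_code: Optional[str] = None,
        user_action: Optional[str] = None,
        retryable: Optional[bool] = None,
        docs_url: Optional[str] = None,
        cause: Optional[BaseException] = None,
    ) -> None:
        super().__init__(message)
        cls = type(self)
        # Overrides are stored on the instance, so the class defaults that
        # sibling classes and other instances read are never mutated.
        self.error_code = cls.error_code if error_code is None else error_code
        self.user_action = cls.user_action if user_action is None else user_action
        self.retryable = cls.retryable if retryable is None else retryable
        self.docs_url = cls.docs_url if docs_url is None else docs_url
        self.cause = cause
        if cause is not None:
            self.__cause__ = cause  # keeps the chained cause visible in tracebacks


class NullRunConfigError(NullRunError):
    error_code = "NR-C001"
    user_action = "Fix the SDK configuration and call init() again."  # hypothetical wording


# One instance overrides retryable; a second instance still sees the class default.
overridden = NullRunConfigError("bad config", retryable=True)
fresh = NullRunConfigError("bad config")
```

Because the override lives on the instance, NullRunConfigError.retryable itself stays False after the first call.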

New specialized classes (each is a subclass of the existing
user-facing class it refines, so existing except clauses keep
matching):

New class                Subclass of                 error_code  retryable
NullRunConfigError       NullRunError                NR-C001     False
NullRunAuthError         NullRunAuthenticationError  NR-A001     False
NullRunBackendError      NullRunTransportError       NR-B002     True
NullRunBudgetError       NullRunBlockedException     NR-X001     False
NullRunToolBlockedError  NullRunBlockedException     NR-T001     False

Runtime / Decorators / Transport — raise the new typed
exceptions where the code path used to raise plain strings or the
generic base. Transport continues to map the gateway 5xx envelope
to NullRunBackendError so the retryable hint propagates cleanly.
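
As an illustration of what the retryable hint makes possible on the caller's side, here is a hedged sketch of a retry wrapper. The two exception classes are local stand-ins that only mirror the names and flags from the table above; call_with_retry, its parameters, and the backoff schedule are hypothetical and not part of the SDK.

```python
import time


class NullRunTransportError(Exception):
    """Local stand-in for the SDK class of the same name."""
    retryable = False


class NullRunBackendError(NullRunTransportError):
    """Local stand-in: gateway 5xx, marked retryable as in the table above."""
    error_code = "NR-B002"
    retryable = True


def call_with_retry(fn, attempts=3, base_delay=0.0):
    """Call fn(), retrying only errors whose retryable flag is set (hypothetical helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except NullRunTransportError as exc:
            # Give up on non-retryable errors and on the final attempt.
            if not getattr(exc, "retryable", False) or attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


calls = []


def flaky_send():
    # Fails twice with a retryable backend error, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise NullRunBackendError("gateway returned 502")
    return "sent"


result = call_with_retry(flaky_send)
```

Catching the parent class and reading the flag keeps the wrapper working for any transport error that later gains retryable=True.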

Public re-exports: src/nullrun/__init__.py exposes the new
classes so cookbook examples and external code can write
except NullRunBudgetError directly.

Back-compat invariants (pinned by tests)

  • except NullRunAuthenticationError still catches NullRunAuthError
  • except NullRunBlockedException still catches NullRunBudgetError
    and NullRunToolBlockedError
  • except NullRunTransportError still catches NullRunBackendError
    and RateLimitError
  • except WorkflowKilledException still catches
    WorkflowKilledInterrupt (BaseException inheritance preserved)
  • except Exception does not catch WorkflowKilledInterrupt
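
The invariants above can be checked with a small sketch like the following. The classes are local stand-ins, not the SDK's. In particular, rooting WorkflowKilledException directly under BaseException is an assumption made here so that the last two invariants hold together; the PR does not spell out that part of the tree. caught_by is a hypothetical helper.

```python
class NullRunBlockedException(Exception):
    """Local stand-in for the existing user-facing class."""


class NullRunBudgetError(NullRunBlockedException):
    """Local stand-in for the new, more specific class."""


class WorkflowKilledException(BaseException):
    """Assumed layout: rooted at BaseException rather than Exception."""


class WorkflowKilledInterrupt(WorkflowKilledException):
    """Local stand-in for the kill signal."""


def caught_by(handler_type, exc):
    """Return True if `except handler_type` would catch `exc` (hypothetical helper)."""
    try:
        raise exc
    except handler_type:
        return True
    except BaseException:
        return False


old_handler_still_matches = caught_by(NullRunBlockedException, NullRunBudgetError("over budget"))
kill_caught_by_parent = caught_by(WorkflowKilledException, WorkflowKilledInterrupt())
kill_escapes_except_exception = not caught_by(Exception, WorkflowKilledInterrupt())
```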

Tests

New file: tests/test_exception_hierarchy.py (258 lines) covers:

  • Class roots — every public exception inherits from NullRunError
  • Structured fields — every class has the four fields populated
    with a non-empty user_action and a stable error_code matching
    the NR-LETTERNNN pattern
  • The five back-compat invariants listed above
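
For the error_code check, a sketch of one way to validate the NR-LETTERNNN shape. The exact regular expression used in the test file is not shown in this PR, so the pattern and the helper name below are assumptions: "NR-", one uppercase ASCII letter, then three ASCII digits.

```python
import re

# Assumed reading of NR-LETTERNNN: "NR-", one uppercase letter, three digits.
ERROR_CODE_RE = re.compile(r"NR-[A-Z][0-9]{3}")


def is_valid_error_code(code: str) -> bool:
    """Return True if the whole string matches the assumed pattern."""
    return ERROR_CODE_RE.fullmatch(code) is not None


# Codes that appear in this PR's description.
codes_from_pr = ["NR-C001", "NR-A001", "NR-B002", "NR-X001", "NR-T001", "NR-R001"]
all_valid = all(is_valid_error_code(code) for code in codes_from_pr)
```

Using fullmatch rather than match rejects strings with trailing characters, such as NR-A0011.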

CI status (local verification on Windows / Python 3.14.2)

Step             Result
pytest           880 passed, 13 skipped (0:09:55)
ruff check src/  All checks passed
mypy src/        No issues found in 24 source files

Note: ruff format --check for the whole repo fails on 79 files,
but this is preexisting on master (verified with git stash: the
same 79 files fail before this change) and is not part of the
CI workflow. A separate repo-wide format cleanup can land in a
dedicated commit.

Files

  • src/nullrun/__init__.py — re-exports
  • src/nullrun/breaker/exceptions.py — new base + 5 new classes
  • src/nullrun/decorators.py — typed raises
  • src/nullrun/runtime.py — typed raises
  • src/nullrun/transport.py — typed raises + retryable mapping
  • tests/test_exception_hierarchy.py — new

Every public SDK exception now inherits from NullRunError and carries
four actionable fields (error_code, user_action, retryable, docs_url)
plus an optional chained cause. Users get a stable, grep-able error
code (NR-A001, NR-B002, NR-R001, ...) and a short imperative
next-step hint instead of a free-form message string.

New specialized classes (back-compat subclasses of existing
user-facing classes, so existing except clauses keep matching):

  * NullRunConfigError       — config/initialization failures
  * NullRunAuthError         — invalid/missing API key (subclass of
                               NullRunAuthenticationError)
  * NullRunBackendError      — gateway 5xx (subclass of
                               NullRunTransportError, retryable=True)
  * NullRunBudgetError       — budget exhausted (subclass of
                               NullRunBlockedException)
  * NullRunToolBlockedError  — tool blocked by policy (subclass of
                               NullRunBlockedException)

Existing except handlers keep working: every new class is a subclass
of an existing one, so e.g. 'except NullRunBlockedException' still
catches NullRunBudgetError and NullRunToolBlockedError.

Tests: tests/test_exception_hierarchy.py pins the hierarchy shape
(class roots), the structured fields on every public class, and the
five back-compat invariants (subclass matching for the user-facing
exception trees, BaseException isolation for WorkflowKilledInterrupt).

Verified locally: pytest 880 passed / 13 skipped, ruff check src/
clean, mypy src/ clean.
@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

❌ Patch coverage is 94.38202% with 5 lines in your changes missing coverage. Please review.

Files with missing lines           Patch %  Lines
src/nullrun/runtime.py             25.00%   3 Missing ⚠️
src/nullrun/breaker/exceptions.py  97.36%   0 Missing and 2 partials ⚠️

@maltsev-dev maltsev-dev merged commit 3d03c55 into master Jun 24, 2026
5 checks passed
@maltsev-dev maltsev-dev deleted the feat/layer-1-exception-hierarchy branch June 24, 2026 09:29
maltsev-dev pushed a commit that referenced this pull request Jun 24, 2026
…ection

Builds on the Layer-1 structured exception hierarchy (PR #31).
This commit has five parts:

1) nullrun.observability package
   - error_hooks.py: global hook registry with thread-safe
     register / unregister / dispatch. Multiple hooks fire in
     registration order. Hook exceptions are caught and logged
     at DEBUG — a misbehaving hook cannot break the SDK.
     has_hooks() short-circuit keeps the hot path zero-cost
     when nothing is registered.
   - status.py: NullRunStatus dataclass (frozen) + RecentError
     ring buffer (capacity 10) + WorkflowState enum. State
     derivation covers four headline buckets: ok / degraded /
     offline / misconfigured. Per-instance state queries never
     mutate the runtime.
   - observability.py is renamed into the package (__init__.py
     keeps the previous public surface).

2) nullrun public API additions
   - on_error(hook) — Layer 2 entry point. Documented as
     'give the user a chance' to observe every structured
     failure before it propagates. Skipped for
     WorkflowKilledInterrupt (BaseException subclass) — kill
     is a signal, not an error.
   - status() — Layer 3 entry point. Returns a frozen
     NullRunStatus snapshot. Raises NullRunConfigError (NR-C004)
     if no runtime has been init()'d. Never lazily creates a
     runtime as a side effect (pinned by
     test_status_never_lazily_creates_runtime).
   - Both are added to __all__ so they appear in dir(nullrun)
     for discoverability.

3) Docs: docs/errors/
   - 15 per-code pages (NR-A001..A003, B001..B005, C001/C003,
     L001, R001, T001, W002/W003) plus README index. Each page
     documents the error_code, the trigger conditions, the
     user_action, and the retryable hint.
   - docs/integration-baseline-2026-06-19.md — pinned baseline
     for the next integration run.

4) Test updates
   - test_error_hooks.py — registry + dispatch + bypass tests
     (killed interrupt does not fire; one bad hook does not
     prevent later hooks; unregister is idempotent).
   - test_status.py — no-runtime / with-runtime / state
     derivation / recent-errors ring buffer.
   - test_integration_contract.py — track_event setdefault
     race pinned against the locked helper.
   - test_dead_code_removed.py::test_dir_size_unchanged —
     now keys off nullrun.__all__ (the source of truth for the
     curated surface) so the curated-surface contract is
     pinned without hardcoding the symbol count.

5) Source wiring
   - runtime.py — _emit_sdk_error / _emit_for_transport_error
     wire the new error_hooks.emit_error into the two SDK
     failure paths. status() builder reads runtime state and
     feeds the recent-errors ring buffer.
   - transport.py — failed batches emit
     NullRunBackendError (retryable=True) through the new path
     so retries surface the correlation_id in the
     ErrorContext.
   - decorators.py — @Protect catches the structured
     NullRunBlockedException family and emits with stage='tool'
     so a hook can attribute the failure to the right gate.
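
The hook registry semantics from part 1 can be sketched roughly as follows. This is an illustration under assumptions, not the contents of error_hooks.py: the module-level list and lock, the function signatures, and the logger name are hypothetical. Only the behaviours come from the commit message (registration order, idempotent unregister, hook exceptions logged at DEBUG, has_hooks() short-circuit, no dispatch for BaseException-only signals).

```python
import logging
import threading

logger = logging.getLogger("nullrun.sketch")  # hypothetical logger name

_lock = threading.Lock()
_hooks = []  # kept in registration order


def register(hook):
    with _lock:
        if hook not in _hooks:
            _hooks.append(hook)


def unregister(hook):
    # Idempotent: removing a hook that is not registered is a no-op.
    with _lock:
        if hook in _hooks:
            _hooks.remove(hook)


def has_hooks():
    # Cheap unlocked check so the no-hook path does almost no work.
    return bool(_hooks)


def dispatch(error):
    if not has_hooks():
        return
    # Kill-style signals derive from BaseException only and are skipped.
    if not isinstance(error, Exception):
        return
    with _lock:
        snapshot = list(_hooks)  # call hooks outside the lock
    for hook in snapshot:
        try:
            hook(error)
        except Exception:
            # A misbehaving hook must not break the caller or later hooks.
            logger.debug("error hook %r raised", hook, exc_info=True)


seen = []


def bad_hook(err):
    raise RuntimeError("hook bug")


def good_hook(err):
    seen.append(err)


register(bad_hook)
register(good_hook)
dispatch(ValueError("structured failure"))
unregister(bad_hook)
unregister(bad_hook)  # second call is harmless
```

Copying the hook list before calling out means a hook can register or unregister other hooks without deadlocking on the registry lock.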

Verified locally on Windows / Python 3.14.2:
  pytest        926 passed, 13 skipped
  ruff check    clean on src/ and tests/
  mypy src/     clean on 26 source files
maltsev-dev added a commit that referenced this pull request Jun 24, 2026
…ection (#32)

Co-authored-by: Anatolii <anatolii@nullrun.io>
maltsev-dev added a commit that referenced this pull request Jun 24, 2026
Bump version 0.6.0 → 0.6.1. This release lands all three layers
of the 'give the user a chance' design on top of the 0.6.0 P0
hardening pass:

  * Layer 1 — structured exception hierarchy. Every public SDK
    exception inherits from NullRunError and carries
    error_code / user_action / retryable / docs_url / cause.
    Five new typed classes (NullRunConfigError, NullRunAuthError,
    NullRunBackendError, NullRunBudgetError, NullRunToolBlockedError)
    are subclasses of the existing user-facing classes, so every
    'except' clause from 0.6.0 keeps matching.

  * Layer 2 — nullrun.on_error() global error hook. Fires for
    every structured NullRunError before the exception
    propagates. Skipped for WorkflowKilledInterrupt (BaseException
    subclass — kill is a signal, not an error). Multiple hooks
    fire in registration order; hook exceptions are caught and
    logged at DEBUG. has_hooks() short-circuit keeps the hot
    path zero-cost when no hook is registered.

  * Layer 3 — nullrun.status() introspection. Synchronous,
    thread-safe, side-effect-free snapshot of runtime state.
    Returns a frozen NullRunStatus dataclass with one of four
    headline states (ok / degraded / offline / misconfigured).
    Raises NullRunConfigError (NR-C004) if no runtime has been
    init()'d — never lazily creates a runtime as a side effect.
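
A rough sketch of the Layer 3 snapshot idea: a bounded recent-errors buffer feeding a frozen dataclass. Field names, the HeadlineState enum, and RecentErrorBuffer are hypothetical; only the capacity of 10, the frozen snapshot, and the four state names come from the commit messages. The rules that derive the state are not described there, so the state is set by hand below, and the generated error codes are placeholders.

```python
import threading
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class HeadlineState(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    OFFLINE = "offline"
    MISCONFIGURED = "misconfigured"


@dataclass(frozen=True)
class RecentError:
    error_code: str
    message: str


@dataclass(frozen=True)
class NullRunStatus:
    state: HeadlineState
    recent_errors: Tuple[RecentError, ...]


class RecentErrorBuffer:
    """Ring buffer: once full, each new entry evicts the oldest one."""

    def __init__(self, capacity: int = 10) -> None:
        self._lock = threading.Lock()
        self._items = deque(maxlen=capacity)

    def record(self, error_code: str, message: str) -> None:
        with self._lock:
            self._items.append(RecentError(error_code, message))

    def snapshot(self) -> Tuple[RecentError, ...]:
        # Returns an immutable copy; reading never mutates the buffer.
        with self._lock:
            return tuple(self._items)


buf = RecentErrorBuffer()
for i in range(12):
    buf.record(f"NR-B{i:03d}", "gateway error")  # placeholder codes

status = NullRunStatus(state=HeadlineState.DEGRADED, recent_errors=buf.snapshot())
```

After twelve records only the last ten remain, so the oldest entry in the snapshot is the third one recorded.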

Per-code docs in docs/errors/ (15 pages + README index).
New tests pin the hierarchy, the hook semantics, the snapshot
fields, and the recent-errors ring buffer.

TestPyPI: the previous 0.6.0 (uploaded 2026-06-23, before
#31 and #32 landed) is yanked separately so the new 0.6.1
wheel can be uploaded. The yank is a TestPyPI-side action;
it does not change the source tree.