12 changes: 12 additions & 0 deletions docs/errors/NR-A001.md
@@ -0,0 +1,12 @@
# NR-A001 — `/auth/verify` returned non-200 (other than 401)

| Field | Value |
|---|---|
| **Code** | `NR-A001` |
| **Category** | Authentication |
| **Exception class** | `NullRunAuthenticationError` |
| **Retryable** | No |

The auth endpoint returned a 4xx/5xx other than 401. Check the
status code in the exception message; if 5xx, this is actually
a backend issue (see `NR-B002`) and may be transient.
14 changes: 14 additions & 0 deletions docs/errors/NR-A002.md
@@ -0,0 +1,14 @@
# NR-A002 — `/auth/verify` response missing `organization_id`

| Field | Value |
|---|---|
| **Code** | `NR-A002` |
| **Category** | Authentication |
| **Exception class** | `NullRunAuthenticationError` |
| **Retryable** | No |

The auth endpoint returned 200 but the response body has no
`organization_id` field. The SDK refuses to operate in "legacy
identity" mode (no fallback to a default org). Update the
backend, or downgrade the SDK to a version compatible with
the deployed backend.
59 changes: 59 additions & 0 deletions docs/errors/NR-A003.md
@@ -0,0 +1,59 @@
# NR-A003 — API key rejected (401)

| Field | Value |
|---|---|
| **Code** | `NR-A003` |
| **Category** | Authentication |
| **Exception class** | `NullRunAuthError` (subclass of `NullRunAuthenticationError`) |
| **Retryable** | No |
| **Default `user_action`** | "The API key was rejected by the NullRun backend (401). Verify the key at https://app.nullrun.io/settings/api-keys and rotate it if it has been revoked." |

## When

Any HTTP call to the NullRun backend returned `401 Unauthorized`. The
API key is no longer valid (revoked, deleted, or for a different
environment).

## Common causes

1. **Key was revoked** in the dashboard.
2. **Key was deleted** but the SDK is still using it (e.g. cached
in `NULLRUN_API_KEY` env var).
3. **Wrong environment** — using a production key against
`NULLRUN_API_URL=https://api.staging.nullrun.io` or vice versa.
4. **Key is for a different org** — the SDK attached the right
header but the org mapping is stale.
5. **Account was suspended** — the org is in a billing hold.

## How to fix

1. Open https://app.nullrun.io/settings/api-keys.
2. Confirm the key prefix (`nr_live_…`) matches what the SDK
sent. (If you need the prefix from the running SDK, log
`str(api_key)[:10]`.)
3. If the key is gone, create a new one and update the
`NULLRUN_API_KEY` env var (or the explicit `api_key=`
argument to `nullrun.init`).
4. Restart the application so the SDK picks up the new key.

## Catch pattern

```python
import logging

import nullrun
from nullrun.breaker.exceptions import NullRunAuthError

log = logging.getLogger(__name__)


def start_nullrun():
    try:
        nullrun.init(api_key="nr_live_…")
    except NullRunAuthError as exc:
        if exc.error_code == "NR-A003":
            log.error("API key rejected: %s", exc.user_action)
            # Show the user a friendly "your key was revoked" UI
            # instead of the raw exception.
            # render_key_revoked_page is an application-defined helper.
            return render_key_revoked_page()
        raise
```

## Related codes

- `NR-A001` — `/auth/verify` returned non-200 (other than 401).
- `NR-A002` — `/auth/verify` response missing `organization_id`.
- `NR-C001` — `init()` called with no api_key at all.
13 changes: 13 additions & 0 deletions docs/errors/NR-B001.md
@@ -0,0 +1,13 @@
# NR-B001 — Network error

| Field | Value |
|---|---|
| **Code** | `NR-B001` |
| **Category** | Backend / network |
| **Exception class** | `NullRunTransportError` (source=`NETWORK_ERROR`) |
| **Retryable** | Yes |

Raised on `httpx.ConnectError`, a timeout, or a DNS failure. The
backend may be up but the SDK cannot reach it. Retry after a backoff;
if the error persists, check firewall / proxy / DNS config. When this
is raised from a sensitive-tool pre-check, the `@protect` body did NOT
run (fail-closed).
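
The "retry after a backoff" advice can be sketched as follows. This is a minimal illustration, not the SDK's own retry logic: `NullRunTransportError` is redefined here as a stand-in so the block runs on its own, and `call_with_backoff` plus its `attempts`, `base_delay`, and `max_delay` parameters are hypothetical names chosen for the example.

```python
import random
import time


class NullRunTransportError(Exception):
    """Stand-in for the SDK class so this sketch is self-contained."""

    error_code = "NR-B001"
    retryable = True


def call_with_backoff(fn, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying retryable transport errors with exponential backoff.

    All parameter names and defaults are assumptions for illustration.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except NullRunTransportError as exc:
            last_attempt = attempt == attempts - 1
            if not exc.retryable or last_attempt:
                raise
            # The delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` argument is injectable so the helper can be exercised in tests without real waiting.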
12 changes: 12 additions & 0 deletions docs/errors/NR-B002.md
@@ -0,0 +1,12 @@
# NR-B002 — Backend 5xx

| Field | Value |
|---|---|
| **Code** | `NR-B002` |
| **Category** | Backend / network |
| **Exception class** | `NullRunBackendError` (subclass of `NullRunTransportError`) |
| **Retryable** | Yes |

The NullRun backend returned a server error. Usually transient —
retry after a few seconds. If it persists for more than a minute,
check https://status.nullrun.io or contact support.
13 changes: 13 additions & 0 deletions docs/errors/NR-B004.md
@@ -0,0 +1,13 @@
# NR-B004 — Budget exhausted

| Field | Value |
|---|---|
| **Code** | `NR-B004` |
| **Category** | Backend |
| **Exception class** | `NullRunBudgetError` (subclass of `NullRunBlockedException`) |
| **Retryable** | No |

The workflow budget is exhausted. Every `@protect` call will be
rejected until the budget is raised or the next billing cycle starts.
Increase the budget at https://app.nullrun.io/billing or wait. An
existing `except NullRunBlockedException` clause still catches this
error, which preserves backward compatibility.
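
The back-compat note can be illustrated with a small sketch. The two classes below are stand-ins that mirror only the subclass relationship stated in the table; the `NR-X001` default on the base class is an assumption taken from the catalogue's "generic block" entry, and `legacy_handler` is a hypothetical name.

```python
class NullRunBlockedException(Exception):
    """Stand-in for the SDK base class for blocked calls."""

    error_code = "NR-X001"  # assumed default for an unclassified block


class NullRunBudgetError(NullRunBlockedException):
    """Stand-in for the budget-specific subclass."""

    error_code = "NR-B004"


def legacy_handler(exc_to_raise):
    """Code written before the subclass existed keeps working unchanged."""
    try:
        raise exc_to_raise
    except NullRunBlockedException as exc:
        # The broad clause catches the subclass; error_code tells them apart.
        return exc.error_code
```

New code can add a narrower `except NullRunBudgetError` clause above the broad one when it needs budget-specific handling.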
13 changes: 13 additions & 0 deletions docs/errors/NR-B005.md
@@ -0,0 +1,13 @@
# NR-B005 — Local circuit breaker tripped

| Field | Value |
|---|---|
| **Code** | `NR-B005` |
| **Category** | Backend |
| **Exception class** | `NullRunTransportError` (source=`BREAKER_OPEN`) |
| **Retryable** | Yes |

The SDK's local circuit breaker tripped after consecutive transport
failures. The SDK is refusing outbound calls for a cooldown
window to avoid amplifying a backend outage. Retries are
automatically scheduled — manual retry is unnecessary.
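
For readers unfamiliar with the pattern, the behaviour described above can be sketched as a generic circuit breaker. This is not the SDK's implementation: the class name `LocalBreaker`, the threshold of 5 failures, and the 30-second cooldown are all assumptions made for the example.

```python
import time


class LocalBreaker:
    """Generic circuit-breaker sketch; thresholds here are hypothetical."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if an outbound call may be attempted right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the breaker and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self):
        # Any success breaks the run of consecutive failures.
        self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._threshold:
            # Too many consecutive failures: refuse calls until cooldown ends.
            self._opened_at = self._clock()
```

A caller would check `allow()` before each request and report the outcome with `record_success()` or `record_failure()`.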
52 changes: 52 additions & 0 deletions docs/errors/NR-C001.md
@@ -0,0 +1,52 @@
# NR-C001 — No API key provided to `init()`

| Field | Value |
|---|---|
| **Code** | `NR-C001` |
| **Category** | Configuration |
| **Exception class** | `NullRunAuthenticationError` (kept for back-compat; would be `NullRunConfigError` in a clean-slate design) |
| **Retryable** | No |
| **Default `user_action`** | "Get an API key at https://app.nullrun.io/settings/api-keys, then either pass api_key='nr_live_...' to nullrun.init() or set the NULLRUN_API_KEY environment variable. The SDK cannot operate without credentials — the silent no-op fallback was removed in 0.3.0 because it bypassed every backend gate." |

## When

`nullrun.init()` was called without an `api_key` argument AND the
`NULLRUN_API_KEY` environment variable is unset or empty.

## Why this raises (instead of falling back)

Prior to 0.3.0 the SDK silently fell back to "local mode" (a
`NullRunNoop` stub) when no key was provided. That stub bypassed
every backend gate (budget, policy, control plane) — production
callers were unaware their policies were not being enforced. See
[cloud-only-invariant](../../nullrun-docs/memory/cloud-only-invariant.md)
in the docs memory for the full rationale.

## How to fix

1. Create an API key at https://app.nullrun.io/settings/api-keys.
2. Either:
- pass it explicitly: `nullrun.init(api_key="nr_live_...")`, or
- set the env var: `export NULLRUN_API_KEY=nr_live_...` (or
equivalent for your shell / process manager).
3. Re-run the application.

## Catch pattern

```python
import nullrun
from nullrun.breaker.exceptions import NullRunAuthenticationError


def start_nullrun():
    try:
        nullrun.init()
    except NullRunAuthenticationError as exc:
        if exc.error_code == "NR-C001":
            # user_action is a sentence that includes the dashboard link;
            # show it inline. render_onboarding is an application-defined helper.
            return render_onboarding(help_text=exc.user_action)
        raise
```

## Related codes

- `NR-A001` / `NR-A002` / `NR-A003` — key provided but rejected.
- `NR-C003` — runtime bound, but no `org_id` available for `get_org_status()`.
12 changes: 12 additions & 0 deletions docs/errors/NR-C003.md
@@ -0,0 +1,12 @@
# NR-C003 — `get_org_status()` called before runtime bound to an org

| Field | Value |
|---|---|
| **Code** | `NR-C003` |
| **Category** | Configuration |
| **Exception class** | `NullRunAuthenticationError` |
| **Retryable** | No |

`get_org_status()` requires the runtime to know which org to query.
This error is raised when it is called before `nullrun.init()` has
completed, or after `shutdown()`. Pass `org_id=<uuid>` explicitly, or
make sure `init()` has finished first.
15 changes: 15 additions & 0 deletions docs/errors/NR-L001.md
@@ -0,0 +1,15 @@
# NR-L001 — Loop detector tripped

| Field | Value |
|---|---|
| **Code** | `NR-L001` |
| **Category** | Loop |
| **Exception class** | `NullRunBlockedException` |
| **Retryable** | No |

The backend's loop detector tripped (by default, more than 6 calls
to the same tool within a 60-second window). The body did not run.
Wait 60s for the counter to clear, or change the agent's behaviour.
Local loop detection (in `_local_check`) does NOT raise; it returns
`allowed=False` in the `track_event` dict. This code is set only on
backend-detected loop blocks.
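
The default rule (more than 6 same-tool calls in a 60s window) can be pictured as a sliding-window counter. The sketch below illustrates that idea only; it is not the backend's detector, and `LoopDetector`, `record`, and the exact boundary handling are assumptions.

```python
import time
from collections import defaultdict, deque


class LoopDetector:
    """Sliding-window counter per tool; an illustrative sketch only."""

    def __init__(self, max_calls=6, window_seconds=60.0, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls = defaultdict(deque)  # tool name -> call timestamps

    def record(self, tool_name):
        """Record one call; return True if allowed, False if the loop rule trips."""
        now = self._clock()
        calls = self._calls[tool_name]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        calls.append(now)
        # "More than max_calls" trips the detector, so the 7th call is blocked.
        return len(calls) <= self._max_calls
```

In this sketch a blocked call still counts toward the window, so an agent that keeps hammering the same tool stays blocked until it pauses.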
13 changes: 13 additions & 0 deletions docs/errors/NR-R001.md
@@ -0,0 +1,13 @@
# NR-R001 — 429 rate limit from gateway

| Field | Value |
|---|---|
| **Code** | `NR-R001` |
| **Category** | Rate limit |
| **Exception class** | `RateLimitError` (subclass of `NullRunTransportError`) |
| **Retryable** | Yes |

The NullRun backend rate-limited this API key. Wait
`exc.retry_after` seconds (or upgrade the plan) before retrying.
`exc.upgrade_url` is the plan-upgrade link when the gateway
included it in the 429 body.
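
Honouring `exc.retry_after` might look like the sketch below. `RateLimitError` is redefined as a stand-in so the block is self-contained; whether `retry_after` can be `None`, and the `default_delay` / `max_delay` fallbacks, are assumptions rather than documented SDK behaviour.

```python
import time


class RateLimitError(Exception):
    """Stand-in carrying the two attributes described above."""

    def __init__(self, message, retry_after=None, upgrade_url=None):
        super().__init__(message)
        self.retry_after = retry_after
        self.upgrade_url = upgrade_url


def wait_for_rate_limit(exc, default_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Sleep for the server-suggested delay and return the seconds waited.

    default_delay and max_delay are hypothetical safety bounds.
    """
    delay = exc.retry_after if exc.retry_after is not None else default_delay
    # Clamp to [0, max_delay] so a bad header cannot stall the caller.
    delay = max(0.0, min(float(delay), max_delay))
    sleep(delay)
    return delay
```

When `exc.upgrade_url` is set, surfacing it to the operator is usually more useful than retrying indefinitely.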
14 changes: 14 additions & 0 deletions docs/errors/NR-T001.md
@@ -0,0 +1,14 @@
# NR-T001 — Tool in block list

| Field | Value |
|---|---|
| **Code** | `NR-T001` |
| **Category** | Tool |
| **Exception class** | `NullRunToolBlockedError` (subclass of `NullRunBlockedException`) |
| **Retryable** | No |

The tool is in the workflow's block list. The body did not run.
`exc.tool_name` is set to the blocked tool. Remove the tool from
the block list at https://app.nullrun.io/policies/<workflow> or
use a different tool. An existing `except NullRunBlockedException`
clause still catches this error, which preserves backward compatibility.
15 changes: 15 additions & 0 deletions docs/errors/NR-W002.md
@@ -0,0 +1,15 @@
# NR-W002 — Workflow killed

| Field | Value |
|---|---|
| **Code** | `NR-W002` |
| **Category** | Workflow state |
| **Exception class** | `WorkflowKilledInterrupt` (subclass of `BaseException`, NOT `Exception`) |
| **Retryable** | No |

The workflow was killed by the NullRun control plane (via API or
auto-kill on budget exhaustion). The body did not run. The kill
is non-recoverable from inside the agent loop — let the signal
propagate to the top. `except Exception` will NOT catch this
signal by design; use `except WorkflowKilledInterrupt` or
`except BaseException`. See `docs/kill-contract.md` §6.
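
The difference between `except Exception` and `except WorkflowKilledInterrupt` can be seen in a self-contained sketch. The class below is a stand-in that copies only the `BaseException` parentage from the table; `run_step`, `kill`, and `main` are hypothetical names.

```python
class WorkflowKilledInterrupt(BaseException):
    """Stand-in: derives from BaseException, not Exception."""

    error_code = "NR-W002"


def run_step(step):
    """A typical agent-loop wrapper that swallows ordinary errors."""
    try:
        return step()
    except Exception:
        # Ordinary failures are handled here; the kill signal is not.
        return "recovered"


def kill():
    raise WorkflowKilledInterrupt("killed by control plane")


def main():
    try:
        run_step(kill)
    except WorkflowKilledInterrupt as exc:
        # The signal passed straight through run_step's `except Exception`.
        return f"stopped: {exc.error_code}"
    return "finished"
```

Python's own `KeyboardInterrupt` and `SystemExit` derive from `BaseException` for the same reason: broad `except Exception` handlers should not swallow them.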
14 changes: 14 additions & 0 deletions docs/errors/NR-W003.md
@@ -0,0 +1,14 @@
# NR-W003 — Workflow paused

| Field | Value |
|---|---|
| **Code** | `NR-W003` |
| **Category** | Workflow state |
| **Exception class** | `WorkflowPausedException` |
| **Retryable** | No |

The workflow is paused (cooldown or human approval). The body
did not run. Resume the workflow at
https://app.nullrun.io/workflows/<workflow_id> or wait for the
cooldown to expire (if `resume_after` is set, see
`exc.resume_after`).
110 changes: 110 additions & 0 deletions docs/errors/README.md
@@ -0,0 +1,110 @@
# NullRun SDK error codes

Every user-facing SDK exception carries a stable `error_code` so you
can branch on the failure mode without parsing the message string.
The codes follow a `NR-<CATEGORY><NNN>` pattern:

| Prefix | Category | When |
|---|---|---|
| `NR-C` | **C**onfiguration | Missing or invalid SDK config (no api_key, no workflow, etc.) |
| `NR-A` | **A**uthentication | API key rejected, auth response malformed |
| `NR-B` | **B**ackend | 5xx, network error, budget exhausted |
| `NR-W` | **W**orkflow state | Workflow killed, paused |
| `NR-T` | **T**ool | Tool in block list |
| `NR-L` | **L**oop | Loop detector tripped |
| `NR-R` | **R**ate limit | 429 from gateway |
| `NR-X` | Mi**x**ed | Generic block (fallback when code is unknown) |
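
Because the pattern is regular, the category can be recovered from a code with a short helper. This is a sketch for application code, not part of the SDK: `category_of` and the `CATEGORIES` mapping are hypothetical names built from the table above.

```python
import re

# Prefix letter -> category, copied from the table above.
CATEGORIES = {
    "C": "Configuration",
    "A": "Authentication",
    "B": "Backend",
    "W": "Workflow state",
    "T": "Tool",
    "L": "Loop",
    "R": "Rate limit",
    "X": "Generic block",
}

_CODE_RE = re.compile(r"NR-([A-Z])(\d{3})")


def category_of(error_code):
    """Return the category name for a code, or None if it does not fit the pattern."""
    match = _CODE_RE.fullmatch(error_code)
    if match is None:
        return None
    return CATEGORIES.get(match.group(1))
```

Note that the base-class default `NR-0000` has no category letter, so the helper returns `None` for it.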

## Catalogue

### Configuration (NR-C)

| Code | When | See |
|---|---|---|
| `NR-C001` | `nullrun.init()` called with no api_key (no param, no env) | [NR-C001](NR-C001.md) |
| `NR-C003` | `get_org_status()` called before the runtime is bound to an org | [NR-C003](NR-C003.md) |

### Authentication (NR-A)

| Code | When | See |
|---|---|---|
| `NR-A001` | `/auth/verify` returned non-200 (other than 401) | [NR-A001](NR-A001.md) |
| `NR-A002` | `/auth/verify` response missing `organization_id` | [NR-A002](NR-A002.md) |
| `NR-A003` | Any endpoint returned 401 — key was rejected | [NR-A003](NR-A003.md) |

### Backend / network (NR-B)

| Code | When | See |
|---|---|---|
| `NR-B001` | Network error: timeout, ConnectError, DNS failure | [NR-B001](NR-B001.md) |
| `NR-B002` | 5xx from the NullRun backend | [NR-B002](NR-B002.md) |
| `NR-B004` | Budget exhausted | [NR-B004](NR-B004.md) |
| `NR-B005` | Local circuit breaker tripped | [NR-B005](NR-B005.md) |

### Workflow state (NR-W)

| Code | When | See |
|---|---|---|
| `NR-W002` | Workflow killed by control plane | [NR-W002](NR-W002.md) |
| `NR-W003` | Workflow paused (cooldown or human approval) | [NR-W003](NR-W003.md) |

### Tool / loop / rate (NR-T, NR-L, NR-R)

| Code | When | See |
|---|---|---|
| `NR-T001` | Tool in the workflow's block list | [NR-T001](NR-T001.md) |
| `NR-L001` | Loop detector tripped (>6 same tool calls in 60s) | [NR-L001](NR-L001.md) |
| `NR-R001` | 429 from the gateway (per-key rate limit) | [NR-R001](NR-R001.md) |

## Generic fallbacks

| Code | When |
|---|---|
| `NR-X001` | Generic block — the SDK raised `NullRunBlockedException` but could not classify it. Usually means the backend stamped a non-standard explanation. |
| `NR-0000` | Default on the base `NullRunError` class. A subclass forgot to override. Please open an issue. |

## How to use the catalogue

Every public exception exposes `error_code`, `user_action`, `retryable`,
and `docs_url` as attributes. Cookbook pattern:

```python
import logging

import nullrun
from nullrun.breaker.exceptions import NullRunError, NullRunBudgetError

log = logging.getLogger(__name__)


@nullrun.protect
def my_agent():
    try:
        ...
    except NullRunBudgetError as exc:
        # specific handler for budget exhaustion
        return f"Out of budget: {exc.user_action}"
    except NullRunError as exc:
        # catch-all for any structured SDK failure
        log.error(
            "NullRun error",
            extra={
                "error_code": exc.error_code,
                "user_action": exc.user_action,
                "retryable": exc.retryable,
                "docs_url": exc.docs_url,
            },
        )
        if exc.retryable:
            # retry_with_backoff is an application-defined helper.
            return retry_with_backoff()
        raise
```

## Adding a new code

1. Pick the right category prefix (`NR-C` / `NR-A` / ...).
2. Pick the next free number in that category.
3. Add a class attribute to the exception class
(`error_code = "NR-<CATEGORY><NNN>"`, e.g. `"NR-A003"`).
4. Override `user_action` with a short imperative sentence.
5. Set `retryable` to `True` only for transient failures.
6. Add a new page under this directory following the existing
template (see [NR-A003](NR-A003.md) for a worked example).
7. Update the catalogue table above.
8. Add a unit test in `tests/test_exception_hierarchy.py`
(`TestErrorCodeCatalog::test_<code>`).
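
Steps 3, 4, 5, and 8 might look roughly like the sketch below. It assumes `error_code`, `user_action`, and `retryable` are plain class attributes, which this page does not confirm. `NullRunError` is redefined as a stand-in, and `NullRunExampleError` and the code `NR-B999` are placeholders, not real catalogue entries.

```python
class NullRunError(Exception):
    """Stand-in for the SDK base class (defaults per the catalogue)."""

    error_code = "NR-0000"
    user_action = ""
    retryable = False


class NullRunExampleError(NullRunError):
    """Hypothetical new exception; NR-B999 is a placeholder code."""

    error_code = "NR-B999"                       # step 3: stable code
    user_action = "Retry in a few seconds."      # step 4: short imperative
    retryable = True                             # step 5: transient only


class TestErrorCodeCatalog:
    """Step 8: one test per code, named test_<code>."""

    def test_nr_b999(self):
        exc = NullRunExampleError("example failure")
        assert exc.error_code == "NR-B999"
        assert exc.retryable is True
        assert exc.user_action  # must not be empty
        assert isinstance(exc, NullRunError)
```

Keeping the subclass under `NullRunError` means existing catch-all handlers pick up the new code without changes.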