wip: post-merge sync of 0.3.0->0.4.0 migration working tree #5
Closed
maltsev-dev wants to merge 1 commit into
PR #3 (Wip/working tree 2026 06 18) merged 1e9e9c0 with the CI-fix commit and the byte-mismatch fix on top, but the actual 0.3.0->0.4.0 migration diff (the 45-file cleanup that PR #3 was named after) appears to have been stripped during the merge: the working tree at HEAD still shows the full diff vs the initial import. This PR re-stages the same content so the audit-trail, examples, decorators cleanup, and 14 new test files are actually on master.
The 0.3.0->0.4.0 migration is the same one captured in the wip/working-tree-2026-06-18 branch (commit 1244901). Per the CHANGELOG, the migration removes PoolConfig/AdaptivePool, the gRPC transport, the signal.signal hijack, the protos/nullrun/v1/track.proto file, six zombie exception types, the decision_history/flow/gate/common placeholder modules, patch_openai/unpatch_openai, and _decision_history._organization_id_var / _api_key_id_var. The other 14 added test files and the analyze.md plan live in the same diff and are part of the 0.4.0 production-readiness release.
If this lands in the same place as PR #3 already did, it is a no-op for master. If something went wrong in the merge and the diff was never persisted, this commit restores it.
Validation: local 'pytest tests/' on this commit will rerun the test collection chain (decorators -> instrumentation.langgraph -> langchain_core), which is the reason the CI fix is also in this branch; without langchain-core in [dev] the collection errors out across all 20+ test files.
maltsev-dev added a commit that referenced this pull request Jun 20, 2026
…erage reporter (#26)
maltsev-dev added a commit that referenced this pull request Jun 21, 2026
* fix: P0 security/stability hardening bundle
Closes the P0/P1/P2/P3 issues from the security review (plan §10/§11.4).
Security / PCI-DSS / GDPR
- P0-1: Mask positional PII in `_enforce_sensitive_tool` by introspecting
the wrapped function's signature and applying `SENSITIVE_ARG_KEYS` to
positional params. Pre-fix, `charge("4111-…-1111", 50)` forwarded the
PAN into `/execute` and the audit log.
- P0-6 / P3-3: `_safe_repr` now redacts BEFORE truncating. The pre-fix
order truncated first, so `details={…}` past position 50 leaked
verbatim. `_safe_repr` is now the single source of truth for the
redact-then-truncate flow.
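A minimal sketch of why the order matters in P0-6 / P3-3. The regex, the helper names, and the 50-character limit below are illustrative assumptions, not the SDK's actual `_safe_repr`; the sketch shows only one way a truncate-first order can leak (a secret cut in half no longer matches the pattern).

```python
import re

# Hypothetical pattern: 13-19 digit card-like numbers, optional dashes/spaces.
_PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
MAX_LEN = 50  # assumed truncation limit


def redact(text: str) -> str:
    return _PAN_RE.sub("[REDACTED]", text)


def safe_repr_sketch(value: object) -> str:
    """Redact first, then truncate, so no raw digits survive the cut."""
    text = redact(repr(value))
    return text if len(text) <= MAX_LEN else text[:MAX_LEN] + "..."


def truncate_then_redact(value: object) -> str:
    """The risky order: a number split by the cut no longer matches."""
    return redact(repr(value)[:MAX_LEN])
```

With a value whose card-like number straddles the limit, the first helper emits no digits at all, while the second leaves the leading digits of the number in the output.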
Cost-audit / reliability
- P0-3: Bounded chunked reads on the sync + async httpx transports
(`MAX_RESPONSE_BYTES`, default 16 MiB, `NULLRUN_MAX_RESPONSE_BYTES`
env override). Above the cap, tracking is skipped and
`_coverage_streaming_skipped` is incremented. Replaces the
`response.read()` / `await response.aread()` unbounded buffer that
held entire LLM streaming bodies in memory.
- P0-4: `_do_flush_locked` re-queue on CB OPEN now drops the NEWEST
non-critical events instead of the oldest. The oldest events
(incident start, billing-period start) are exactly what a billing
investigator needs; losing them silently broke monthly rollups.
Control-plane events (`state_change`, `kill_received`,
`policy_invalidated`, `key_rotated`) are preserved unconditionally
so the dashboard KILL switch lands even under sustained backend
outage.
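The drop-newest policy from P0-4 can be sketched as follows. The function name, the event-dict shape, and the choice to let critical events exceed the capacity are assumptions for illustration; only the four event type names come from the text above.

```python
# Control-plane event types named in the commit message.
CRITICAL_TYPES = {"state_change", "kill_received", "policy_invalidated", "key_rotated"}


def requeue_drop_newest(events, capacity):
    """Trim `events` (ordered oldest first) down to `capacity`.

    Newest non-critical events are dropped first; critical events are
    never dropped, even if the result ends up above `capacity`.
    """
    overflow = len(events) - capacity
    if overflow <= 0:
        return list(events)
    kept_newest_first = []
    for event in reversed(events):  # walk newest -> oldest
        if overflow > 0 and event.get("type") not in CRITICAL_TYPES:
            overflow -= 1  # drop this non-critical event
            continue
        kept_newest_first.append(event)
    return kept_newest_first[::-1]  # restore oldest-first order
```

Walking from the newest end means the earliest records of an incident are the last to be sacrificed.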
Identity
- S-8 / P2-4: `agent()` now emits `str(uuid.uuid4())` (with dashes).
Pre-fix the format was `f"agent-{uuid.uuid4().hex}"` — 32 hex chars,
no dashes — and backend UUID-typed columns dropped these to NULL
on insert. User-supplied names are still preserved verbatim.
- §7.2 #16: `workflow()` context manager now resets `span_id` (not
only `workflow_id` / `trace_id`) so nested `with span()` blocks
don't leave the inner span_id visible inside the workflow scope.
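The difference between the two ID formats in S-8 / P2-4 is easy to check with the standard library. The `is_canonical_uuid` helper is a hypothetical stand-in for a strict UUID-typed column; how the real backend parses values is not stated in this message.

```python
import uuid

raw = uuid.uuid4()
old_style = f"agent-{raw.hex}"  # pre-fix: prefix + 32 hex chars, no dashes
new_style = str(raw)            # post-fix: canonical 36-char form with dashes


def is_canonical_uuid(value: str) -> bool:
    """True only if `value` is already in canonical dashed UUID form."""
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False
```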
Resource leaks
- S-9: `_active_runs` on `NullRunCallback` is now an `OrderedDict`
capped at 4096 with FIFO eviction. Pre-fix the dict grew
unbounded when `on_chain_end` did not fire (some LangChain
versions short-circuit the end hook on chain-body errors).
- S-10: WebSocket reconnect loop is now capped at 10 consecutive
failures, then falls back to HTTP-poll. Pre-fix the loop ran
forever when the backend was permanently down, leaking the
WS thread.
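A sketch of the bounded-map idea in S-9, assuming a plain insert-order eviction. The class and method names are hypothetical; only the `OrderedDict`, the 4096 cap, and FIFO eviction come from the text.

```python
from collections import OrderedDict


class BoundedRuns:
    """Tracks in-flight runs, evicting the oldest entry once the cap is hit."""

    def __init__(self, cap=4096):
        self._cap = cap
        self._runs = OrderedDict()

    def start(self, run_id, info):
        self._runs[run_id] = info
        while len(self._runs) > self._cap:
            self._runs.popitem(last=False)  # FIFO: drop the oldest insert

    def end(self, run_id):
        # Returns None if the run was already evicted or never started.
        return self._runs.pop(run_id, None)

    def __len__(self):
        return len(self._runs)
```

If the end hook never fires for some runs, their entries are eventually pushed out by newer ones instead of accumulating forever.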
Transport
- §7.2 #6: Separate `hmac_verify_expired_total` counter so SRE can
distinguish clock-skew (NTP drift) from forged packets. Mirrored
in both the HTTP and WebSocket verify paths.
- §7.2 #35: `CircuitBreaker.call` now dispatches the OPEN→HALF_OPEN
jitter through `_maybe_apply_open_jitter_sync` /
`_maybe_apply_open_jitter_async`. Pre-fix the jitter used
`time.sleep` before dispatching to async, which blocked the
caller's event loop on every transition.
- P2-1: `_coverage_seen` now bumps in the httpx path (sync + async).
Pre-fix the counter was only bumped by the `requests` transport,
so the dashboard's coverage view was empty for the dominant
OpenAI / Anthropic / Gemini / Mistral / Cohere traffic.
- P2-3: `is_sensitive_tool` match is case-insensitive. Pre-fix
`"stripe.charge"` did not match `"Stripe.Charge"`, bypassing the
sensitive gate.
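The sync/async split described in §7.2 #35 might look roughly like this. The helper names and the uniform jitter are assumptions; the 5s cap is the one the Tests section says is pinned on both helpers.

```python
import asyncio
import random
import time

MAX_JITTER_SECONDS = 5.0  # cap mentioned in the Tests section


def jitter_delay(window: float) -> float:
    """Pick a random delay in [0, window], never above the cap."""
    return min(random.uniform(0.0, window), MAX_JITTER_SECONDS)


def apply_jitter_sync(window: float) -> None:
    time.sleep(jitter_delay(window))  # blocks only the calling thread


async def apply_jitter_async(window: float) -> None:
    # Yields to the event loop, so other coroutines keep running.
    await asyncio.sleep(jitter_delay(window))
```

Calling `time.sleep` from a coroutine would stall every task on the loop for the whole delay, which is the pre-fix behaviour the bullet describes.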
Concurrency
- §7.2 #39: New `_tools_lock` guards every mutation of
`_strict_mode_tools` / `_sensitive_tools`. Same lock guards the
coverage-counter bump+prune sequence (§7.2 #33) so two threads
can't both observe the dict at length 4095 and both grow it to
4097 before either prune lands.
- §7.2 #47: New `_langchain_lock` / `_langgraph_lock` guard the
patch sequences end-to-end. Pre-fix two threads racing through
`auto_instrument` could both pass the early `_x_patched` check
and double-wrap `BaseCallbackManager` / `Pregel`.
- §7.2 #33: `_COVERAGE_CAP` (4096) bounds the per-host coverage
dicts.
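The double-wrap race in §7.2 #47 comes from checking a flag outside the lock. A minimal sketch of the guarded version, with hypothetical names and a counter standing in for the actual class wrapping:

```python
import threading

_patch_lock = threading.Lock()
_patched = False
wrap_count = 0  # stands in for "how many times the target was wrapped"


def patch_once() -> bool:
    """Apply the patch at most once, even when called from many threads."""
    global _patched, wrap_count
    with _patch_lock:  # check and mutation happen under the same lock
        if _patched:
            return False
        wrap_count += 1
        _patched = True
        return True


def race(n_threads: int = 8) -> int:
    """Run patch_once from several threads and report the wrap count."""
    threads = [threading.Thread(target=patch_once) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return wrap_count
```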
Webhook delivery
- P3-2: Exponential backoff (0.5s, 1s, 2s, 4s, 8s, 16s, 30s cap)
replaces the previous linear schedule. Linear didn't back off
fast enough under sustained outage — each KILL/PAUSE spawned
its own delivery thread, producing 1000+ spinning threads
hammering the dead endpoint.
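The schedule quoted in P3-2 is what a doubling delay with a 30s ceiling produces. A small sketch (the function name and parameters are illustrative):

```python
def backoff_schedule(attempts: int, base: float = 0.5, cap: float = 30.0) -> list:
    """Delay before each retry: base * 2**i, clamped to `cap`."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]
```

For seven attempts this yields 0.5, 1, 2, 4, 8, 16, 30: the seventh raw value would be 32s, which the cap clamps to 30s.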
WAL crash-recovery
- P1-5b: Atomic WAL writes (tmp + `fsync` + `os.replace`), 64 MiB
rotation with `os.replace(wal, wal.1)`, replay drains both
`wal.1` and `wal`. New `NULLRUN_WAL_PATH` / `NULLRUN_WAL_MAX_BYTES`
env overrides for containers with `readOnlyRootFilesystem: true`.
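The tmp + `fsync` + `os.replace` pattern from P1-5b, sketched under the assumption that the whole file is rewritten on each call. The helper names are hypothetical and the SDK's real WAL format is not shown here.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def rotate_if_needed(path: str, max_bytes: int) -> bool:
    """Move `path` to `path.1` once it reaches `max_bytes`."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")
        return True
    return False
```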
Tests
8 new regression test files (57 tests total):
test_agent_id_uuid.py, test_args_pii_masked.py,
test_streaming_oom_cap.py, test_lru_active_runs.py,
test_reconnect_cap.py, test_coverage_seen_httpx.py,
test_webhook_backoff.py, test_redact.py
`test_buffer_invariants.py` extended with drop-newest +
critical-event preservation cases. `test_release_polish.py`
updated to pin the 5s cap on both the sync and async jitter
helpers (post §7.2 #35 split).
Full incident write-ups in CHANGELOG.md under the same P0/S/P tags.
* fix: address ruff lint findings from CI
Three CI lint failures on `ruff check src/` — fixes only, no
behavioural changes:
- **B905** (`src/nullrun/decorators.py:162`): `zip(bound_params,
args)` now passes `strict=False` explicitly. The two iterables can
legitimately differ in length: `bound_params` is sliced to
`[: len(args)]`, but the function may have fewer positional
parameters than args provided (e.g. *args-style callables), in
which case the trailing loop below handles the excess. Pre-fix,
`strict=` was left implicit, which triggered B905; it is now
explicit so the intent is documented in code.
- **I001** (`src/nullrun/instrumentation/auto.py:1146`): the late
`import os as _os` was moved to the top-of-file import block as
`import os` (alphabetical order: hashlib, json, logging, os,
threading). The `_os` alias was only there to avoid shadowing —
there is no top-level `os` in scope, so the plain name is fine.
Call site updated to use `os.environ.get(...)`.
- **S108** (`src/nullrun/transport.py:632`): replaced the
hardcoded `/tmp/nullrun.wal` with
`os.path.join(tempfile.gettempdir(), "nullrun.wal")`. The
hardcoded `/tmp` flagged S108 (insecure / non-portable temp
path) and would have broken the SDK on Windows out of the box.
`gettempdir()` returns the OS-appropriate temp dir
(`/tmp` on Linux, `/var/folders/...` on macOS, `%TEMP%` on
Windows). `NULLRUN_WAL_PATH` env override still wins, so
containers with `readOnlyRootFilesystem: true` are unaffected.
Added `import tempfile` to the top-of-file imports.
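The resolution order described for S108 (env override first, OS temp dir otherwise) can be sketched like this. The function name and the injectable `environ` argument are assumptions for illustration.

```python
import os
import tempfile


def resolve_wal_path(environ=None) -> str:
    """Return the WAL path: NULLRUN_WAL_PATH if set, else the OS temp dir."""
    env = os.environ if environ is None else environ
    override = env.get("NULLRUN_WAL_PATH")
    if override:
        return override
    return os.path.join(tempfile.gettempdir(), "nullrun.wal")
```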
Verified:
- `ruff check src/` → All checks passed!
- `mypy src/` → Success: no issues found in 23 source files
- `pytest` → 493 passed, 13 skipped (CI default, no `-W error`)
* chore(release): bump to 0.5.2
- Promote [Unreleased] to [0.5.2] — 2026-06-19; merge the two
[Unreleased] sections that had drifted during Sprint 2.5 +
Phase 0 development so release tooling scanning for the
[Unreleased] anchor picks up the complete change set exactly
once.
- Add PEP 561 marker (py.typed) — the package ships inline type
annotations; the marker tells mypy / pyright / pylance to honour
them.
- runtime.py (S-4): case-insensitive state compare in
check_control_plane. Defensive against any backend casing drift
beyond the current PascalCase (handlers.rs:9258). Pinned by
tests/test_state_compare_case_insensitive.py (10 cases covering
PascalCase / UPPERCASE / lowercase / mixed-case).
Working-notes file docs/integration-baseline-2026-06-19.md is
deliberately left untracked, matching the analyze.md pattern from
d74712e.
* test: bump coverage 70.92% → 84.52% with branch coverage
Lifts the SDK's Codecov score from 70.92 % to 84.52 % (+13.6 pp) by
adding 347 new tests across 10 files that exercise previously-untested
branches in the auto-instrumentation patches, runtime gates, transport
fallback modes, circuit breaker Redis path, and the @Protect decorator
fail-CLOSED contract.
pyproject.toml
- Enable branch coverage so error / fallback paths count.
- Raise fail_under from 70 → 82 (enforced in CI via `coverage run -m
pytest && coverage report`).
- Add precision=2 and skip_empty=true to keep the report readable.
New tests (all 817 pass locally, all 4 CI jobs green):
tests/test_autogen_patch.py — 13 tests
tests/test_crewai_patch.py — 15 tests
tests/test_llama_index_patch.py — 13 tests
tests/test_langgraph_callback.py — 38 tests
tests/test_auto_requests.py — 24 tests
tests/test_runtime_branches.py — 43 tests
tests/test_transport_branches.py — 44 tests
tests/test_circuit_breaker_branches.py — 31 tests
tests/test_protect_branches.py — 43 tests
tests/test_actions_context_init.py — 50 tests
Per-file coverage deltas:
instrumentation/autogen.py 21.33 → 93.41 %
instrumentation/crewai.py 22.97 → 90.82 %
instrumentation/llama_index.py 28.30 → 100.00 %
instrumentation/langgraph.py 23.75 → 93.69 %
instrumentation/auto_requests.py 33.72 → 99.09 %
breaker/circuit_breaker.py 59.76 → 90.21 %
transport.py 82.57 → 84.79 %
transport_websocket.py 68.70 → 64.10 % (msg-type branches still need live ws round-trip tests)
decorators.py 83.33 → 95.49 %
runtime.py 80.14 → 83.24 %
context.py 82.76 → 100.00 %
actions.py 92.12 → 96.89 %
breaker/exceptions.py 98.51 → 97.26 %
All 4 CI jobs pass locally (pytest, ruff check, mypy, coverage).
Working-notes file docs/integration-baseline-2026-06-19.md is
deliberately left untracked, matching the analyze.md pattern from
d74712e.
* feat(security): make @sensitive registration fail-CLOSED (ADR-008)
Sensitive-tool registration is part of the security boundary. The
old behaviour caught any exception from _get_or_create_runtime(),
logged it at DEBUG, and returned the original function unchanged —
which meant the wrapped body would later execute without ever being
added to the runtime's sensitive-tool set, completely bypassing the
pre-execution gate under partial initialization (e.g. transient
NullRunAuthenticationError on import).
Replace the silent logger.debug(...) with raise RuntimeError(...),
chained from the original exception. The decorator is the registration
point, not the call site, so raising at decoration time is the correct
signal: the import / module-load fails loudly, the body never gets a
chance to run untracked, and the caller can still inspect the root
cause via __cause__.
The two pre-existing tests pinned the old (silent / wrong-type) contract;
update them to assert the new RuntimeError wrapping:
- test_sensitive_raises_on_missing_api_key now expects RuntimeError
whose __cause__ is the original NullRunAuthenticationError.
- test_sensitive_runtime_init_failure_is_silent is renamed to
..._raises and asserts the same __cause__ chaining when a
_get_or_create_runtime mock raises.
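A sketch of the fail-CLOSED registration pattern described above. `sensitive_sketch`, `RuntimeUnavailable`, and the set-like registry are hypothetical stand-ins, not the SDK's `@sensitive` or its runtime.

```python
class RuntimeUnavailable(Exception):
    """Hypothetical stand-in for an init-time failure such as an auth error."""


def sensitive_sketch(get_registry):
    """Decorator factory: register the tool at decoration time or fail loudly."""
    def decorator(func):
        try:
            registry = get_registry()
        except Exception as exc:
            # Fail closed: never hand back an unregistered function.
            raise RuntimeError(
                f"cannot register sensitive tool {func.__name__!r}"
            ) from exc
        registry.add(func.__name__)
        return func
    return decorator
```

Because the error is raised with `from exc`, the original failure stays reachable through `__cause__`, which is what the updated tests assert.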
* fix(transport): retry /track/batch on 5xx and align auth-verify path (P0 #2, P0 #5)
P0 #2 — _send_batch_with_retry_info used to do a single
self._client.post(...) + raise_for_status(). A transient backend 5xx
raised out of the flush path; the in-memory buffer was cleared at the
call site and every event in the batch was permanently lost. Wrap the
post() in _retry_with_backoff (max 3 attempts, exponential backoff +
jitter, capped at 10s) so a single 500 no longer drops the whole batch.
429 is retried (helper honors Retry-After when present); other 4xx
errors are returned as-is — those are real client bugs and must not
be retried (e.g. a 401 just wastes the user's budget).
P0 #5 — contract drift: this file's auth-verify call site used
/auth/verify, while the corresponding call in runtime.py:599 already
used /api/v1/auth/verify. Align the rotation call site to /api/v1/auth/verify
so the contract-drift-guard CI catches any future divergence.
Update tests/test_transport.py::test_retry_on_500 to assert the new
contract (third attempt succeeds → call_count == 3, event id in
accepted_event_ids) instead of expecting an immediate exception.
Add tests/test_track_batch_retry.py with full regression coverage:
single 5xx → success, three consecutive 5xx → BreakerTransportError,
429 with Retry-After → honored before next attempt.
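The retry policy for P0 #2 can be sketched as below. This is not the SDK's `_retry_with_backoff`: the signature is invented, `RuntimeError` stands in for `BreakerTransportError`, and Retry-After handling is left out for brevity.

```python
import random
import time


def retry_with_backoff(send, max_attempts=3, base=0.5, cap=10.0, sleep=time.sleep):
    """Call `send()` (returns an HTTP status code) until it stops failing transiently."""
    for attempt in range(1, max_attempts + 1):
        status = send()
        retryable = status >= 500 or status == 429
        if not retryable:
            return status  # success, or a non-retryable 4xx handed back as-is
        if attempt == max_attempts:
            raise RuntimeError(
                f"gave up after {max_attempts} attempts (last status {status})"
            )
        delay = base * 2 ** (attempt - 1)
        # Exponential backoff plus jitter, capped.
        sleep(min(delay + random.uniform(0.0, delay), cap))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.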
* feat(runtime): emit background coverage_report every 60s
The SDK has tracked per-host seen / tracked / streaming_skipped counters
since 0.4.x (bump_coverage_counter, get_coverage_stats), but there was
no path to ship them to the backend — the counters only ever existed
in process memory. This commit adds a daemon thread that emits a
coverage_report track event every 60 seconds so the backend can build
the per-host coverage dashboard.
* NullRunRuntime.track_coverage() — returns a track-result dict when
there is something to report, or None on cold start (no counters
bumped yet) so the backend doesn't get an empty row per minute.
* start_coverage_reporter() / stop_coverage_reporter() — idempotent
lifecycle, daemon thread, sleeps in 0.5s slices for responsive
shutdown, emits once on entry so short-lived processes (CI, batch
jobs) still leave a row.
* nullrun.init() wires start_coverage_reporter() in; the reporter is
a no-op while the process is still cold, so re-init is safe.
New tests/test_coverage_report.py pins the contract: cold start → None,
post-traffic → track-result dict with type=coverage_report and the three
counter dicts, start is idempotent, stop joins cleanly.
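The reporter lifecycle described above (idempotent start, emit once on entry, short sleep slices for fast shutdown) might be structured like this. The class and its `emit` callback are hypothetical; the real methods live on `NullRunRuntime`.

```python
import threading


class CoverageReporter:
    def __init__(self, emit, interval=60.0, slice_s=0.5):
        self._emit = emit
        self._interval = interval
        self._slice = slice_s
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._thread = None

    def start(self) -> bool:
        """Start the daemon thread; returns False if it is already running."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._stop.clear()
            self._thread = threading.Thread(target=self._loop, daemon=True)
            self._thread.start()
            return True

    def stop(self, timeout=2.0) -> None:
        with self._lock:
            thread, self._thread = self._thread, None
        if thread is not None:
            self._stop.set()
            thread.join(timeout)

    def _loop(self) -> None:
        self._emit()  # emit once on entry so short-lived processes leave a row
        while True:
            waited = 0.0
            while waited < self._interval:
                if self._stop.wait(self._slice):  # wakes early on stop()
                    return
                waited += self._slice
            self._emit()
```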
* chore(breaker): add __main__ shim so 'python -m nullrun.breaker' exits cleanly
Historically the SDK shipped a 'python -m nullrun.breaker' entry point
for in-container health probes and ad-hoc debugging. The nullrun.breaker
subpackage is the circuit-breaker + policy-exceptions surface — it is
not a runnable command. Without this shim, containerized deployments
that scripted 'python -m nullrun.breaker' as a no-op smoke check would
fail with 'No module named nullrun.breaker.__main__'.
This module makes that invocation exit cleanly (return 0) and print a
short pointer to nullrun-doctor (nullrun.toolbox.diagnostics) for
real runtime checks.
* chore: gitignore audit.md (project-local working notes, sibling of analyze.md)
* test: re-align @sensitive test with fail-CLOSED contract after master merge
The auto-merge of master into this branch (commit 7875210) resolved
tests/test_protect_branches.py by taking master's side of the conflict,
leaving the old test_sensitive_runtime_init_failure_is_silent in place.
That test asserts @sensitive does NOT raise — but the production
change in commit 58263a1 (this branch) makes @sensitive raise
RuntimeError (fail-CLOSED, ADR-008). Result: CI ran the old assertion
against the new production code and failed.
Restore the renamed and re-asserted version of the test from commit
58263a1 — test_sensitive_runtime_init_failure_raises — so the test
asserts the new contract: RuntimeError is raised and __cause__ chains
the original exception.
runtime.py was resolved correctly by the auto-merge (both sides kept:
the new track_coverage / start_coverage_reporter / stop_coverage_reporter
/ _coverage_reporter_loop methods AND the existing bump_coverage_counter
are all present), so no changes there.
Context
After PR #2 (byte-mismatch + S-2) and PR #3 (Wip/working tree) merged, the working tree at HEAD still shows a 45-file diff against the initial import. The 0.3.0->0.4.0 migration that PR #3 was named after appears to have been stripped during the merge - either because the merge was a fast-forward without re-applying the wip diff, or because the diff was never on the wip branch tip in a way that survived the merge.
This commit re-stages the same content so the audit-trail, examples, decorators cleanup, and 14 new test files are actually on master. See the commit message for the full breakdown.
If this is a no-op (the content was already in master) the diff against master will be empty. CI will confirm.