Promote to production: atomic state writes + async test repair #217

Merged
crtahlin merged 5 commits into main from dev on Jun 4, 2026

crtahlin commented Jun 4, 2026

Promotes dev to main. Validated on staging (provenance-gateway.dev.datafund.io); the full local suite is green (844 passed, 7 skipped).

Contains exactly today's two merged PRs:

Staging smoke tests after these merges: /health, /api/v1/pool/status, /api/v1/stamps/, /api/v1/notary/status, and /metrics all return 200 and report healthy.

crtahlin added 5 commits June 4, 2026 13:50
The stamp ownership registry and pool state were written with a plain
open(path, 'w'), which truncates the target immediately. A crash mid-write
(OOM, deploy restart, SIGKILL) left a truncated/corrupt file; on load both
managers discard corrupt state and start fresh, silently losing the wallet
ownership registry.

Add app/core/atomic_io.atomic_write_json() — writes to a temp file in the
same directory, fsyncs, then os.replace() to atomically swap it in. Readers
always see a complete file. Wire it into both _save_state() methods.
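
A minimal sketch of the temp-file, fsync, and os.replace() sequence described above. This is not the code from app/core/atomic_io; the body, the temp-file prefix, and the cleanup behaviour are assumptions made for illustration.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data as JSON to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target's directory so that os.replace()
    # is a same-filesystem rename, not a cross-device copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            # Push the bytes to disk before the rename makes them visible.
            os.fsync(tmp.fileno())
        # Atomic swap: the old file stays intact until this call succeeds.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the target untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A crash before os.replace() leaves at most a stray temp file next to the previous, still complete state file. Note that mkstemp() creates the file with owner-only permissions, which may differ from what a plain open(path, 'w') produced.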

Closes #212

There is no CI test gate (deploy.yml only deploys), so passing the full
suite locally is a manual development-process requirement before opening
or merging any PR into dev or main. Documents the gap that let the x402
async test breakage go unnoticed (see #215).

Write JSON state files atomically to prevent corruption on crash

51 tests across 5 files had been failing since the x402 async migration
(commit 4c22909) converted the pricing, balance, and preflight functions
to async without updating the tests, which still called them synchronously
and got coroutine objects back ('coroutine' object is not subscriptable).

For each affected test:
- mark @pytest.mark.asyncio, make it async, and await the call
- upgrade the relevant @patch targets to AsyncMock where the patched
  function is itself awaited (get_chainstate, get_wallet_info,
  check_* helpers, _get_eth_balance_from_rpc, read_root deps)

test_poll_updates_gauges was failing for a different reason: the #208
'keep last known value' refactor moved _poll_balances to call preflight's
check_* helpers, which bind get_wallet_info/get_chequebook_info at their
own module level. Patch those on app.x402.preflight (not swarm_api) so the
test is deterministic regardless of run order.

Full suite now green: 844 passed, 7 skipped. Addresses #215.

Repair 51 stale tests broken by x402 async migration

crtahlin merged commit d366772 into main on Jun 4, 2026
1 check passed