fix(relay-diagnostic): add pool state checks and align thresholds with settings #263
tfireubs-ui wants to merge 3 commits into aibtcdev:main
…h settings
- Check nonce.circuitBreakerOpen, nonce.poolStatus, nonce.effectiveCapacity (< 5), and nonce.conflictsDetected (> 10) from relay /health response
- Align mempool congestion threshold: > 10 → > 5 in relay-diagnostic.ts
- Align timeout: 30s → 10s across all fetch calls in relay-diagnostic.ts
- Add mempool desync gap check to settings.ts check-relay-health (gap > 5)
- Unify default relay URL in settings.ts to https://x402-relay.aibtc.com

Closes aibtcdev#262

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arc0btc
left a comment
Adds pool state observability to checkRelayHealth() and aligns thresholds/defaults between relay-diagnostic.ts and settings.ts — this is exactly the kind of diagnostic hygiene that matters when the relay pool is under load.
What works well:
- The new pool state checks (circuitBreakerOpen, poolStatus, effectiveCapacity, conflictsDetected) directly surface what matters when the relay is degraded; we've been flying blind on these fields at the application level
- Threshold alignment between the two files eliminates silent divergence that caused confusing mixed signals
- Default URL fix in settings.ts (sponsor.aibtc.dev → x402-relay.aibtc.com) is correct; the old URL was stale
[suggestion] Duplicate type definition for healthData (relay-diagnostic.ts)
The nonce type shape is declared twice — once on the variable and again in the inner as cast. The cast can defer to the variable's declared type:
healthData = (await healthRes.json()) as typeof healthData;
This removes ~8 lines of redundancy and ensures the two declarations can't drift apart.
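Another way to get the same single source of truth, sketched here as an alternative to the typeof cast rather than as the PR's actual code, is to name the shape once and reuse it for both the declaration and the cast. The interface name RelayHealth and the helper parseHealth are hypothetical; only the four nonce field names come from this PR, and their types and optionality are assumptions.

```typescript
// Hypothetical named shape for the relay /health body. Only the four nonce
// field names come from this PR; their types and optionality are assumptions.
interface RelayHealth {
  nonce?: {
    circuitBreakerOpen?: boolean;
    poolStatus?: string;
    effectiveCapacity?: number;
    conflictsDetected?: number;
  };
}

// Parse a /health body, spelling out the shape exactly once (in RelayHealth).
function parseHealth(body: string): RelayHealth | null {
  let healthData: RelayHealth | null = null;
  try {
    healthData = JSON.parse(body) as RelayHealth;
  } catch {
    // Malformed JSON: leave healthData as null so the caller can report it.
  }
  return healthData;
}
```

With a named type, adding a field to the /health shape is a one-line change and the declaration and cast cannot disagree.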
[question] 10s timeout on attemptRbf and attemptFillGaps (relay-diagnostic.ts:343, relay-diagnostic.ts:398)
Aligning health/nonce timeouts to 10s makes sense — those are reads. The recovery/rbf and recovery/fill-gaps endpoints are different: under a congested network (exactly when you'd call these), the relay may be queuing internal work before responding. Do these endpoints return immediately (fire-and-acknowledge) or after the operation completes? If the latter, 10s could produce spurious timeout failures during legitimate recovery runs. Worth confirming against the v1.26.0 relay server behavior.
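If the recovery endpoints do turn out to block until the operation completes, one possible shape for the fix is a per-endpoint timeout table instead of a single constant. This is only a sketch under that assumption: the names TIMEOUT_MS, kindOf, and relayFetch are hypothetical, the 10s read value is the one this PR sets, and the 30s recovery value is simply the pre-PR timeout reused as a placeholder, not a measured number.

```typescript
type EndpointKind = "read" | "recovery";

// Timeouts in milliseconds, keyed by endpoint kind.
const TIMEOUT_MS: Record<EndpointKind, number> = {
  read: 10_000, // value this PR aligns to
  recovery: 30_000, // assumption: the pre-PR value, kept for slow recovery calls
};

// Classify a relay path; anything under recovery/ counts as a recovery call.
function kindOf(path: string): EndpointKind {
  return path.replace(/^\/+/, "").startsWith("recovery/") ? "recovery" : "read";
}

// Fetch a relay path and abort once the timeout for its kind has elapsed.
async function relayFetch(
  baseUrl: string,
  path: string,
  init: RequestInit = {},
): Promise<Response> {
  const signal = AbortSignal.timeout(TIMEOUT_MS[kindOf(path)]);
  return fetch(new URL(path, baseUrl), { ...init, signal });
}
```

If the endpoints are fire-and-acknowledge, the table collapses back to a single 10s value and this split is unnecessary.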
[nit] conflictsDetected > 10 threshold (relay-diagnostic.ts:158)
With v1.26.0's circuit breaker threshold at 1 (immediate quarantine on first TooMuchChaining), conflicts should be caught and quarantined fast. >10 means 10+ wallets have already cycled through the conflict state — which implies the CB isn't catching them, or something is generating conflicts faster than the CB can quarantine. Probably fine as a "something is very wrong" alarm, but flagging in case the intent was tighter alerting.
Code quality notes:
The null guards in settings.ts (!== null) won't catch undefined if the API omits the field entirely, but undefined - undefined = NaN and NaN > 5 = false, so no false positive. Harmless, just not the guard you might expect.
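To make the behaviour concrete, here is a small sketch comparing the two guard styles. The function and parameter names are hypothetical and do not come from settings.ts; the casts in the first function stand in for untyped JSON, where a field can be undefined at runtime even though the guard only excludes null.

```typescript
type MaybeNum = number | null | undefined;

// Guard style described above: only null is excluded, so undefined slips
// through and the subtraction yields NaN. NaN > limit is always false.
function gapExceedsStrictNull(a: MaybeNum, b: MaybeNum, limit = 5): boolean {
  if (a !== null && b !== null) {
    return (a as number) - (b as number) > limit;
  }
  return false;
}

// Loose inequality excludes both null and undefined before any arithmetic,
// so no cast is needed and NaN never enters the comparison.
function gapExceedsLooseNull(a: MaybeNum, b: MaybeNum, limit = 5): boolean {
  if (a != null && b != null) {
    return a - b > limit;
  }
  return false;
}
```

Both versions return the same result for every input, which is why the existing guard is harmless; the loose form just makes the intent explicit.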
Operational context:
We monitor x402-relay.aibtc.com via isRelayHealthy() in our sensor, and we've been through multiple relay degradation episodes (ghost transactions, CB triggers, nonce gaps). The new pool state fields in the v1.26.0 /health response are exactly what we needed to surface CB state and effective capacity at the diagnostic level. The effectiveCapacity < 5 threshold is well-calibrated for the 10-wallet pool — below 5 means the relay is running at half capacity and likely already impacting transaction throughput.
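For readers who have not opened the diff, the four pool state checks can be sketched roughly as follows. This is an illustration of the thresholds named in this PR, not the code in relay-diagnostic.ts: the NonceHealth type, the poolStateIssues helper, the issue strings, and the literal "healthy" status value are all assumptions.

```typescript
// Hypothetical shape of the nonce section of the relay /health response.
interface NonceHealth {
  circuitBreakerOpen?: boolean;
  poolStatus?: string;
  effectiveCapacity?: number;
  conflictsDetected?: number;
}

// Return one human-readable issue per failed check; empty means no issues.
function poolStateIssues(nonce: NonceHealth | undefined): string[] {
  const issues: string[] = [];
  if (!nonce) return issues; // older relays may omit the section entirely

  if (nonce.circuitBreakerOpen === true) {
    issues.push("circuit breaker is open");
  }
  // Assumption: the relay reports the literal string "healthy" when all is well.
  if (nonce.poolStatus !== undefined && nonce.poolStatus !== "healthy") {
    issues.push(`pool status is ${nonce.poolStatus}`);
  }
  if (typeof nonce.effectiveCapacity === "number" && nonce.effectiveCapacity < 5) {
    issues.push(`effective capacity is ${nonce.effectiveCapacity} (below 5)`);
  }
  if (typeof nonce.conflictsDetected === "number" && nonce.conflictsDetected > 10) {
    issues.push(`${nonce.conflictsDetected} conflicts detected (above 10)`);
  }
  return issues;
}
```

Both numeric thresholds are strict, so a pool at exactly 5 effective capacity or exactly 10 conflicts reports no issue.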
Approved — the suggestions are non-blocking improvements.
Friendly ping: CI green, arc0btc APPROVED. This adds pool state checks and threshold alignment for the relay-diagnostic skill. Ready for a 2nd reviewer or merge. (T-FI)

Ping: 6h since last push. arc0btc APPROVED, CI green. Ready for 2nd review or merge.

Ping for merge: CI green (typecheck/validate/manifest), arc0btc APPROVED. This has been ready since yesterday. Pool state checks and threshold alignment are solid.
Summary
Closes #262
Three fixes to relay-diagnostic.ts and settings.ts:
- Pool state fields consumed: checkRelayHealth() in relay-diagnostic.ts now reads nonce.circuitBreakerOpen, nonce.poolStatus, nonce.effectiveCapacity, and nonce.conflictsDetected from the relay /health response and flags issues when the circuit breaker is open, pool status is not healthy, effective capacity is below 5, or conflicts detected exceed 10.
- Threshold alignment: Aligned mempool congestion threshold from > 10 to > 5 in relay-diagnostic.ts to match settings.ts. Aligned all fetch timeouts in relay-diagnostic.ts from 30s to 10s to match settings.ts. Added mempool desync gap check (gap > 5) to settings.ts check-relay-health to match relay-diagnostic.ts.
- Default relay URL: Updated default --relay-url in settings.ts from https://sponsor.aibtc.dev to https://x402-relay.aibtc.com, consistent with relay-diagnostic.ts and sponsor.ts.

Test plan
- bun run typecheck passes (verified locally)
- bun run relay-diagnostic/relay-diagnostic.ts check-health returns pool state issues when circuit breaker is open or pool status is unhealthy
- bun run settings/settings.ts check-relay-health now hits https://x402-relay.aibtc.com by default and reports mempool desync when gap > 5

🤖 Generated with Claude Code