Backend reliability hardening: resumable Teleporter, metric-service dedup, health/shutdown, dead-code cleanup #97
Merged
Five reliability/maintainability fixes:
1. Remove dead code & config drift
- Delete unused fetchGlacierData.js and chainDataService.js
- Scrub phantom TVL/DefiLlama references from README
2. Graceful shutdown + dependency health checks
- /health (liveness), /health/ready (readiness; 503 when Mongo down),
/health/dependencies (active Glacier/Metrics probe, off the hot path)
- SIGTERM/SIGINT drain the HTTP server and close Mongo, with a 10s
force-exit backstop
3. Deduplicate metric services
- Collapse 6 near-identical services (activeAddresses, txCount, gasUsed,
avgGasPrice, maxTps, feesPaid) into a createMetricService factory + thin
wrappers (~1,370 fewer lines). Public method names preserved (no caller
churn); network sum-vs-avg aggregation parameterized.
4. Resumable Teleporter weekly update
- Split the ~10h 7-day fetch into per-day windows checkpointed to
partialResults, anchored to a stored referenceEndTime. A restart/crash
now resumes from the next unfetched day instead of starting over. Final
merge is identical to the old single-pass aggregation.
5. Resolve Vercel/cron deployment fit
- Production is a long-lived DigitalOcean process (node-cron reliable).
Remove stale vercel.json, vercel-build script, and isVercel branches;
document the real deployment in README.
Tests: +11 (health endpoints, metric factory aggregation, Teleporter resume
proving days 1-3 are not refetched after a day-4 crash). Full suite: 206 pass.
Also includes a pre-existing working-tree change: nodemonConfig ignore rules
in package.json.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
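The metric-service deduplication in item 3 can be illustrated with a minimal factory sketch. Only the `createMetricService` name and the metric names come from the commit message; the option names, the `aggregateNetwork` method, and the sample values are hypothetical stand-ins, not the repository's actual code.

```javascript
// Sketch only: everything except the createMetricService name and the metric
// names is a hypothetical stand-in for the real service code.

// Network-level aggregation strategies, selected per metric.
const aggregators = {
  sum: (values) => values.reduce((total, v) => total + v, 0),
  avg: (values) =>
    values.length === 0
      ? 0
      : values.reduce((total, v) => total + v, 0) / values.length,
};

// Builds one metric service; the shared logic lives here exactly once.
function createMetricService({ metricName, aggregation = "sum" }) {
  const aggregate = aggregators[aggregation];
  if (!aggregate) {
    throw new Error(`Unknown aggregation "${aggregation}" for ${metricName}`);
  }
  return {
    metricName,
    // Combines per-chain values into a single network-wide value.
    aggregateNetwork(perChainValues) {
      return aggregate(perChainValues);
    },
  };
}

// Thin wrappers: each former service becomes a one-line configuration.
const txCountService = createMetricService({ metricName: "txCount" });
const avgGasPriceService = createMetricService({
  metricName: "avgGasPrice",
  aggregation: "avg",
});

console.log(txCountService.aggregateNetwork([10, 30])); // 40
console.log(avgGasPriceService.aggregateNetwork([10, 30])); // 20
```

Passing the aggregation as a parameter keeps the one behavioral difference between the services visible at the call site, and an unknown value fails at construction time rather than at the first cron run.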
- Teleporter weekly: use a half-open window [startTime, endTime) when filtering messages so adjacent day-windows (which share a boundary timestamp) no longer both claim a message landing exactly on the boundary, which double-counted it after merge. Also corrects the misleading "no-op" comment on the daily path.
- /health/dependencies: cache the deep probe for 30s so the endpoint can't be used to hammer Glacier/Metrics and burn the rate budget the cron jobs rely on; treat an unconfigured base URL as degraded (not healthy), since those env vars are required.
- Delete stale tracked duplicate src/services/teleporterService 2.js (old non-resumable logic, unreferenced).
- Add a regression test proving a boundary-timestamp message is assigned to exactly one window.
Full suite: 207 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
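The 30s probe cache and the "unconfigured means degraded" rule can be sketched roughly as below. `createCachedProbe`, `checkDependency`, the injected `now` clock, and the result shape are all hypothetical names chosen for illustration; treating a failed ping as `degraded` is likewise an assumption, not something the commit states.

```javascript
// Sketch only: helper names, options, and result shape are assumptions.

// Wraps an async probe so it runs at most once per ttlMs. Concurrent callers
// share the in-flight probe instead of each hitting the upstream API.
function createCachedProbe(probe, { ttlMs = 30_000, now = Date.now } = {}) {
  let cachedAt = -Infinity;
  let cachedResult = null;
  let inFlight = null;

  return function getStatus() {
    if (now() - cachedAt < ttlMs) return Promise.resolve(cachedResult);
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(probe)
        .then((result) => {
          cachedResult = result;
          cachedAt = now();
          return result;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}

// A missing base URL is reported as degraded, since the env var is required.
async function checkDependency(name, baseUrl, ping) {
  if (!baseUrl) {
    return { name, status: "degraded", reason: "base URL not configured" };
  }
  try {
    await ping(baseUrl);
    return { name, status: "ok" };
  } catch (err) {
    // Assumption: a failed ping is also surfaced as degraded.
    return { name, status: "degraded", reason: err.message };
  }
}
```

A route handler would call the function returned by `createCachedProbe` on every request, so repeated hits within the TTL never reach Glacier or the Metrics API.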
…ponse
updateData fell off the retry loop and resolved to undefined when the Metrics
API kept returning a 200 with a non-array `results` payload (each attempt hit
the `continue`). updateAllChains then crashed on `undefined.success`. Return an
explicit { success:false } result instead, with a regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
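The failure mode described in this commit is easy to reproduce in miniature. The sketch below is a hypothetical reduction of the retry loop (the function name `updateDataSketch`, the `fetchPage` callback, `maxRetries`, and the result fields other than `success` are assumptions); it shows why the trailing `return` matters once every attempt hits `continue`.

```javascript
// Sketch only: a reduced, hypothetical version of the retry loop.
async function updateDataSketch(fetchPage, { maxRetries = 3 } = {}) {
  let lastError = "no attempts made";

  for (let attempt = 1; attempt <= maxRetries; attempt += 1) {
    try {
      const response = await fetchPage();
      if (!Array.isArray(response?.results)) {
        // A 200 with a malformed payload: record it and try again.
        lastError = "non-array results payload";
        continue;
      }
      return { success: true, count: response.results.length };
    } catch (err) {
      lastError = err.message;
    }
  }

  // Without this return the function falls off the loop and resolves to
  // undefined, so a caller reading `result.success` throws a TypeError.
  return { success: false, error: lastError };
}
```

Callers can then branch on `result.success` without a separate `undefined` guard.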
Summary
A batch of low-risk reliability and maintainability fixes for the backend. Net −1,624 lines (large dead-code/duplication removal) with +19 tests.
Full suite: 208 passing.
No behavior changes to public API responses — the metric refactor preserves every existing method name and the Teleporter weekly result is identical to
the old single-pass aggregation.
What's in here
1. Resumable Teleporter weekly update
The ~10h, 7-day ICM fetch previously ran as a single in-memory pass, so a redeploy or crash mid-run lost all progress and restarted from scratch on the
next cron tick. It now fetches day-by-day, checkpointing each completed day to `partialResults` (anchored to a stored `referenceEndTime`, so resuming
hours later still yields a consistent 7-day window). After a restart, the next run resumes from the next unfetched day. The existing
lock/heartbeat/ownership machinery is preserved.
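A minimal sketch of the resume logic, under several assumptions: the `state` object stands in for the persisted document, `fetchDay` is a hypothetical per-day fetcher, timestamps are in milliseconds, and the merge is reduced to a simple sum. Only `partialResults`, `referenceEndTime`, and the half-open `[startTime, endTime)` convention come from this PR.

```javascript
// Sketch only: state shape, helper names, and units are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Builds consecutive day windows ending at referenceEndTime, oldest first.
function buildDayWindows(referenceEndTime, days = 7) {
  const windows = [];
  for (let i = days; i >= 1; i -= 1) {
    windows.push({
      startTime: referenceEndTime - i * DAY_MS,
      endTime: referenceEndTime - (i - 1) * DAY_MS,
    });
  }
  return windows;
}

// Half-open membership: a boundary timestamp belongs only to the later window.
function inWindow(timestamp, { startTime, endTime }) {
  return timestamp >= startTime && timestamp < endTime;
}

async function runWeeklyUpdate(state, fetchDay, now = Date.now) {
  // Anchor the 7-day range once; a resumed run reuses the stored anchor.
  if (state.referenceEndTime == null) {
    state.referenceEndTime = now();
    state.partialResults = [];
  }
  const windows = buildDayWindows(state.referenceEndTime);

  // Start at the first day that has no checkpoint yet.
  for (let i = state.partialResults.length; i < windows.length; i += 1) {
    const dayResult = await fetchDay(windows[i]); // may throw mid-run
    state.partialResults.push(dayResult); // checkpoint (persisted in practice)
  }

  // Merge the per-day results into the weekly total.
  return state.partialResults.reduce((sum, day) => sum + day.messageCount, 0);
}
```

Because the windows are derived from the stored anchor rather than from the current time, a run that resumes hours later still fetches the same remaining days.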
2. Deduplicate metric services
Six near-identical ~307-line services (`activeAddresses`, `txCount`, `gasUsed`, `avgGasPrice`, `maxTps`, `feesPaid`) collapsed into a single `createMetricService` factory + thin wrappers (~1,370 fewer lines). Public method names are preserved, so no route/cron caller changed. Network `sum`-vs-`avg` aggregation is parameterized (only `avgGasPrice` uses `avg`).
3. Graceful shutdown + health checks
- `SIGTERM`/`SIGINT` now drain the HTTP server and close the Mongo connection, with a 10s force-exit backstop.
- `/health` (liveness, always 200 + uptime/mongo info), `/health/ready` (readiness, 503 when Mongo is down), `/health/dependencies` (active Glacier/Metrics probe, cached 30s so it can't be abused to burn the upstream rate budget).
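The shutdown sequence can be sketched as follows. `installGracefulShutdown`, its option names, and the injected `proc` and `closeDb` parameters are hypothetical; the sketch assumes `server` exposes the standard Node.js `close(callback)` method and that `closeDb` returns a promise.

```javascript
// Sketch only: function and option names are assumptions.
function installGracefulShutdown({ server, closeDb, proc = process, forceExitMs = 10_000 }) {
  let shuttingDown = false;

  async function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`Received ${signal}, shutting down`);

    // Backstop: force-exit if draining hangs. unref() keeps this timer from
    // holding the event loop open on its own.
    const backstop = setTimeout(() => proc.exit(1), forceExitMs);
    backstop.unref();

    try {
      // Stop accepting connections and wait for in-flight requests to finish.
      await new Promise((resolve, reject) => {
        server.close((err) => (err ? reject(err) : resolve()));
      });
      await closeDb();
      clearTimeout(backstop);
      proc.exit(0);
    } catch (err) {
      clearTimeout(backstop);
      proc.exit(1);
    }
  }

  proc.on("SIGTERM", () => shutdown("SIGTERM"));
  proc.on("SIGINT", () => shutdown("SIGINT"));
  return shutdown;
}
```

Closing the HTTP server before the database means requests still in flight can finish their queries instead of failing against a closed connection.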
4. Dead code & deployment cleanup
- Removed unused `fetchGlacierData.js`, `chainDataService.js`, and a stale tracked `teleporterService 2.js`.
- Removed `vercel.json`, the `vercel-build` script, and the `isVercel` branches; updated the README deployment docs.
Review fixes folded in
This branch also addresses findings from a self-review of the above:
- Teleporter day windows are half-open `[startTime, endTime)` so a message on a shared 24h boundary is counted in exactly one window (no double-count).
- `metricService.updateData` returns an explicit `{ success: false }` instead of `undefined` when the API returns a non-array payload on all retries (previously crashed `updateAllChains`).
- `/health/dependencies` treats an unconfigured base URL as `degraded` rather than healthy.
Testing
Full suite passes (`npm test`). New tests cover the health endpoints, metric factory aggregation, Teleporter resume (days 1-3 aren't refetched after a simulated day-4 crash), and the day-window boundary assignment.