fix(daemon): keep tier-2 inference off the fast-path SDK round-trip #14
Merged
- /v1/check no longer awaits the classifier on actions the rules already allowed or rejected. The verdict returns in microseconds and the score lands in the action_ledger via a fire-and-forget backfill once inference completes.
- The detached classify() call is deferred with setTimeout(fn, 0) so any synchronous work the classifier does on its first tick (tokenizer warm-up, IPC encode) runs after the response is sent, not on the SDK round-trip.
- A concurrency cap on detached classifier work prevents a runaway sidecar from queueing thousands of in-flight inferences. New scoring is dropped when the cap is hit; the verdict still lands.
- backfillTier2 logs a warning if the row id no longer exists. finalize already throws on missing rows; the asymmetry is intentional because backfill is best-effort and detached.
- A rules-allowed action that the classifier scores as malicious now logs the disagreement so it becomes review and training signal. The classifier remains advisory; no verdict change.
- Paused-path behavior is unchanged: the modal still gets the score in real time, persisted to the pending row before the WebSocket fan-out.
- New backfillTier2(rowId, score) ledger helper and a tier2Cols helper that collapses three open-coded copies of the nullable-tier2 spread.
- appendResolved now returns the row id so the fast-path callsite can chain backfillTier2 without a second query.
- New integration tests at src/daemon/check.test.ts drive the Hono app via app.fetch and assert that:
  - the response latency is independent of classifier delay,
  - the backfill lands after the response,
  - the paused path still surfaces the score on the modal and the SDK response,
  - a classifier that throws after the response has returned doesn't break the request, and
  - a flood of 80 concurrent fast-path requests survives.
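For reviewers who want the shape of the fast path at a glance, here is a minimal sketch of the pattern described above: record the verdict, defer classification with setTimeout(fn, 0), cap the detached work, and backfill the score on a best-effort basis. It is not the daemon's code. The `Ledger` and `DetachedScorer` interfaces, `createDetachedScorer`, `checkFastPath`, and the signatures given to `appendResolved` and `backfillTier2` are assumptions made for illustration, and the PR does not state the real cap value.

```typescript
// Illustrative sketch only: every type, name, and signature here is an
// assumption, not the daemon's actual implementation.
type Verdict = "allow" | "reject";
type Classify = (action: string) => Promise<number>;

interface Ledger {
  // Assumed to persist the resolved row and return its id.
  appendResolved(action: string, verdict: Verdict): number;
  // Assumed to write the tier-2 score onto an existing row.
  backfillTier2(rowId: number, score: number): void;
}

interface DetachedScorer {
  // Returns false when the cap is hit and scoring is dropped.
  schedule(ledger: Ledger, classify: Classify, rowId: number, action: string): boolean;
}

function createDetachedScorer(maxInFlight: number): DetachedScorer {
  let inFlight = 0;
  return {
    schedule(ledger, classify, rowId, action) {
      if (inFlight >= maxInFlight) return false; // drop scoring; the verdict is already recorded
      inFlight++;
      // Defer to a later macrotask so the classifier's synchronous first-tick
      // work cannot run before the caller has returned its response.
      setTimeout(() => {
        Promise.resolve()
          .then(() => classify(action)) // wrapping also catches synchronous throws
          .then((score) => ledger.backfillTier2(rowId, score))
          .catch((err) => console.warn("tier-2 backfill skipped:", err))
          .then(() => {
            inFlight--; // release the slot on success and on failure
          });
      }, 0);
      return true;
    },
  };
}

function checkFastPath(
  scorer: DetachedScorer,
  ledger: Ledger,
  classify: Classify,
  action: string,
  verdict: Verdict,
): { verdict: Verdict; rowId: number; scored: boolean } {
  const rowId = ledger.appendResolved(action, verdict);
  const scored = scorer.schedule(ledger, classify, rowId, action);
  return { verdict, rowId, scored }; // returned without awaiting the classifier
}
```

In this sketch a classifier failure only produces a warning, which mirrors the best-effort stance the description takes for backfill, and a dropped score never affects the verdict the caller receives.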