From research #11
Overview
Implement the layered test suite specified in the research findings for issues #4 (reputation-adjusted selection weight) and #5 (intra-epoch circuit breaker + probe recovery). Core implementation is already merged — this task is about coverage gaps: missing unit tests for edge cases and wiring, and new Testermint integration test classes.
Reference: #11 (full test specifications in comments)
Implementation Checklist
Phase 1: Missing unit tests — epochgroup/ (~0.5 days)
TestCalculateSelectionWeight_ExactFloorEquality in epoch_group_test.go — reputation=1 → adjusted=floor, verify no double-floor
TestCalculateSelectionWeight_3NodeDistribution in epoch_group_test.go — 3 nodes (rep 100/50/0), run 200k draws, assert ratios within ±2% tolerance
TestBuildSelectionWeightsMap_NilEntry — nil ValidationWeight entry in slice is skipped gracefully
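The 3-node distribution check in Phase 1 could be sketched roughly as below. This is a minimal, standalone illustration rather than the chain's selection code: `drawCounts`, `maxShareError`, the fixed seed, and the weights `100/50/10` are all hypothetical, and the real floor formula behind `CalculateSelectionWeight` is not reproduced here. It only shows how 200k seeded weighted draws can be compared against the expected shares with a ±2% absolute tolerance.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// drawCounts runs `draws` weighted random selections with a seeded RNG and
// returns how often each index was picked. All weights must be positive.
func drawCounts(weights []int64, draws int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	var total int64
	for _, w := range weights {
		total += w
	}
	counts := make([]int, len(weights))
	for i := 0; i < draws; i++ {
		r := rng.Int63n(total) // uniform in [0, total)
		for idx, w := range weights {
			if r < w {
				counts[idx]++
				break
			}
			r -= w
		}
	}
	return counts
}

// maxShareError returns the largest absolute gap between each node's
// observed share of draws and its expected share (weight / total weight).
func maxShareError(weights []int64, counts []int) float64 {
	var totalW int64
	totalC := 0
	for i := range weights {
		totalW += weights[i]
		totalC += counts[i]
	}
	worst := 0.0
	for i := range weights {
		expected := float64(weights[i]) / float64(totalW)
		observed := float64(counts[i]) / float64(totalC)
		worst = math.Max(worst, math.Abs(observed-expected))
	}
	return worst
}

func main() {
	// Hypothetical adjusted weights for three nodes; the last one stands in
	// for a zero-reputation node that only receives a floor weight.
	weights := []int64{100, 50, 10}
	counts := drawCounts(weights, 200000, 42)
	fmt.Printf("counts=%v maxShareError=%.4f\n", counts, maxShareError(weights, counts))
}
```

A fixed seed keeps the test deterministic. At 200k draws the sampling noise on each share is far below 2%, so the tolerance mainly guards against a wrong weight formula rather than random variation.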
Phase 2: Missing unit tests — keeper/circuit_breaker_test.go (~0.5 days)
TestHealthFilter_ExactMissThreshold — 3 hits / 1 miss = exactly 25% must NOT trigger exclusion (strict >, not >=)
TestHealthFilter_InferenceCountAndMissedRequestsSumToTotal — verify total = InferenceCount + MissedRequests used in filter (not just InferenceCount)
TestHealthFilter_NilParticipantDefaultInclude — participant not found in store → default include (safety fallback)
TestHealthFilter_ProbeStateIgnoresMissRate — node in PROBE state included even with 100% miss rate in stats
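The four health-filter cases in Phase 2 can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: `participantStats` and `includeExecutor` are hypothetical names, `InferenceCount` is read as the number of hits, the threshold of 25 comes from the 25% example above, and `minSamples = 4` is a guess at `DefaultCBMinSamples`. The point is the strict comparison done in integer arithmetic, which avoids any floating-point rounding at the exact boundary.

```go
package main

import "fmt"

const (
	missThresholdPct = 25 // assumed value, taken from the 25% example in the checklist
	minSamples       = 4  // assumed value for DefaultCBMinSamples
)

// participantStats is a hypothetical stand-in for the stored participant stats.
type participantStats struct {
	InferenceCount uint64 // completed inferences (hits)
	MissedRequests uint64 // expired or failed requests (misses)
}

// includeExecutor reports whether a node stays eligible for selection.
func includeExecutor(stats *participantStats, inProbe bool) bool {
	if inProbe {
		return true // PROBE nodes are included regardless of miss rate
	}
	if stats == nil {
		return true // participant not found: default include (safety fallback)
	}
	total := stats.InferenceCount + stats.MissedRequests
	if total < minSamples {
		return true // too few samples to judge (assumed behavior)
	}
	// Strict comparison: exclude only when missed/total > threshold,
	// so a miss rate of exactly 25% keeps the node included.
	return stats.MissedRequests*100 <= total*missThresholdPct
}

func main() {
	// 3 hits / 1 miss is exactly 25%: still included.
	fmt.Println(includeExecutor(&participantStats{InferenceCount: 3, MissedRequests: 1}, false)) // true
	// 2 hits / 2 misses is 50%: excluded.
	fmt.Println(includeExecutor(&participantStats{InferenceCount: 2, MissedRequests: 2}, false)) // false
}
```

Using only `InferenceCount` as the denominator would turn the 3 hits / 1 miss case into 33%, which is the wiring mistake that `TestHealthFilter_InferenceCountAndMissedRequestsSumToTotal` is meant to catch.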
Phase 3: Feedback hook wiring tests (~1 day)
TestFinishInference_ProbeCBTransitionsToHealthy in msg_server_finish_inference_test.go — set executor to PROBE state, submit MsgFinishInference, assert CB entry deleted (→ defaults healthy)
TestExpiredInference_MissedRequestsIncrement in endblock_test.go or similar — call inference expiry handler with a HEALTHY executor, verify MissedRequests is incremented AND RecordCBResult(false) is a no-op (healthy stays healthy)
TestExpiredInference_ProbeCBTransitionsToExcluded — expire an inference where executor is in PROBE state, verify CB transitions to EXCLUDED with doubled cooldown
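The transitions these wiring tests assert can be modelled with a small in-memory sketch. It is not the keeper's implementation: `cbStore`, `cbEntry`, and `recordResult` are hypothetical names standing in for the real store and `RecordCBResult`, the cooldown values 50 and 500 are taken from the backoff sequence and cap mentioned in this issue, and treating feedback for an EXCLUDED node as a no-op is an assumption.

```go
package main

import "fmt"

const (
	initialCooldownBlocks int64 = 50  // first cooldown in the 50, 100, 200 sequence
	maxCooldownBlocks     int64 = 500 // cap named in the checklist
)

type cbState string

const (
	stateExcluded cbState = "EXCLUDED"
	stateProbe    cbState = "PROBE"
)

// cbEntry is a hypothetical in-memory entry; a missing entry means HEALTHY.
type cbEntry struct {
	State           cbState
	ExcludedAtBlock int64
	CooldownBlocks  int64
}

type cbStore map[string]*cbEntry

// recordResult applies inference feedback for addr at the given block height.
func (s cbStore) recordResult(addr string, success bool, block int64) {
	entry, ok := s[addr]
	if !ok || entry.State != stateProbe {
		return // healthy (no entry) or not probing: feedback is a no-op in this sketch
	}
	if success {
		delete(s, addr) // probe succeeded: entry removed, node defaults to healthy
		return
	}
	// Probe failed: back to EXCLUDED with a doubled, capped cooldown.
	next := entry.CooldownBlocks * 2
	if next > maxCooldownBlocks {
		next = maxCooldownBlocks
	}
	entry.State = stateExcluded
	entry.ExcludedAtBlock = block
	entry.CooldownBlocks = next
}

func main() {
	store := cbStore{"node-a": &cbEntry{State: stateProbe, CooldownBlocks: initialCooldownBlocks}}
	store.recordResult("node-a", false, 120) // failed probe at block 120
	fmt.Printf("%+v\n", *store["node-a"])
}
```

Modelling "healthy" as the absence of an entry matches the expectation in `TestFinishInference_ProbeCBTransitionsToHealthy` that a successful probe deletes the CB entry instead of writing a HEALTHY one.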
Phase 4: Testermint client — add CB state query (~0.5 days)
Add getCBState(address: String): CircuitBreakerEntry to ApplicationCLI.kt using the inferenced query inference circuit-breaker-state {address} CLI
Add data class CircuitBreakerEntry(state, excludedAtBlock, cooldownBlocks, probeAttempts) to the data/ package
Add assertCBState(pair, address, expectedState) to TestUtils.kt
Phase 5: Testermint integration tests — CircuitBreakerTests.kt (new file, ~1.5 days)
fast exclusion - node stops receiving traffic after min_samples misses — mock join node to return invalid JSON, run inferences until excluded (4+ misses), verify subsequent requests go to genesis only
all-degraded fallback - network keeps running when all nodes excluded — mock ALL nodes to fail, accumulate exclusions for all, verify GetRandomExecutor still returns a result (fallback to full list), verify no crash
probe recovery - excluded node recovers after cooldown — exclude a node, wait 50+ blocks (DefaultCBInitialCooldownBlocks), verify exactly one probe inference is sent to that node, restore mock, verify normal traffic resumes
exponential backoff - cooldown doubles on repeated probe failures — fail 3 consecutive probes, verify cooldown sequence: 50 → 100 → 200 blocks (capped at 500)
epoch boundary clears CB state — exclude a node, trigger epoch transition, verify CB state is cleared (node receives traffic again)
Phase 6: Testermint integration tests — ReputationWeightTests.kt (new file, ~1 day)
high reputation node receives more executor traffic — set up two nodes with a known reputation difference via EpochsCompleted (multi-epoch run), run 200 requests, assert the observed traffic ratio is within ±15% of the theoretical ratio
zero reputation node receives floor traffic — fresh participant (rep=0) vs established node (rep=100), run 1000 requests, assert new node gets at least 0.5% of traffic
Dependencies & Blockers
CB params (DefaultCBMissThresholdPct, DefaultCBMinSamples, DefaultCBInitialCooldownBlocks, DefaultCBMaxCooldownBlocks) are currently Go constants (not proto params). Phase 5 scenarios C/D require 50-block waits unless these are promoted to ValidationParams proto. Consider filing a follow-up to promote them, or adjust test design to use node.waitForMinimumBlock(currentBlock + 55).
The Testermint CLI command for the CB state query must match whatever endpoint the chain exposes — verify that inferenced query inference circuit-breaker-state {address} exists, or find the actual route.
Phase 6 reputation tests depend on being able to run multiple epochs in a test or manipulate EpochsCompleted — check if genesis spec can pre-load reputation via the epochsCompleted field.
Estimated Total: 4–5 days