Skip to content

fix: flaky eth/62 test#12096

Draft
alexb5dh wants to merge 1 commit into
masterfrom
fix/eth62-flaky-test
Draft

fix: flaky eth/62 test#12096
alexb5dh wants to merge 1 commit into
masterfrom
fix/eth62-flaky-test

Conversation

@alexb5dh

@alexb5dh alexb5dh commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Changes

  • Fixes flaky should_send_single_transaction_even_if_exceed_MaxPacketSize
    The test assumed all 512 transactions had the same serialized size, so it predicted how many would fit per message by measuring only the first one. This was wrong as each tx is signed with a different random key, and ECDSA signatures vary in length by a byte or two (leading zeros get stripped). Test was failing approximately once per 256 runs.
    Assertions are changed a bit, but total count and packet size validation are included.

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

  • No

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @alexb5dh's task in 3m 45s —— View job


Code Review

  • Gather context and understand the request
  • Read changed files and analyze the fix
  • Review for correctness, edge cases, and potential regressions
  • Post findings

Root cause analysis: Each transaction is signed with a fresh random private key (Build.A.PrivateKey.TestObject). ECDSA signatures are DER-encoded, and leading zero bytes in the r and s values are stripped — so each signature is 70–72 bytes depending on whether the high bit of r or s is set (probability ~50% per value). The old code predicted the batch boundary from txs[0].GetLength(), but that sample size could differ by ±1–2 bytes from any other tx, shifting how many fit per packet. With 512 random-keyed txs the probability of one boundary shift is roughly 1/256 — matching the reported failure rate.


The fix

The rewrite correctly trades exact-count prediction for semantic-invariant verification:

// Before: brittle — one tx's size as oracle for all 512
int sizeOfOneTx = txs[0].GetLength();
int numberOfTxsInOneMsg = Math.Max(TransactionsMessage.MaxPacketSize / sizeOfOneTx, 1);
// ... predict messagesCount, assert exact counts match

// After: robust — verify the contract, not the implementation detail
Assert.That(deliveries.Sum(d => d.Count), Is.EqualTo(txCount));          // no tx lost or doubled
Assert.That(deliveries, Is.All.Matches(
    d => d.Count == 1 || d.SerializedSize <= TransactionsMessage.MaxPacketSize)); // packing invariant

This is the right approach. The semantic contract is clear:

  1. Every transaction is delivered exactly once.
  2. Multi-tx batches must fit within MaxPacketSize; single oversized txs are permitted to exceed it (this is the intended behaviour being tested).

The CountdownEvent delivered = new(txCount) + Signal(msg.Transactions.Count) pattern correctly waits until all 512 txs have been accounted for regardless of how they are batched. lock (deliveries) is a good defensive measure in case delivery is ever made asynchronous. Assert.EnterMultipleScope() ensures both invariants are reported on a failure, not just the first.


Low — two adjacent tests retain the same class of fragility (pre-existing, not introduced here)

should_send_txs_with_size_up_to_MaxPacketSize_in_one_TransactionsMessage (line 693–695, 701–703):

txs[i] = Build.A.Transaction.SignedAndResolved(Build.A.PrivateKey.TestObject).TestObject;
// ...
_session.Received(1).DeliverMessage(Arg.Is<TransactionsMessage>(m => m.Transactions.Count == txCount));

The [TestCase(1055, true)] case asserts that exactly 1055 randomly-keyed transactions land in exactly one message. If enough of those random signatures happen to be 1 byte longer, the total could exceed MaxPacketSize and split into two messages. The probability per run is small (requires many concurrent adverse draws), but over thousands of CI runs this will eventually fire.

should_send_txs_with_size_exceeding_MaxPacketSize_in_more_than_one_TransactionsMessage (lines 718–719, 728):

int sizeOfOneTestTransaction = Build.A.Transaction.SignedAndResolved().TestObject.GetLength(); // deterministic key
// ...
txs[i] = Build.A.Transaction.SignedAndResolved(Build.A.PrivateKey.TestObject).TestObject;   // random keys

The messagesCount oracle is derived from a single tx with a (presumably) deterministic key, while the actual txs use random keys. Same root cause as the bug being fixed here. Fix this →

These are pre-existing and out of scope for this PR, but worth a follow-up.


Summary

The fix is correct and well-targeted. The new test logic is robust, readable, and correctly captures the production invariant. No issues that block merge.

0 Critical · 0 High · 0 Medium · 1 Low (pre-existing, not introduced here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant