
HDFS-17780. IBR retry bypasses configured interval on RPC failure #8312

Open
balodesecurity wants to merge 3 commits into apache:trunk from balodesecurity:HDFS-17780

@balodesecurity

Problem

When the blockReceivedAndDeleted RPC failed in IncrementalBlockReportManager.sendIBRs(), the lastIBR timestamp was not updated, because it was only set on success. A failed RPC therefore left lastIBR stale, so sendImmediately() returned true on every subsequent heartbeat: the DataNode retried the IBR on every heartbeat cycle instead of waiting for the configured dfs.blockreport.incremental.intervalMsec.
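To make the stale-timestamp problem concrete, here is a minimal, simplified model of the pre-fix control flow. It assumes that `sendImmediately()` reduces to `now - lastIBR >= interval` and takes the clock value as a parameter so the result is deterministic; `IbrRpc`, `PreFixIbrManager`, and `PreFixDemo` are hypothetical names used only for illustration, not Hadoop classes or signatures.

```java
import java.io.IOException;

/** Hypothetical stand-in for the NameNode RPC; not the real DatanodeProtocol. */
interface IbrRpc {
  void blockReceivedAndDeleted() throws IOException;
}

/** Simplified model of the pre-fix timing logic; time is passed in explicitly (ms). */
class PreFixIbrManager {
  private final long intervalMs;
  private long lastIbr;

  PreFixIbrManager(long intervalMs, long nowMs) {
    this.intervalMs = intervalMs;
    this.lastIbr = nowMs - intervalMs; // the first report may go out immediately
  }

  boolean sendImmediately(long nowMs) {
    return nowMs - lastIbr >= intervalMs;
  }

  void sendIbrs(IbrRpc namenode, long startTime) throws IOException {
    boolean success = false;
    try {
      namenode.blockReceivedAndDeleted();
      success = true;
    } finally {
      if (success) {
        lastIbr = startTime; // pre-fix: only advanced when the RPC succeeds
      }
    }
  }
}

class PreFixDemo {
  public static void main(String[] args) {
    PreFixIbrManager mgr = new PreFixIbrManager(60_000L, 0L);
    IbrRpc failingNn = () -> { throw new IOException("simulated NameNode failure"); };
    try {
      mgr.sendIbrs(failingNn, 0L);
    } catch (IOException expected) {
      // The caller logs the failure; the next heartbeat consults sendImmediately().
    }
    // One 3 s heartbeat later, lastIbr is still stale, so the IBR is retried.
    System.out.println(mgr.sendImmediately(3_000L)); // true
  }
}
```

Because `lastIbr` only moves on the success path, the check at the next heartbeat sees the same old timestamp it saw before the failure.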

Under high NameNode load, this creates a positive feedback loop: failures trigger immediate retries, which increase NameNode contention, which in turn causes more failures.
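A rough per-DataNode count shows why the feedback loop matters. The sketch below is a toy model, not measured data: it assumes one IBR check per heartbeat, a 3-second heartbeat (the `dfs.heartbeat.interval` default), a 60-second IBR interval as in the test, and a NameNode that rejects every call. `RetryStormModel` is a hypothetical name.

```java
/** Toy model: counts IBR RPC attempts from one DataNode when every RPC fails. */
class RetryStormModel {
  static int attempts(int heartbeats, long heartbeatMs, long intervalMs,
                      boolean updateOnFailure) {
    long lastIbr = -intervalMs; // first heartbeat may send immediately
    int attempts = 0;
    for (int i = 0; i < heartbeats; i++) {
      long now = i * heartbeatMs;
      if (now - lastIbr >= intervalMs) { // sendImmediately()
        attempts++;                      // the RPC is issued and fails
        if (updateOnFailure) {
          lastIbr = now;                 // post-fix: advance lastIBR anyway
        }                                // pre-fix: lastIBR stays stale
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    int heartbeats = 200; // 10 minutes at 3 s per heartbeat
    System.out.println("pre-fix:  " + attempts(heartbeats, 3_000L, 60_000L, false));
    System.out.println("post-fix: " + attempts(heartbeats, 3_000L, 60_000L, true));
  }
}
```

Over 200 heartbeats (10 minutes) the pre-fix path issues 200 RPCs, while advancing `lastIBR` on failure limits it to 10; every affected DataNode adds its own share of that load to the NameNode.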

Fix

Move lastIBR = startTime out of the if (success) branch so that it runs unconditionally in the finally block. Whether the RPC succeeds or fails, lastIBR is updated, so the interval is always respected between attempts.
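The same simplified model with the fix applied: the timestamp assignment sits directly in the `finally` block, so it runs whether `blockReceivedAndDeleted` returns normally or throws. As before, `IbrRpc`, `PostFixIbrManager`, and `PostFixDemo` are hypothetical names, and the real `sendIBRs()` does more than this sketch shows.

```java
import java.io.IOException;

/** Hypothetical stand-in for the NameNode RPC; not the real DatanodeProtocol. */
interface IbrRpc {
  void blockReceivedAndDeleted() throws IOException;
}

/** Simplified model of the post-fix timing logic; time is passed in explicitly (ms). */
class PostFixIbrManager {
  private final long intervalMs;
  private long lastIbr;

  PostFixIbrManager(long intervalMs, long nowMs) {
    this.intervalMs = intervalMs;
    this.lastIbr = nowMs - intervalMs; // the first report may go out immediately
  }

  boolean sendImmediately(long nowMs) {
    return nowMs - lastIbr >= intervalMs;
  }

  void sendIbrs(IbrRpc namenode, long startTime) throws IOException {
    try {
      namenode.blockReceivedAndDeleted();
    } finally {
      // Runs on both normal return and exception: every attempt,
      // successful or not, starts a new interval.
      lastIbr = startTime;
    }
  }
}

class PostFixDemo {
  public static void main(String[] args) {
    PostFixIbrManager mgr = new PostFixIbrManager(60_000L, 0L);
    IbrRpc failingNn = () -> { throw new IOException("simulated NameNode failure"); };
    try {
      mgr.sendIbrs(failingNn, 0L);
    } catch (IOException expected) {
      // The failure still propagates to the caller, as before the fix.
    }
    System.out.println(mgr.sendImmediately(3_000L));  // false: next heartbeat waits
    System.out.println(mgr.sendImmediately(60_000L)); // true: interval has elapsed
  }
}
```

Failures still reach the caller unchanged; the only behavioural difference in this model is that the next attempt waits a full interval instead of firing on the next heartbeat.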

Testing

  • Added TestIncrementalBlockReports#testFailedIBRRespectsInterval: creates an IncrementalBlockReportManager with a 60-second interval, injects a pending block, mocks the NameNode to throw IOException, calls sendIBRs(), and asserts sendImmediately() returns false immediately after the failure.
  • Test passes locally.
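The test described above uses Mockito; as a dependency-free approximation, the sketch below swaps the mock for a hand-written stub that counts calls and always throws `IOException`. `IbrManagerModel`, `FailingNameNode`, `FailedIbrRespectsIntervalCheck`, and the block ID `blk_1001` are hypothetical, and keeping blocks pending after a failed RPC is an assumption of this model rather than something the PR states.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical NameNode RPC taking the pending block IDs. */
interface IbrRpc {
  void blockReceivedAndDeleted(List<String> blocks) throws IOException;
}

/** Hand-written stub in place of a Mockito mock: counts calls and always fails. */
class FailingNameNode implements IbrRpc {
  int calls = 0;

  @Override
  public void blockReceivedAndDeleted(List<String> blocks) throws IOException {
    calls++;
    throw new IOException("injected failure");
  }
}

/** Simplified, hypothetical model of the manager with the fix applied. */
class IbrManagerModel {
  private final long intervalMs;
  private long lastIbr;
  private final List<String> pending = new ArrayList<>();

  IbrManagerModel(long intervalMs, long nowMs) {
    this.intervalMs = intervalMs;
    this.lastIbr = nowMs - intervalMs;
  }

  void addPendingBlock(String blockId) { pending.add(blockId); }

  int pendingCount() { return pending.size(); }

  boolean sendImmediately(long nowMs) { return nowMs - lastIbr >= intervalMs; }

  void sendIbrs(IbrRpc namenode, long startTime) throws IOException {
    if (pending.isEmpty()) {
      return; // nothing to report, no RPC
    }
    List<String> reports = new ArrayList<>(pending);
    try {
      namenode.blockReceivedAndDeleted(reports);
      pending.removeAll(reports); // model assumption: drop only accepted blocks
    } finally {
      lastIbr = startTime; // the fix: advance on success and on failure
    }
  }
}

class FailedIbrRespectsIntervalCheck {
  public static void main(String[] args) {
    IbrManagerModel mgr = new IbrManagerModel(60_000L, 0L);
    mgr.addPendingBlock("blk_1001");
    FailingNameNode nn = new FailingNameNode();
    boolean threw = false;
    try {
      mgr.sendIbrs(nn, 0L);
    } catch (IOException e) {
      threw = true;
    }
    check(threw && nn.calls == 1, "RPC should be attempted once and fail");
    check(!mgr.sendImmediately(1L), "failed IBR must not be retried before the interval");
    System.out.println("ok");
  }

  static void check(boolean cond, String msg) {
    if (!cond) throw new AssertionError(msg);
  }
}
```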

When blockReceivedAndDeleted RPC fails, lastIBR was only updated on
success. This left lastIBR stale so sendImmediately() returned true on
every subsequent heartbeat, creating a retry storm that amplifies
NameNode contention instead of backing off.

Fix: update lastIBR = startTime unconditionally in the finally block,
regardless of whether the RPC succeeded or failed, so the configured
dfs.blockreport.incremental.intervalMsec is always respected between
attempts.

Adds TestIncrementalBlockReports#testFailedIBRRespectsInterval: a
pure-mock unit test that injects a failing NN RPC and asserts that
sendImmediately() returns false immediately after the failure.
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 13m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 11s trunk passed
+1 💚 compile 1m 40s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 50s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 51s trunk passed
+1 💚 mvnsite 1m 56s trunk passed
+1 💚 javadoc 1m 32s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 32s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 4m 11s trunk passed
+1 💚 shadedclient 31m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 19s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 16s the patch passed
+1 💚 mvnsite 1m 26s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 1s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 3m 49s the patch passed
+1 💚 shadedclient 30m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 346m 3s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
487m 16s
Reason Tests
Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/1/artifact/out/Dockerfile
GITHUB PR #8312
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a9fc4226d7bb 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6f2453f
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/1/testReport/
Max. process+thread count 3478 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/1/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 30s trunk passed
+1 💚 compile 1m 45s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 42s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 45s trunk passed
+1 💚 mvnsite 1m 49s trunk passed
+1 💚 javadoc 1m 23s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 24s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 4m 14s trunk passed
+1 💚 shadedclient 31m 41s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 22s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 22s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 21s the patch passed
+1 💚 mvnsite 1m 31s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 0s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 3m 59s the patch passed
+1 💚 shadedclient 32m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 220m 46s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 47s The patch does not generate ASF License warnings.
358m 42s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/2/artifact/out/Dockerfile
GITHUB PR #8312
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 56af1970f416 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8de084e
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/2/testReport/
Max. process+thread count 3476 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/2/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 18s trunk passed
+1 💚 compile 1m 45s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 1m 52s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 1m 48s trunk passed
+1 💚 mvnsite 1m 55s trunk passed
+1 💚 javadoc 1m 29s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 34s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 4m 9s trunk passed
+1 💚 shadedclient 31m 20s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 19s the patch passed
+1 💚 compile 1m 15s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 1m 15s the patch passed
+1 💚 compile 1m 16s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 1m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 15s the patch passed
+1 💚 mvnsite 1m 25s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 1m 1s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 3m 48s the patch passed
+1 💚 shadedclient 29m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 218m 9s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 49s The patch does not generate ASF License warnings.
346m 6s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/3/artifact/out/Dockerfile
GITHUB PR #8312
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 35312e0b3ae5 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ee86421
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/3/testReport/
Max. process+thread count 3382 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8312/3/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.
