HADOOP-19839. PureJavaCrc32/Crc32C delegate to JDK CRC32/CRC32C #8349

Open
pan3793 wants to merge 2 commits into apache:trunk from pan3793:HADOOP-19839
Conversation

@pan3793
Member

pan3793 commented Mar 18, 2026

Description of PR

PureJavaCrc32 and PureJavaCrc32C were introduced as alternatives to the JDK checksums on older JDKs, for both functionality and performance reasons, but neither reason holds on modern JDKs.

Because they are public APIs, we keep them to avoid breaking changes, but delegate the implementation to the JDK.
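As an illustration of the delegation idea only, here is a minimal sketch; it is not the code in this patch. The class name `DelegatingCrc32` is hypothetical, and the sketch assumes that wrapping a `java.util.zip.CRC32` behind the `java.util.zip.Checksum` interface is enough to keep callers working.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/**
 * Hypothetical wrapper (not the class from this PR): keeps a stable public
 * type while forwarding all work to the JDK's CRC32.
 */
final class DelegatingCrc32 implements Checksum {
  // The JDK implementation that does the actual checksum computation.
  private final CRC32 delegate = new CRC32();

  @Override
  public void update(int b) {
    delegate.update(b);
  }

  @Override
  public void update(byte[] b, int off, int len) {
    delegate.update(b, off, len);
  }

  @Override
  public long getValue() {
    return delegate.getValue();
  }

  @Override
  public void reset() {
    delegate.reset();
  }
}
```

A CRC32C variant would look the same with `java.util.zip.CRC32C` (Java 9+) as the delegate.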

How was this patch tested?

Existing tests for compatibility and correctness.

I also ran the performance benchmark. With this patch, PureJavaCrc32/Crc32C perform nearly identically to the JDK implementations. For trunk:

Here are the results (Java 17.0.15, OpenJDK 64-Bit, amd64, Linux):


TestPureJavaCrc32 — CRC32 vs PureJavaCrc32 (MB/sec)

At small sizes (≤64 bytes), PureJavaCrc32 is actually faster than java.util.zip.CRC32 (likely due to JNI call overhead). Above 128 bytes, CRC32 (hardware-accelerated via JNI) dominates significantly, reaching ~25 GB/sec at large sizes, while PureJavaCrc32 plateaus around ~2.2 GB/sec — roughly 85–95% slower.


TestPureJavaCrc32C — CRC32C vs PureJavaCrc32C (MB/sec)

java.util.zip.CRC32C (Java 9+ hardware CRC32C using SSE4.2) is consistently faster across all sizes. It peaks around ~30 GB/sec, while PureJavaCrc32C stays around ~2.1 GB/sec — roughly 82–97% slower.

Key observations at 1 thread, large buffers (≥1MB):

┌────────────────┬─────────────────┐
│ Implementation │ Peak throughput │
├────────────────┼─────────────────┤
│ CRC32 (JDK)    │ ~24 GB/sec      │
├────────────────┼─────────────────┤
│ PureJavaCrc32  │ ~2.3 GB/sec     │
├────────────────┼─────────────────┤
│ CRC32C (JDK)   │ ~30 GB/sec      │
├────────────────┼─────────────────┤
│ PureJavaCrc32C │ ~2.1 GB/sec     │
└────────────────┴─────────────────┘

The pure Java implementations are consistent with each other (~2 GB/sec), while both JDK implementations leverage hardware acceleration for dramatically better performance. CRC32C has an additional advantage over CRC32 at typical buffer sizes (256B–4MB), likely because x86 SSE4.2 provides a dedicated CRC32C instruction, whereas CRC32 relies on carry-less multiplication (PCLMULQDQ).
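For readers who want a rough feel for these numbers on their own hardware, the following is a simplified throughput sketch. It is not the `PerformanceTest` used for the results above; the class name `CrcThroughputSketch`, the buffer sizes, and the 256 MB per-run volume are all assumptions chosen for illustration, and a single warm-up pass is far less rigorous than a proper harness such as JMH.

```java
import java.util.Random;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Hypothetical, simplified throughput measurement for JDK checksums. */
final class CrcThroughputSketch {
  // Written after each run so the JIT cannot discard the checksum work.
  static volatile long sink;

  /** Returns approximate throughput in MB/sec for one checksum and buffer size. */
  static double measureMbPerSec(Supplier<Checksum> factory, int bufferSize, long totalBytes) {
    byte[] buf = new byte[bufferSize];
    new Random(42).nextBytes(buf); // fixed seed for repeatable input
    long iterations = Math.max(1, totalBytes / bufferSize);

    Checksum crc = factory.get();
    // Warm-up pass so the timed loop is more likely to run compiled code.
    for (long i = 0; i < iterations; i++) {
      crc.update(buf, 0, buf.length);
    }
    sink = crc.getValue();
    crc.reset();

    long start = System.nanoTime();
    for (long i = 0; i < iterations; i++) {
      crc.update(buf, 0, buf.length);
    }
    long elapsedNanos = Math.max(1, System.nanoTime() - start);
    sink = crc.getValue();

    double megabytes = iterations * (double) bufferSize / (1024.0 * 1024.0);
    return megabytes / (elapsedNanos / 1e9);
  }

  public static void main(String[] args) {
    int[] sizes = {32, 64, 128, 1024, 64 * 1024, 1024 * 1024};
    long totalBytes = 256L * 1024 * 1024;
    for (int size : sizes) {
      double crc32 = measureMbPerSec(CRC32::new, size, totalBytes);
      double crc32c = measureMbPerSec(CRC32C::new, size, totalBytes); // Java 9+
      System.out.printf("%8d bytes: CRC32 %.1f MB/sec, CRC32C %.1f MB/sec%n",
          size, crc32, crc32c);
    }
  }
}
```

Results from a loop like this vary with JVM version, CPU, and warm-up, so treat them as indicative rather than comparable to the table above.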

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (HADOOP-19839)?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

Contains content generated by: Claude Code Sonnet 4.6.

AI is used to summarize the benchmark report.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 19m 15s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 58s Maven dependency ordering for branch
+1 💚 mvninstall 52m 16s trunk passed
+1 💚 compile 17m 33s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 18m 8s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 5m 56s trunk passed
+1 💚 mvnsite 4m 0s trunk passed
+1 💚 javadoc 3m 9s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 3m 5s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 6m 19s trunk passed
+1 💚 shadedclient 33m 46s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 2m 25s the patch passed
+1 💚 compile 17m 16s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 17m 16s the patch passed
+1 💚 compile 17m 59s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 17m 59s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 5m 52s /results-checkstyle-root.txt root: The patch generated 8 new + 1196 unchanged - 31 fixed = 1204 total (was 1227)
+1 💚 mvnsite 4m 1s the patch passed
+1 💚 javadoc 3m 4s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 3m 2s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 6m 50s the patch passed
+1 💚 shadedclient 34m 32s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 41s hadoop-common in the patch passed.
+1 💚 unit 9m 1s hadoop-mapreduce-client-core in the patch passed.
+1 💚 unit 1m 14s hadoop-mapreduce-examples in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
303m 56s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8349/1/artifact/out/Dockerfile
GITHUB PR #8349
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 31c3be4d1098 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a15efe0
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8349/1/testReport/
Max. process+thread count 3146 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-examples U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8349/1/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

* </code></pre>
* The output is in JIRA table format.
*/
public static class PerformanceTest {
Member Author

checkstyle might complain about this, but it is mostly duplicated from the existing TestPureJavaCrc32$PerformanceTest

@pan3793
Member Author

pan3793 commented Mar 18, 2026

cc @aajisaka @steveloughran, could you please take a look?
