@minnerbe
Contributor

While investigating a performance issue in some downstream code (which turned out to be unrelated to this library), I found that a few minimal changes provide a significant speedup. Integral images are accumulated in two passes, row-wise and column-wise, and the column-wise pass took significantly longer than the row-wise one.

This PR changes the column-wise accumulation so that it traverses memory row by row instead of column by column, which significantly reduces the run time; see the changes in IntIntegralImage as an example. These are the min/max run times over 10 runs of computing the integral image of a 20000x20000 image, before and after the changes:

| Pass | Before | After |
| --- | --- | --- |
| Row-wise | 214ms - 267ms | 222ms - 269ms |
| Column-wise | 627ms - 716ms | 131ms - 141ms |
| Total | 841ms - 983ms | 353ms - 410ms |

Every change in this PR either modifies the accumulation in exactly this manner or is an automatically applied fix for an IDE warning. For the former category, I manually checked correctness using images whose elements are given by i -> 1 and i -> i + 1.
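For readers who want to see the idea in isolation, here is a minimal sketch of the two access patterns and of the kind of check described above. It is not the code from IntIntegralImage: the class and method names are hypothetical, and it assumes a flat `long[]` in row-major order without any border padding.

```java
import java.util.Arrays;

// Hypothetical sketch; names and array layout are assumptions, not the library's code.
final class ColumnAccumulationSketch {

    // Row-wise pass: running sum along each row (contiguous memory access).
    static void accumulateRows(final long[] sum, final int w, final int h) {
        for (int y = 0; y < h; ++y) {
            final int offset = y * w;
            for (int x = 1; x < w; ++x)
                sum[offset + x] += sum[offset + x - 1];
        }
    }

    // Column-wise pass, strided: walks down one column at a time,
    // so consecutive accesses are w elements apart.
    static void accumulateColumnsStrided(final long[] sum, final int w, final int h) {
        for (int x = 0; x < w; ++x)
            for (int y = 1; y < h; ++y)
                sum[y * w + x] += sum[(y - 1) * w + x];
    }

    // Column-wise pass, row order: adds the already finished previous row
    // to the current row, so both reads and writes are contiguous.
    static void accumulateColumnsByRow(final long[] sum, final int w, final int h) {
        for (int y = 1; y < h; ++y) {
            final int offset = y * w;
            for (int x = 0; x < w; ++x)
                sum[offset + x] += sum[offset - w + x];
        }
    }

    // Computes the integral image of a copy of the input with either column pass.
    static long[] integral(final long[] pixels, final int w, final int h, final boolean byRow) {
        final long[] sum = pixels.clone();
        accumulateRows(sum, w, h);
        if (byRow)
            accumulateColumnsByRow(sum, w, h);
        else
            accumulateColumnsStrided(sum, w, h);
        return sum;
    }

    public static void main(final String[] args) {
        final int w = 7, h = 5;

        // Test images: i -> 1 and i -> i + 1.
        final long[] ones = new long[w * h];
        Arrays.fill(ones, 1L);
        final long[] ramp = new long[w * h];
        for (int i = 0; i < ramp.length; ++i)
            ramp[i] = i + 1;

        // Both column passes should yield identical integral images.
        System.out.println(Arrays.equals(integral(ones, w, h, false), integral(ones, w, h, true)));
        System.out.println(Arrays.equals(integral(ramp, w, h, false), integral(ramp, w, h, true)));

        // For the all-ones image, the last entry equals the number of pixels.
        System.out.println(integral(ones, w, h, true)[w * h - 1] == (long) w * h);
    }
}
```

Both column passes perform the same additions per element; only the loop order differs, which is why the results can be compared element by element.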

Let me know what you think, @axtimwalde

@axtimwalde
Owner

Beautiful! Have you benchmarked small images? 1024^2 or something?

@minnerbe
Contributor Author

minnerbe commented May 27, 2025

Good point, @axtimwalde! Our use case was large images (slices of compute blocks of large 3D stacks), so I focused on that.

I reran a small benchmark for n x n images, with n ranging from 10000 down to 100, and took the average run time over 100 to 10000 runs. As expected, the difference is less stark for medium-sized images but still noticeable, and for small images there is at least no degradation in run time.

| Size (n) | Before | After |
| --- | --- | --- |
| 10000 | 240ms | 97ms |
| 1000 | 1.23ms | 0.97ms |
| 100 | 0.012ms | 0.011ms |

I haven't looked at the assembly instructions generated by the JIT compiler, but I suspect that all of this is 'just' more efficient memory access, since the JVM doesn't reliably emit SIMD instructions. Should this change in the future, the new way of doing column-wise summation is very amenable to this kind of automatic further optimization.

@axtimwalde axtimwalde merged commit c452d05 into axtimwalde:master May 27, 2025
1 check passed
@axtimwalde
Owner

Thanks a lot!

@minnerbe minnerbe deleted the refactor/integral-images branch May 27, 2025 15:23
@minnerbe minnerbe mentioned this pull request Oct 14, 2025