
Optimize inner loop with SIMD instructions #1

Open
bccbrendan wants to merge 5 commits into arm-education:main from bccbrendan:simd-instructions


bccbrendan commented Mar 2, 2026

This is meant as an additional branch to showcase the performance gains from Arm Neon SIMD instructions.

On my Arm laptop, running `time ./builds/mandelbrot-parallel 1` (single-threaded), the runtime drops from ~19 s to ~4.9 s. That is a ~3.9x speedup, close to the 4x I would expect from performing 4 floating-point operations at a time.

With 8 parallel threads, SIMD reduces the runtime from 2.9 s to 0.78 s (~3.7x).
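To illustrate where the ~4x comes from: a 128-bit Neon register holds four 32-bit floats, so one vector instruction advances four pixels at once. The sketch below is a portable C approximation of that lane-parallel pattern, not the code in this PR (which is not shown here). The names `mandel4` and `LANES`, the use of single-precision floats, and the escape test |z|² > 4 are assumptions for illustration. It writes the four lanes as a fixed-width loop and replaces per-pixel branches with a sticky mask, which is the shape a Neon intrinsics version (or a compiler's auto-vectorizer) would follow.

```c
/* Sketch only: 4-lane escape-time iteration in portable C.
   LANES = 4 assumes 32-bit floats in one 128-bit (Neon q) register. */
#define LANES 4

/* Hypothetical helper (not from the PR): iteration counts for 4 points
   c = cr[l] + i*ci[l], computed side by side. Every lane runs the same
   arithmetic; an escaped lane is masked out instead of branched around,
   which is the pattern SIMD code uses. */
static void mandel4(const float cr[LANES], const float ci[LANES],
                    int max_iter, int out[LANES])
{
    float zr[LANES] = {0.0f}, zi[LANES] = {0.0f};
    int active[LANES];
    for (int l = 0; l < LANES; l++) {
        out[l] = 0;
        active[l] = 1;
    }

    for (int it = 0; it < max_iter; it++) {
        int any = 0;
        for (int l = 0; l < LANES; l++) {
            float zr2 = zr[l] * zr[l];
            float zi2 = zi[l] * zi[l];
            /* Sticky mask: once |z|^2 > 4 the lane stays inactive
               (inf/NaN from later overflow also compare false). */
            active[l] &= (zr2 + zi2 <= 4.0f);
            out[l] += active[l];       /* count only still-bounded lanes */
            /* z = z^2 + c, applied to all lanes unconditionally */
            float new_zr = zr2 - zi2 + cr[l];
            zi[l] = 2.0f * zr[l] * zi[l] + ci[l];
            zr[l] = new_zr;
            any |= active[l];
        }
        if (!any) /* all 4 lanes escaped: the whole vector can stop */
            break;
    }
}
```

For example, with `cr = {0, 2, -2, 3}`, `ci = {0, 0, 0, 0}` and `max_iter = 100`, the counts are 100, 2, 100 and 1: the points 0 and -2 never escape, while 2 and 3 escape quickly. One trade-off of this design is that a vector keeps iterating until its slowest lane escapes, so lanes that finish early do wasted work, which can pull the real speedup below the ideal 4x.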

