Optimize inner loop with SIMD instructions by bccbrendan · Pull Request #1 · arm-education/Mandelbrot-Example

bccbrendan · 2026-03-02T20:53:48Z

This is meant as an additional branch to showcase performance optimization from Arm Neon SIMD instructions.

On my Arm laptop, the single-threaded run (`time ./builds/mandelbrot-parallel 1`) drops from ~19 s to ~4.9 s. That is about 3.9x, close to the 4x speedup I would expect from running 4 floating-point operations at a time.

With 8 parallel threads, the SIMD version cuts the runtime from 2.9 s to 0.78 s (about 3.7x).
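To illustrate where a speedup of roughly 4x can come from, here is a minimal sketch of a 4-lane escape-time loop using Neon intrinsics. It is not the code from this branch: the function names `mandel_scalar` and `mandel_x4` are hypothetical, the escape test (|z|^2 <= 4) and iteration cap are assumptions, and the `#if` guard with a scalar fallback is only there so the sketch also builds on non-Arm hosts.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Scalar reference: count iterations while |z|^2 <= 4, capped at max_iter. */
uint32_t mandel_scalar(float cr, float ci, uint32_t max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    uint32_t n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Iterate four points at once (hypothetical helper, one point per lane). */
void mandel_x4(const float cr[4], const float ci[4],
               uint32_t max_iter, uint32_t out[4])
{
#if USE_NEON
    const float32x4_t vcr = vld1q_f32(cr);
    const float32x4_t vci = vld1q_f32(ci);
    const float32x4_t four = vdupq_n_f32(4.0f);
    float32x4_t zr = vdupq_n_f32(0.0f);
    float32x4_t zi = vdupq_n_f32(0.0f);
    uint32x4_t count = vdupq_n_u32(0);

    for (uint32_t n = 0; n < max_iter; n++) {
        float32x4_t zr2 = vmulq_f32(zr, zr);
        float32x4_t zi2 = vmulq_f32(zi, zi);

        /* All-ones mask in every lane that has not escaped yet. */
        uint32x4_t active = vcleq_f32(vaddq_f32(zr2, zi2), four);
        if (vmaxvq_u32(active) == 0) {
            break; /* every lane has escaped */
        }

        /* Subtracting 0xFFFFFFFF adds 1 to the active lanes only. */
        count = vsubq_u32(count, active);

        float32x4_t new_zr = vaddq_f32(vsubq_f32(zr2, zi2), vcr);
        float32x4_t new_zi = vaddq_f32(vmulq_f32(vaddq_f32(zr, zr), zi), vci);

        /* Freeze escaped lanes so they stay outside the radius. */
        zr = vbslq_f32(active, new_zr, zr);
        zi = vbslq_f32(active, new_zi, zi);
    }
    vst1q_u32(out, count);
#else
    /* Fallback for hosts without AArch64 Neon: one point at a time. */
    for (int i = 0; i < 4; i++) {
        out[i] = mandel_scalar(cr[i], ci[i], max_iter);
    }
#endif
}
```

The four lanes finish at different iterations, so the loop runs until the slowest lane escapes. That per-lane masking overhead is one plausible reason the measured speedup lands a little under 4x.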

bccbrendan added 5 commits March 2, 2026 14:42

Add default thread count CLI arg of 1

3d265b9

contain #pragma pack settings to their own file

f1238f7

Use Arm Neon operations to handle 4 floating point operations at a time.

b6aaa9f

Fix loop vectorization #endif logic

3f1557e

add build.sh with optimizing compiler flags

304bb39

bccbrendan wants to merge 5 commits into arm-education:main from bccbrendan:simd-instructions
