Fix benchmarks and gpu crash #20

Open
bgreni wants to merge 2 commits into BradLarson:main from bgreni:improve-gpu-performance

bgreni (Contributor) commented Feb 11, 2026

Fix a crash caused by `converted_intensity` being captured by reference, and by the CPU now reading the color tensor before the GPU kernel executes.
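A minimal Python sketch of the ordering problem, with a single-worker thread pool standing in for an asynchronous GPU stream. This is an illustration under that assumption, not the Mojo API touched by this PR; `intensity_kernel` and `convert` are hypothetical names. The launch returns immediately, so the host must wait for the work to finish before reading the output, and the kernel receives its inputs as arguments rather than reaching back into the caller's scope.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def intensity_kernel(rgb, out, weights):
    # Hypothetical stand-in for a GPU kernel: converts RGB pixels to intensity.
    time.sleep(0.05)  # simulate kernel latency so the ordering matters
    wr, wg, wb = weights
    for i, (r, g, b) in enumerate(rgb):
        out[i] = wr * r + wg * g + wb * b


def convert(rgb, weights):
    out = [0.0] * len(rgb)
    with ThreadPoolExecutor(max_workers=1) as stream:
        # Asynchronous "launch": submit() returns before the kernel has run.
        # `weights` is passed as an argument, so the task holds its own
        # reference to the tuple instead of a caller-side variable.
        done = stream.submit(intensity_kernel, rgb, out, weights)
        # Reading `out` here would race with the kernel; wait first.
        done.result()  # explicit sync point (also re-raises kernel errors)
    return out
```

For example, `convert([(4, 8, 4), (0, 0, 0)], (0.25, 0.5, 0.25))` yields `[6.0, 0.0]` only because `done.result()` blocks until the worker has written `out`.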

Also, `foreach` now appears to be asynchronous on the GPU (or maybe I only noticed because I had been running these on a Mac until now?), so an explicit sync is required. The stride value in the benchmark tensor spec was also wrong.
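The PR does not show what the corrected stride is. As a general reminder, assuming the benchmark tensor is contiguous and row-major (an assumption, e.g. a height × width × channels image), each dimension's stride is the product of the sizes of all dimensions after it, so the last dimension has stride 1. `row_major_strides` and `flat_index` are hypothetical helper names.

```python
def row_major_strides(shape):
    """Element strides for a contiguous row-major (C-order) tensor.

    The stride of a dimension is the product of all dimension sizes after it,
    so the last dimension always has stride 1.
    """
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)


def flat_index(index, strides):
    # Offset of a multi-dimensional index into the flat buffer.
    return sum(i * s for i, s in zip(index, strides))
```

For a 480 × 640 × 3 image, `row_major_strides((480, 640, 3))` gives `(1920, 3, 1)`; a stride that does not match this layout makes every row after the first read from the wrong offset.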

bgreni force-pushed the improve-gpu-performance branch from 5aa1a5a to 622335c on February 12, 2026 16:36
bgreni marked this pull request as draft on February 12, 2026 16:36
bgreni (Contributor, Author) commented Feb 12, 2026

Putting this into draft, as I am trying to write a GPU kernel for the Sobel operation and it is not going well so far.

bgreni force-pushed the improve-gpu-performance branch 2 times, most recently from 04663a3 to 390badf on February 12, 2026 23:55
bgreni (Contributor, Author) commented Feb 12, 2026

It looks like my GPU implementation is now correct, but it is actually still a bit slower than the `foreach` variant.
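For checking a Sobel kernel's output, a slow plain-Python reference can help. This is a sketch, not the PR's kernel: the clamp-to-edge border handling and the magnitude formula sqrt(gx² + gy²) are assumptions, since the PR does not say which variant it implements. `sobel_magnitude` is a hypothetical name.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = ((-1, 0, 1),
      (-2, 0, 2),
      (-1, 0, 1))
GY = ((-1, -2, -1),
      (0, 0, 0),
      (1, 2, 1))


def sobel_magnitude(img):
    """Gradient magnitude of a 2-D list of intensities.

    Border pixels use clamp-to-edge sampling (an assumption; the PR does not
    say how its kernel handles borders).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in range(-1, 2):
                for dx in range(-1, 2):
                    # Clamp neighbor coordinates to the image bounds.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    p = img[yy][xx]
                    gx += GX[dy + 1][dx + 1] * p
                    gy += GY[dy + 1][dx + 1] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

A vertical step edge such as rows of `[0, 0, 1, 1]` produces `[0.0, 4.0, 4.0, 0.0]` in every row, which makes a convenient spot check against a GPU result.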

bgreni force-pushed the improve-gpu-performance branch 2 times, most recently from a6570cd to a884c33 on February 13, 2026 02:35
bgreni marked this pull request as ready for review on February 13, 2026 02:35
bgreni (Contributor, Author) commented Feb 13, 2026

It looks like SIMD-vectorizing things got me just across the finish line on my RTX 3080. The performance difference on my Mac (M3 Pro) seems negligible.
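The vectorized kernel itself is not shown here. As a rough illustration of what "simdifying" an elementwise GPU kernel commonly means (an assumption about this PR), each thread handles a vector of `simd_width` contiguous elements instead of one, so fewer threads are launched and each memory access is wider. The sketch below only models the index mapping and tail handling, not actual SIMD execution; `simd_chunks` and `scale_vectorized` are hypothetical names.

```python
def simd_chunks(n, simd_width):
    """Split a flat range of n elements into (start, width) work items.

    Each work item covers `simd_width` consecutive elements, as one thread of a
    vectorized kernel would; a final shorter item handles the tail when n is
    not a multiple of simd_width.
    """
    chunks = []
    for start in range(0, n, simd_width):
        chunks.append((start, min(simd_width, n - start)))
    return chunks


def scale_vectorized(data, factor, simd_width=4):
    out = [0.0] * len(data)
    for start, width in simd_chunks(len(data), simd_width):
        # One "thread": load a contiguous vector, operate on all lanes, store.
        lanes = data[start:start + width]
        out[start:start + width] = [v * factor for v in lanes]
    return out
```

With 10 elements and `simd_width=4`, three work items are produced (`(0, 4)`, `(4, 4)`, `(8, 2)`) instead of ten, which is where the reduced launch and indexing overhead comes from on real hardware.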

bgreni force-pushed the improve-gpu-performance branch from a884c33 to 22fdd20 on February 13, 2026 18:28
Add better gpu kernel for sobel operator
bgreni force-pushed the improve-gpu-performance branch from 22fdd20 to b94ec59 on April 1, 2026 16:57
bgreni force-pushed the improve-gpu-performance branch from 281cce9 to 0b82e8f on April 5, 2026 19:06