Add NEON support for memset #408

Open
satya200 wants to merge 3 commits into develop from topic/RDKEMW-14024

Conversation

@satya200

No description provided.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds an additional Yocto patch to glibc 2.35 intended to introduce a NEON-optimized memset implementation for ARM targets.

Changes:

  • Append a new glibc patch (memset_neon.patch) to the recipe’s SRC_URI.
  • Add a patch that replaces sysdeps/arm/memset.S with a NEON + prefetch-based implementation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
recipes-core/glibc/glibc_2.35.bbappend Adds memset_neon.patch to glibc’s patch list.
recipes-core/glibc/files/memset_neon.patch Introduces a NEON-based memset implementation in ARM assembly.




SRC_URI += "${@bb.utils.contains('DISTRO_FEATURES', 'enable_heaptrack','file://size.patch','',d)} "
SRC_URI += " file://memset_neon.patch "
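For BitBake to locate the new patch, the bbappend layer also needs its files directory on the search path. A minimal sketch of the expected recipes-core/glibc/glibc_2.35.bbappend (the FILESEXTRAPATHS line is an assumption; it is not shown in this diff):

```bitbake
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"

SRC_URI += "${@bb.utils.contains('DISTRO_FEATURES', 'enable_heaptrack', 'file://size.patch', '', d)}"
SRC_URI += "file://memset_neon.patch"
```

If the layer already prepends FILESEXTRAPATHS elsewhere, only the SRC_URI addition is needed, as in the diff above.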
Comment on lines +43 to +45
+.syntax unified
+.fpu neon
+.text
- cmp r2, #8
- bcc 2f @ less than 8 bytes to move

+memset:
Comment on lines +99 to +115
+ @ 5. Final byte-by-byte copy for remainder
+.L_small_copy:
+ cmp r2, #0
+ beq .L_done
1:
- tst r3, #3 @ aligned yet?
- strbne r1, [r3], #1
- subne r2, r2, #1
- bne 1b
-
- and r1, r1, #255 @ clear any sign bits
- orr r1, r1, r1, lsl $8
- orr r1, r1, r1, lsl $16
- mov ip, r1
+ strb r1, [r3], #1
+ subs r2, r2, #1
+ bne 1b
Comment on lines +4 to +42
@@ -18,51 +18,103 @@
/* Thumb requires excessive IT insns here. */
#define NO_THUMB
#include <sysdep.h>
+#include <arm-features.h>

- .text
- .syntax unified
+/*
+ * Data preload for architectures that support it (ARM V5TE and above)
+ */
+#if (!defined (__ARM_ARCH_2__) && !defined (__ARM_ARCH_3__) \
+ && !defined (__ARM_ARCH_3M__) && !defined (__ARM_ARCH_4__) \
+ && !defined (__ARM_ARCH_4T__) && !defined (__ARM_ARCH_5__) \
+ && !defined (__ARM_ARCH_5T__))
+#define PLD(code...) code
+#else
+#define PLD(code...)
+#endif
+
+/*
+ * This can be used to enable code to cacheline align the source pointer.
+ * Experiments on tested architectures (StrongARM and XScale) didn't show
+ * this a worthwhile thing to do. That might be different in the future.
+ */
+//#define CALGN(code...) code
+#define CALGN(code...)
+
+/*
+ * Endian independent macros for shifting bytes within registers.
+ */
+#ifndef __ARMEB__
+#define PULL lsr
+#define PUSH lsl
+#else
+#define PULL lsl
+#define PUSH lsr
+#endif
+
- strbcs r1, [r3], #1
- bcs 2b
+.L_done:
+ bx lr @ Return original r0
Comment on lines +1 to +3
diff -Naur a/sysdeps/arm/memset.S b/sysdeps/arm/memset.S
--- a/sysdeps/arm/memset.S 2026-02-04 14:53:01.545550010 +0000
+++ b/sysdeps/arm/memset.S 2026-02-12 13:03:16.527540694 +0000