
Fix FP16 NEON build on AArch64 CPUs without FP16FML support #168

Open
Harish-endee wants to merge 1 commit into master from harish/fix-fp16-compat

Harish-endee commented Apr 7, 2026

  1. The FP16 NEON distance functions (L2Sqr, InnerProductSim) rely on the vfmlalq_low_f16 and vfmlalq_high_f16 intrinsics, which require the AArch64 FP16FML extension. Not all AArch64 CPUs support FP16FML, so builds targeting those CPUs fail to compile.

  2. Added compatibility fallback implementations:

  • The fallback uses baseline NEON intrinsics available on every AArch64 CPU: vcvt_f32_f16 (FP16 to FP32 widening) and vfmaq_f32 (FP32 fused multiply-add)
  • Introduced compile-time macro dispatch to select the appropriate implementation
  • CPUs with FP16FML continue to use the optimized single-instruction path, with no added overhead
  • The fallback code is only compiled when the FP16FML extension is unavailable
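
The dispatch described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes the selection keys off the ACLE feature macro `__ARM_FEATURE_FP16_FML` (the PR may use a different or project-specific macro), that vectors are stored as raw binary16 bits in `uint16_t`, and it shows only inner-product accumulation. `inner_product_f16` and `half_bits_to_float` are hypothetical names; the scalar loop handles the tail and lets the sketch build on non-ARM hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: decode one IEEE 754 binary16 value (stored as raw bits)
 * to float. Only the scalar tail uses it, so the sketch also builds off ARM. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;
    if (exp == 0) {
        /* Zero or subnormal: mant * 2^-24, exactly representable in float. */
        f = (float)mant * 5.9604644775390625e-8f;
        return sign ? -f : f;
    }
    if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13);          /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical inner-product kernel over n binary16 values. */
float inner_product_f16(const uint16_t *a, const uint16_t *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__aarch64__) && defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 8 <= n; i += 8) {
        float16x8_t va = vreinterpretq_f16_u16(vld1q_u16(a + i));
        float16x8_t vb = vreinterpretq_f16_u16(vld1q_u16(b + i));
#if defined(__ARM_FEATURE_FP16_FML)
        /* FP16FML path: widening fused multiply-add, FP16 lanes 0-3 then 4-7. */
        acc = vfmlalq_low_f16(acc, va, vb);
        acc = vfmlalq_high_f16(acc, va, vb);
#else
        /* Fallback: widen each half to FP32, then use the baseline FP32 FMA. */
        acc = vfmaq_f32(acc, vcvt_f32_f16(vget_low_f16(va)),
                        vcvt_f32_f16(vget_low_f16(vb)));
        acc = vfmaq_f32(acc, vcvt_high_f32_f16(va), vcvt_high_f32_f16(vb));
#endif
    }
    sum = vaddvq_f32(acc); /* horizontal sum of the four FP32 lanes */
#endif
    /* Scalar tail; on non-AArch64 builds this loop handles every element. */
    for (; i < n; ++i) {
        sum += half_bits_to_float(a[i]) * half_bits_to_float(b[i]);
    }
    return sum;
}
```

With GCC or Clang, a build with `-march=armv8.2-a+fp16fml` should define `__ARM_FEATURE_FP16_FML` and take the `vfmlalq_*` path, while a plain `-march=armv8-a` build takes the fallback; both accumulate into the same four FP32 lanes, so the rest of the kernel does not need to know which path was compiled.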

Some AArch64 CPUs don't support the FP16FML extension, which causes builds to fail due to missing vfmlalq_low_f16 and vfmlalq_high_f16 intrinsics. This adds compatibility fallbacks that use universally available NEON instructions (vcvt_f32_f16 + vfmaq_f32) instead, with automatic compile-time dispatch so CPUs with FP16FML still use the native single-instruction path.
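
A short argument for why the fallback need not change per-lane results, sketched under stated assumptions: the default floating-point environment (round-to-nearest, flush-to-zero disabled) and the same lane order in both paths. A binary16 significand has 11 bits, so the product of two FP16 values needs at most 22 bits and is exact in binary32's 24-bit significand; the FP16-to-FP32 widening is exact as well. Each path therefore rounds once per lane, at the accumulate:

```latex
r_i \leftarrow \operatorname{rnd}_{32}\!\left(r_i + a_i\, b_i\right) \quad \text{(FMLAL, via vfmlalq\_low/high\_f16)},
\qquad
r_i \leftarrow \operatorname{rnd}_{32}\!\left(r_i + \operatorname{cvt}(a_i)\,\operatorname{cvt}(b_i)\right) \quad \text{(vcvt\_f32\_f16 + vfmaq\_f32)},
\qquad \operatorname{cvt}(x) = x \text{ exactly}
```

Under those assumptions the two accumulators should agree bit for bit; any remaining difference would come from code outside these instructions, such as the order of the final horizontal reduction.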

github-actions bot commented Apr 7, 2026

VectorDB Benchmark - Ready To Run

CI passed ([lint + unit tests](https://github.com/endee-io/endee/actions/runs/24066965981)); benchmark options unlocked.

Post one of the commands below. Only members with write access can trigger runs.


Available Modes

| Mode | Command | What runs |
|---|---|---|
| Dense | /correctness_benchmarking dense | HNSW insert throughput · query P50/P95/P99 · recall@10 · concurrent QPS |
| Hybrid | /correctness_benchmarking hybrid | Dense + sparse BM25 fusion · same suite + fusion latency overhead |

Infrastructure

| Server | Role | Instance |
|---|---|---|
| Endee Server | Endee VectorDB (code from this branch) | t2.large |
| Benchmark Server | Benchmark runner | t3a.large |

Both servers start on demand and are always terminated after the run — pass or fail.


How Correctness Benchmarking Works

1. Post /correctness_benchmarking <mode>
2. Endee Server created → this branch's code deployed → Endee starts in the chosen mode
3. Benchmark Server created → benchmark suite transferred
4. Benchmark Server runs correctness benchmarking against the Endee Server
5. Results posted back here → pass/fail + full metrics table
6. Both servers terminated → always, even on failure

After a new push, CI must pass again before this menu reappears.
