References

Primary Papers

Olenik et al. 2024 — Towards a platform-portable linear algebra backend for OpenFOAM, Meccanica doi:10.1007/s11012-024-01806-1 → Defines the OGL design and KIT recommendation "2× MPI subdomains per GPU". We tested with ranksPerGPU 8 (single GPU, 8 ranks) per this guidance.
Tsai et al. 2023 — Providing performance portable numerics for Intel GPUs, Wiley CCPE doi:10.1002/cpe.7400 → Documents ParIC / ParILU / ParICT / ISAI work on DPC++. Earlier versions of this repo claimed a discrepancy with the paper — that was wrong. Per Ginkgo team feedback (issue #2013), ParIc/ParIlu factorization does work on SYCL. The gap we hit on Battlemage is on the apply side: lower_trs / upper_trs kernels are missing in dpcpp/solver/, and ParIct::add_candidates SIGABRTs. The classic Ic/Ilu (sparselib-based) is genuinely not in SYCL. See findings/05 for the corrected mapping.
Anzt et al. 2022 — Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing, ACM TOMS doi:10.1145/3480935 → Architecture / executor model that OGL builds on.

OGL / Ginkgo Upstream

hpsim/OGL — OpenFOAM Ginkgo Layer (GPU plugin)
- findings/10 issue body (ready to file)
ginkgo-project/ginkgo
- findings/11 issue body (ready to file)
intel/compute-runtime
- Bug filing planned for findings/13 (resource_info abort with multi-rank OGL)

Related Battlemage Pioneer Work

PMZFX/intel-arc-pro-b70-benchmarks https://github.com/PMZFX/intel-arc-pro-b70-benchmarks → Independent Arc Pro B70 pioneer work on LLM inference. Upstreamed a Q8_0 SYCL fix (PRs #21527 / #21638 in llama.cpp) that achieved a 3.1× speedup. → Validates our broader observation that Battlemage SYCL kernels need targeted fixes per workload; the problem is not a generic driver/compiler issue.
llama.cpp Issue #21517 ggml-org/llama.cpp#21517 → "Update from CR 26.05 to 26.09 did not improve performance — issue is in kernel code, not driver." Same pattern as our findings/13: driver updates alone do not solve the per-workload software-stack problems.

Phoronix Hardware Reviews

Intel Arc Pro B70 Linux Benchmarks (Phoronix) → Reference benchmarks on the same hardware for non-CFD workloads (rendering, video, ML inference). Useful for hardware sanity-check comparisons.

Related Hardware/Software Documentation

OGL/Ginkgo recommended fvSolution patterns → Source for SPD-preconditioner scaling -1.0 requirement we tested in findings/15.
Intel Compute Runtime release notes
Ginkgo release notes
oneAPI Base Toolkit notes

Hardware Diagnostic Run — 2026-05-10

Standalone cross-stack SpMV/CG diagnostic on Intel Arc Pro B70 (BMG-G31), Ubuntu 26.04 LTS, oneAPI 2025.3.3 / 2026.0, comparing oneMKL Sparse, PETSc aijkokkos, and Ginkgo dpcpp on an identical 1M-row Poisson 5-point reference matrix (4.996M nnz).

Method. Generator gen_matrix.cpp writes the 5-point Poisson matrix for a 1000×1000 grid (1,000,000 rows) in MatrixMarket format. Three test harnesses load the matrix and run 1000 SpMV iterations after 10 warm-up calls. Timing brackets the inner loop only; the CG-loop figure also includes vector operations and a synchronization per iteration.
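The method can be sketched in a few lines. The Python below is an illustrative stand-in, not the repository's gen_matrix.cpp or its harnesses: the names `poisson5_csr`, `spmv`, and `time_spmv` are hypothetical, and the stencil values (4 on the diagonal, -1 off it) with truncated boundary rows are assumptions. The truncated-boundary assumption is at least consistent with the reported size, since 5·1000² − 4·1000 = 4,996,000 nonzeros.

```python
import time


def poisson5_csr(n):
    """CSR arrays of the 5-point Laplacian on an n x n grid.

    Assumed stencil: 4 on the diagonal, -1 for each existing neighbour;
    neighbours outside the grid are simply dropped.
    """
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            # Columns are emitted in ascending order:
            # (i-1, j), (i, j-1), (i, j), (i, j+1), (i+1, j).
            if i > 0:
                col_idx.append((i - 1) * n + j)
                vals.append(-1.0)
            if j > 0:
                col_idx.append(i * n + j - 1)
                vals.append(-1.0)
            col_idx.append(i * n + j)
            vals.append(4.0)
            if j < n - 1:
                col_idx.append(i * n + j + 1)
                vals.append(-1.0)
            if i < n - 1:
                col_idx.append((i + 1) * n + j)
                vals.append(-1.0)
            row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def spmv(row_ptr, col_idx, vals, x):
    """y = A @ x for a CSR matrix (plain Python reference, not a GPU kernel)."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[r] = acc
    return y


def time_spmv(matrix, x, warmup=10, iters=1000):
    """Mean ms per SpMV; warm-up calls run before the clock starts."""
    for _ in range(warmup):
        spmv(*matrix, x)
    start = time.perf_counter()  # the timer brackets the inner loop only
    for _ in range(iters):
        spmv(*matrix, x)
    elapsed = time.perf_counter() - start
    return 1e3 * elapsed / iters


if __name__ == "__main__":
    n = 32  # small grid so the pure-Python loop finishes quickly
    A = poisson5_csr(n)
    print("nnz:", len(A[1]), "formula:", 5 * n * n - 4 * n)
    print("ms/iter:", time_spmv(A, [1.0] * (n * n), warmup=2, iters=20))
```

On a GPU the kernel launches are typically asynchronous, so a real harness would also need to synchronize the device before reading the clock at the end of the timed loop; this CPU sketch has no such step.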

Hardware: Intel Arc Pro B70, 32 GB GDDR6, BMG-G31 (device 0xe223). Software: oneAPI 2025.3.3 for PETSc β5h2, oneAPI 2026.0 for Ginkgo (/opt/ginkgo linked against libsycl.so.9).

Results.

| Stack | ms/iter | Effective BW |
|---|---|---|
| oneMKL Sparse CG (full loop) | 0.741 | 161 GB/s |
| PETSc aijkokkos (pure SpMV) | 0.287 | 418 GB/s (79 % Triad) |
| Ginkgo dpcpp (pure SpMV) | 0.089 | 1340 GB/s* |

* Cache-resident x (8 MB fits in B70 L2 ≈ 12 MB). The reported bandwidth is an arithmetic figure rather than measured memory traffic; the physical peak is 608 GB/s.
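The bandwidth column can be cross-checked with simple arithmetic. The source does not state the byte model behind "Effective BW", so this sketch backs out the bytes per iteration implied by each row rather than assuming one. The function name is illustrative, and reading "79 % Triad" as a fraction of a measured STREAM Triad figure is an assumption.

```python
PEAK_GBS = 608.0  # physical peak quoted in the footnote

# (ms/iter, effective GB/s) copied from the results table
RESULTS = {
    "oneMKL Sparse CG (full loop)": (0.741, 161.0),
    "PETSc aijkokkos (pure SpMV)": (0.287, 418.0),
    "Ginkgo dpcpp (pure SpMV)": (0.089, 1340.0),
}


def implied_mb_per_iter(bw_gbs, ms_per_iter):
    """Bytes per iteration (in MB) that a given BW and time imply."""
    return bw_gbs * 1e9 * (ms_per_iter * 1e-3) / 1e6


if __name__ == "__main__":
    for name, (ms, bw) in RESULTS.items():
        mb = implied_mb_per_iter(bw, ms)
        print(f"{name}: {mb:.1f} MB/iter implied, {bw / PEAK_GBS:.2f}x peak")

    # Assumption: 79 % is relative to a measured Triad bandwidth.
    print(f"implied Triad: {418.0 / 0.79:.0f} GB/s")
```

All three rows imply about 120 MB per iteration, which suggests the table uses one shared byte count for every stack. The Ginkgo row works out to roughly 2.2× the 608 GB/s peak, which is why it can only be explained by cache residency and not by DRAM traffic.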

Caveat. This is an SpMV-only microbenchmark. The Ginkgo number reflects cache effects that shrink for larger systems. Diagnostic value: it confirms that the B70 hardware is functional for sparse linear algebra; the AMG wall in the sister repo is a software bug, not a hardware limitation.
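To see where the cache effect would start to fade, a rough estimate is to compare the size of the x vector with the L2 capacity as the grid grows. This is a back-of-the-envelope sketch only: it treats "≈ 12 MB" as 12,000,000 bytes, assumes 8-byte values, and ignores the matrix data that streams through the cache alongside x. The function names are illustrative.

```python
import math

L2_BYTES = 12_000_000  # assumption: "≈ 12 MB" taken as decimal megabytes


def vector_mb(n, bytes_per_value=8):
    """Size in MB of one dense vector for an n x n grid."""
    return n * n * bytes_per_value / 1e6


def largest_resident_grid(l2_bytes, bytes_per_value=8):
    """Largest n for which an n x n grid's x vector fits in l2_bytes."""
    return math.isqrt(l2_bytes // bytes_per_value)


if __name__ == "__main__":
    for n in (1000, 1500, 2000):
        fits = vector_mb(n) * 1e6 <= L2_BYTES
        print(f"{n}x{n} grid: x = {vector_mb(n):.0f} MB, fits in L2: {fits}")
    print("largest resident grid side:", largest_resident_grid(L2_BYTES))
```

Under these assumptions the 1000×1000 case (8 MB) fits, while a grid side much beyond about 1200 no longer does, so larger systems should report effective bandwidth closer to the physical limit.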

Logs: logs/diag-2026-05-10/ (gzipped).

Cross-stack interpretation: see findings 23-26 (PETSc repo) and finding 23 (Ginkgo repo) for the symmetric write-up.
