Add Intel Xeon support (SPR+) for RAG by tpawlows · Pull Request #126 · rh-ai-quickstart/RAG

tpawlows · 2026-01-05T09:03:30Z

This PR extends the quickstart with an option to deploy RAG on Intel Xeon for balanced price/performance.
- It is similar to the CPU deployment but uses a container image with an optimized vLLM for Xeon that leverages AVX512 and AMX instruction extensions for improved inference (opea/vllm-cpu-ubi:v0.12.0-ubi9).
By default, this requires an OpenShift cluster with at least one worker node running an SPR or newer generation CPU with more than 16 vCPUs and 64 GiB of memory, e.g. m7i.8xlarge (32 vCPU, 128 GiB, SPR) or m8i.8xlarge (32 vCPU, 128 GiB, GNR).
Validated models:
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.1-8B-Instruct
Added example Xeon configuration to deploy/helm/rag/values.yaml
Updated README.md

To deploy, set DEVICE=xeon:

# Xeon deployment
make install NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b DEVICE=xeon
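
Since the Xeon image relies on AVX512 and AMX, it may help to confirm that a candidate worker node actually reports those extensions before running the install. The sketch below is not part of this PR; `has_cpu_flags` is a hypothetical helper, and it assumes you already have a shell on the node and that the Linux kernel exposes the flags as `avx512f`, `amx_tile`, and `amx_bf16` in `/proc/cpuinfo`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): succeeds only if every required
# flag appears as a whole word in the given space-separated flags string.
has_cpu_flags() {
  local flags=" $1 "
  shift
  local flag
  for flag in "$@"; do
    case "$flags" in
      *" $flag "*) ;;        # flag present, keep checking
      *) return 1 ;;         # flag missing
    esac
  done
  return 0
}

# Read the flags line of the first CPU; empty if /proc/cpuinfo is unavailable.
cpu_flags="$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)"

# Assumed flag names: avx512f (AVX-512 Foundation), amx_tile and amx_bf16 (AMX).
if has_cpu_flags "$cpu_flags" avx512f amx_tile amx_bf16; then
  echo "AVX-512 and AMX detected"
else
  echo "AVX-512 and/or AMX not reported on this host"
fi
```

The check only inspects CPU flags on the host where it runs; it does not verify the vCPU or memory minimums listed above.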

Successfully deployed and tested with "Add Xeon deployment support to llm-service Helm chart" (ai-architecture-charts#137).

jharmison-redhat

I don't have strong feelings on my docs recommendation here, but it would be nice to consider if we're going to start splitting them out like this.

tpawlows added 4 commits December 22, 2025 11:11

Add example config of llama-3-2-3b-instruct for Xeon deployment

706d023

Add Xeon section to README.md and update values min requirements

b752d00

Minor README update

a1e105c

add llama-3-1-8b-instruct example for xeon

6692abc

jharmison-redhat approved these changes Jan 8, 2026


Comment thread README.md Outdated

Split HW section in supported models table, add N/A for HPU

70cd485

tpawlows mentioned this pull request Jan 12, 2026

Add Intel Xeon support (SPR+) for RAG [dev] #127

Merged

tpawlows closed this Jan 15, 2026
