
Add Intel Xeon support (SPR+) for RAG [dev] #127

Merged
yuvalturg merged 5 commits into rh-ai-quickstart:dev from tpawlows:dev on Jan 12, 2026


@tpawlows
Contributor

Same as #126, but for dev branch.

Please merge it after rh-ai-quickstart/ai-architecture-charts#137 is merged. (Update: that PR is now merged.)

  • This PR extends the quickstart with an option to deploy RAG on Intel Xeon for balanced price/performance.
    • It is similar to the CPU deployment, but uses a container image with vLLM optimized for Xeon (opea/vllm-cpu-ubi:v0.12.0-ubi9), which leverages the AVX-512 and AMX instruction set extensions for faster inference.
  • By default, this requires an OpenShift cluster with at least one worker node that uses an SPR (Sapphire Rapids) or newer generation CPU with more than 16 vCPUs and 64 GiB of memory, e.g. m7i.8xlarge (32 vCPU, 128 GiB, SPR) or m8i.8xlarge (32 vCPU, 128 GiB, GNR).
  • Validated models:
    • meta-llama/Llama-3.2-3B-Instruct
    • meta-llama/Llama-3.1-8B-Instruct
  • Added example Xeon configuration to deploy/helm/rag/values.yaml
  • Updated README.md
  • To deploy, use the flag DEVICE=xeon:
    # Xeon deployment
    make install NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b DEVICE=xeon
  • Successfully deployed and tested with ai-architecture-charts#137 (Add Xeon deployment support to llm-service Helm chart).
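
Since the Xeon image relies on AVX-512 and AMX, it may help to confirm that a candidate worker node exposes those extensions before deploying. The following is a minimal sketch and not part of this PR: the function name `missing_xeon_flags` is hypothetical, and it assumes the node runs Linux, where these features appear in `/proc/cpuinfo` as the flags `avx512f`, `amx_tile`, `amx_bf16` and `amx_int8`.

```shell
#!/bin/sh
# Sketch only: this helper is illustrative and is not shipped with the quickstart.
# CPU flags as Linux reports them in /proc/cpuinfo for AVX-512 Foundation and AMX.
REQUIRED_FLAGS="avx512f amx_tile amx_bf16 amx_int8"

# Print the required flags that are absent from the given space-separated
# flag list. Prints an empty line when every required flag is present.
missing_xeon_flags() {
    cpu_flags=" $1 "
    missing=""
    for flag in $REQUIRED_FLAGS; do
        case "$cpu_flags" in
            *" $flag "*) ;;                    # flag present, nothing to record
            *) missing="$missing $flag" ;;     # flag absent, remember it
        esac
    done
    # Drop the leading space left by the concatenation above.
    echo "${missing# }"
}

# Usage on a node: inspect the first "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    node_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    result=$(missing_xeon_flags "$node_flags")
    if [ -z "$result" ]; then
        echo "AVX-512 and AMX flags present"
    else
        echo "Missing CPU flags: $result"
    fi
fi
```

On OpenShift, the script would have to run somewhere that sees the host's `/proc/cpuinfo`, for example from a node debug shell. It checks instruction set support only, so the vCPU and memory requirements above still need to be verified separately.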

@yuvalturg yuvalturg merged commit 16e8639 into rh-ai-quickstart:dev Jan 12, 2026