Add Intel Xeon support (SPR+) for RAG #126

Closed
tpawlows wants to merge 5 commits into rh-ai-quickstart:main from tpawlows:main
@tpawlows (Contributor) commented Jan 5, 2026

Please merge it after rh-ai-quickstart/ai-architecture-charts#137 is merged.

  • This PR extends the quickstart with an option to deploy RAG on Intel Xeon for balanced price/performance.
    • It is similar to the CPU deployment but uses a container image (opea/vllm-cpu-ubi:v0.12.0-ubi9) with a vLLM build optimized for Xeon that leverages the AVX-512 and AMX instruction extensions for improved inference.
  • By default, it requires an OpenShift cluster with at least one worker node running an SPR or newer-generation CPU with more than 16 vCPUs and 64 GiB of memory, e.g. m7i.8xlarge (32 vCPU, 128 GiB, SPR) or m8i.8xlarge (32 vCPU, 128 GiB, GNR).
  • Validated models:
    • meta-llama/Llama-3.2-3B-Instruct
    • meta-llama/Llama-3.1-8B-Instruct
  • Added example Xeon configuration to deploy/helm/rag/values.yaml
  • Updated README.md
  • To deploy, use the flag DEVICE=xeon:
    # Xeon deployment
    make install NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b DEVICE=xeon
  • Successfully deployed and tested with ai-architecture-charts#137 (Add Xeon deployment support to llm-service Helm chart).
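
Since the Xeon image relies on AVX-512 and AMX, it may help to confirm that a worker node actually exposes those extensions before deploying. The following is a minimal sketch, not part of this PR: the helper names `has_flag` and `check_xeon_flags` are hypothetical, and the flag list (`avx512f`, `amx_tile`, `amx_bf16`) is an assumption about which Linux `/proc/cpuinfo` flags matter here, not a requirement stated by the image.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this PR): inspect a CPU flags string,
# as found on the "flags" line of /proc/cpuinfo, for AVX-512 and AMX.

# Return 0 if the space-separated flag list in $1 contains the exact flag $2.
has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report each assumed-relevant extension; return 1 if any is missing.
check_xeon_flags() {
  local flags="$1" missing=0 f
  for f in avx512f amx_tile amx_bf16; do
    if has_flag "$flags" "$f"; then
      echo "found: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# When run on a Linux host, check the first CPU's flags.
if [ -r /proc/cpuinfo ]; then
  check_xeon_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" || true
fi
```

The script would need to run on the worker node itself (for example from a debug shell on that node), since a workstation's CPU flags say nothing about the cluster.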

@jharmison-redhat left a comment

I don't have strong feelings on my docs recommendation here, but it would be nice to consider if we're going to start splitting them out like this.

Comment thread README.md Outdated