Model Profiles for NVIDIA RAG Blueprint

Use the following documentation to learn about the model profiles available for the NVIDIA RAG Blueprint.

This section lists the recommended model profiles for different hardware configurations. The same profiles apply to all deployment methods (Docker Compose, Helm chart, RAG Python library, and NIM Operator).

Profile Selection Guidelines

TensorRT-LLM profiles (tensorrt_llm-*) are recommended for best performance.
For multi-GPU setups, ensure proper GPU allocation by setting the LLM_MS_GPU_ID environment variable in the Docker setup.
Always verify the available profiles with the list-model-profiles command before deployment.
By default, NIM detects and selects a profile automatically. To override that choice, specify a profile manually as described in Configuring Model Profiles.

List Available Profiles

To see all available profiles for your specific hardware configuration, run the following command.

USERID=$(id -u) docker run --rm --gpus all \
  -v ~/.cache/model-cache:/opt/nim/.cache \
  nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.13.1 \
  list-model-profiles
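The listing can be long, so it may help to keep only the profile IDs for one backend. The following is a minimal sketch, not part of the blueprint: profiles_for_backend is a hypothetical helper, and it assumes only that each profile ID appears in the output as an unbroken run of letters, digits, and the characters _ : . - (the exact output layout of list-model-profiles is not shown in this document).

```shell
# Hypothetical helper (not part of NIM or the RAG Blueprint).
# Reads text on stdin and prints the unique profile IDs that start with the
# given backend prefix, for example tensorrt_llm or vllm.
# Assumption: a profile ID is an unbroken run of [A-Za-z0-9_:.-] characters.
profiles_for_backend() {
  grep -oE "$1-[A-Za-z0-9_:.-]+" | sort -u
}

# Usage (pipe the listing command from this section into the helper):
#   docker run --rm --gpus all ... list-model-profiles | profiles_for_backend tensorrt_llm
```

If the helper prints nothing, no profile with that prefix was found in the listing for your hardware.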

Hardware-Specific Profiles

The following profiles are optimized for different common GPU configurations:

1xH100 NVL

tensorrt_llm-h100_nvl-fp8-tp1-pp1-throughput-2321:10de-d347471b749e4e6b6e5956bb0f600b6646461c214cadadf6614baf305054a743-1

1xH100 SXM

tensorrt_llm-h100-fp8-tp1-pp1-throughput-2330:10de-a5381c1be0b8ee66ad41e7dc7b4e6d2cffaa7a4e37ca05f57898817560b0bd2b-1

2xA100 SXM

vllm-bf16-tp2-pp1-32c3b968468aefcfb3ea1db5a16e3dc9d64395f02ef68a06175e8bbdb0038601

1xRTX PRO 6000

tensorrt_llm-rtx6000_blackwell_sv-fp8-tp1-pp1-throughput-2bb5:10de-d21d6986d29d8abf555f35c9a4c8146c4b10595d9e57e6efabd4a026efcc0c4a-1

2xB200

tensorrt_llm-b200-fp8-tp2-pp1-throughput-2901:10de-d2ff2bbf26fdabe28afaf754ca8e5615ed337e19d873da15627c209849f51072-2
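To avoid copying these long IDs by hand, the list above can be wrapped in a small lookup. This is only a sketch: profile_for_hardware is a hypothetical helper, its keys are the hardware headings from this section, and the IDs are copied from this section as-is, so they should still be verified against list-model-profiles for the image you deploy.

```shell
# Hypothetical helper (not part of NIM or the RAG Blueprint).
# Maps a hardware heading from this section to its recommended profile ID.
profile_for_hardware() {
  case "$1" in
    "1xH100 NVL")
      echo "tensorrt_llm-h100_nvl-fp8-tp1-pp1-throughput-2321:10de-d347471b749e4e6b6e5956bb0f600b6646461c214cadadf6614baf305054a743-1" ;;
    "1xH100 SXM")
      echo "tensorrt_llm-h100-fp8-tp1-pp1-throughput-2330:10de-a5381c1be0b8ee66ad41e7dc7b4e6d2cffaa7a4e37ca05f57898817560b0bd2b-1" ;;
    "2xA100 SXM")
      echo "vllm-bf16-tp2-pp1-32c3b968468aefcfb3ea1db5a16e3dc9d64395f02ef68a06175e8bbdb0038601" ;;
    "1xRTX PRO 6000")
      echo "tensorrt_llm-rtx6000_blackwell_sv-fp8-tp1-pp1-throughput-2bb5:10de-d21d6986d29d8abf555f35c9a4c8146c4b10595d9e57e6efabd4a026efcc0c4a-1" ;;
    "2xB200")
      echo "tensorrt_llm-b200-fp8-tp2-pp1-throughput-2901:10de-d2ff2bbf26fdabe28afaf754ca8e5615ed337e19d873da15627c209849f51072-2" ;;
    *)
      # Unknown label: report on stderr and fail so callers can detect it.
      echo "no recommended profile listed for: $1" >&2
      return 1 ;;
  esac
}
```

For example, `export NIM_MODEL_PROFILE="$(profile_for_hardware '1xH100 SXM')"` sets the variable to the 1xH100 SXM profile.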

Configuring Model Profiles

Note: NIM automatically detects and selects the optimal profile for your hardware. Only configure a specific profile if you experience issues with the default deployment, such as performance problems or out-of-memory errors.

Docker Compose Deployment

To set a specific model profile in Docker Compose, add the NIM_MODEL_PROFILE environment variable to the nim-llm service in deploy/compose/nims.yaml:

  nim-llm:
    container_name: nim-llm-ms
    image: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5:1.14.0
    # ... other configuration ...
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
      NIM_MODEL_PROFILE: ${NIM_MODEL_PROFILE-""}  # Add this line

Then set the profile in your environment or .env file before deploying:

export NIM_MODEL_PROFILE="tensorrt_llm-h100-fp8-tp1-pp1-throughput-2330:10de-a5381c1be0b8ee66ad41e7dc7b4e6d2cffaa7a4e37ca05f57898817560b0bd2b-1"
docker compose -f deploy/compose/nims.yaml up -d
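If you prefer the .env file over an exported variable, the entry can be written with a small helper. This is a sketch under assumptions: set_env_var is a hypothetical function, and the file is passed to Compose explicitly with --env-file so that the result does not depend on where Compose looks for a default .env file.

```shell
# Hypothetical helper (not part of the RAG Blueprint).
# Sets KEY=VALUE in an env file, replacing any earlier assignment of KEY.
set_env_var() {
  file="$1" key="$2" value="$3"
  touch "$file"
  # Copy every line except a previous assignment of the same key.
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  # Append the new assignment and swap the file into place.
  printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Usage:
#   set_env_var .env NIM_MODEL_PROFILE "<profile ID from list-model-profiles>"
#   docker compose --env-file .env -f deploy/compose/nims.yaml up -d
```

Running the helper twice with different values leaves a single NIM_MODEL_PROFILE line holding the most recent value.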

Helm Deployment

To set a specific model profile in Helm, add the NIM_MODEL_PROFILE environment variable to the nim-llm section in deploy/helm/nvidia-blueprint-rag/values.yaml:

nim-llm:
  enabled: true
  service:
    name: "nim-llm"
  image:
    repository: nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5
    pullPolicy: IfNotPresent
    tag: "1.14.0"
  resources:
    limits:
      nvidia.com/gpu: 1
    requests:
      nvidia.com/gpu: 1

  env:  # Add this section
    - name: NIM_MODEL_PROFILE
      value: "tensorrt_llm-h100-fp8-tp1-pp1-throughput-2330:10de-a5381c1be0b8ee66ad41e7dc7b4e6d2cffaa7a4e37ca05f57898817560b0bd2b-1"
  model:
    ngcAPIKey: ""
    name: "nvidia/llama-3.3-nemotron-super-49b-v1.5"
    hfTokenSecret: ""

After modifying the values.yaml file, deploy or update the Helm chart:

helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/nvstaging/blueprint/charts/nvidia-blueprint-rag-v2.4.0-rc1.tgz \
  --username '$oauthtoken' \
  --password "${NGC_API_KEY}" \
  --set imagePullSecret.password="${NGC_API_KEY}" \
  --set ngcApiSecret.password="${NGC_API_KEY}" \
  -f nvidia-blueprint-rag/values.yaml
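The values.yaml example requests one GPU (nvidia.com/gpu: 1) and uses a tp1 profile. Before switching to a multi-GPU profile, it can be useful to check how many GPUs the profile ID implies. The sketch below uses a hypothetical helper, profile_gpu_count, and assumes that the tpN and ppN fields in a profile ID are the tensor-parallel and pipeline-parallel degrees, so that the GPU count is their product. That reading is consistent with the 1x and 2x hardware headings in this document, but it is not stated here explicitly.

```shell
# Hypothetical helper (not part of NIM or the RAG Blueprint).
# Prints the GPU count implied by a profile ID, computed as tp * pp.
# Assumption: the ID contains "-tp<N>-" and "-pp<N>-" fields.
profile_gpu_count() {
  tp=$(printf '%s\n' "$1" | sed -n 's/.*-tp\([0-9][0-9]*\)-.*/\1/p')
  pp=$(printf '%s\n' "$1" | sed -n 's/.*-pp\([0-9][0-9]*\)-.*/\1/p')
  # Fail if either field is missing rather than guessing a value.
  [ -n "$tp" ] && [ -n "$pp" ] || return 1
  echo $((tp * pp))
}

# Usage:
#   profile_gpu_count "$NIM_MODEL_PROFILE"
```

If the printed count differs from the nvidia.com/gpu values under resources, the requests and limits likely need to be adjusted to match.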
