34 changes: 17 additions & 17 deletions README.md
@@ -100,18 +100,6 @@ This QuickStart allows users to explore the capabilities of RAG by:
| Generation | `meta-llama/Meta-Llama-3-70B-Instruct` | A100 x2/HPU | p4d.24xlarge
| Safety | `meta-llama/Llama-Guard-3-8B` | L4/HPU | g6.2xlarge

- Note: Developers can also use a remote LLM via the command line (see [Remote LLM Deployment](#remote-llm-deployment-example)) or by modifying the `rag-values.yaml` file directly:

```yaml
global:
models:
remote-llm:
id: meta-llama/Llama-3.3-70B-Instruct
url: https://somedomain.com/v1
apiToken: fake-token
enabled: true
```

Note: the 70B model is NOT required for initial testing of this example. The safety/shield model `Llama-Guard-3-8B` is also optional.

### Installation Steps
@@ -250,7 +238,6 @@ make install NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-gu

# Xeon deployment
make install NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b DEVICE=xeon

```

**Remote LLM Deployment Example:**
@@ -259,23 +246,36 @@ To connect to a remote LLM endpoint instead of deploying a local model, use `LLM

```bash
make install NAMESPACE=llama-stack-rag \
- LLM=remote-llm \
+ LLM=remotellm \
LLM_URL=https://my-model-endpoint.example.com/v1 \
- LLM_API_TOKEN=my-api-token
+ LLM_API_TOKEN=my-api-token \
+ LLM_ID=llm_model_id
```

| Parameter | Description |
|-----------|-------------|
- | `LLM=remote-llm` | Indicates a remote model (no local vLLM deployment) |
+ | `LLM=remotellm` | Indicates a remote model (no local vLLM deployment) |
| `LLM_URL` | The base URL of the remote model endpoint |
| `LLM_API_TOKEN` | Authentication token for the remote endpoint |
| `LLM_ID` | The model of the llm you wish to use |

This skips local model deployment and configures LlamaStack to use the remote inference endpoint directly. No GPU or HF token is required for the LLM.
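
The parameters in the table above combine into a single `make install` invocation. As a minimal sketch, assuming a hypothetical helper named `build_install_command` (it is not part of this repository), the command could be assembled and sanity-checked before it is run. Treating `LLM_URL` and `LLM_API_TOKEN` as mandatory and `LLM_ID` as optional is an assumption of this sketch, not something the diff states.

```python
import shlex


def build_install_command(namespace, llm_url, llm_api_token, llm_id=None):
    """Assemble the `make install` argv for a remote LLM.

    Hypothetical helper for illustration only. It mirrors the parameters
    shown in the README table and does not call the Makefile itself.
    """
    # Assumption: a remote deployment needs both the endpoint and a token.
    if not llm_url or not llm_api_token:
        raise ValueError("LLM_URL and LLM_API_TOKEN must both be set")

    args = [
        "make",
        "install",
        f"NAMESPACE={namespace}",
        "LLM=remotellm",  # the key name introduced by this change
        f"LLM_URL={llm_url}",
        f"LLM_API_TOKEN={llm_api_token}",
    ]
    # Assumption: LLM_ID may be omitted, as in the Makefile help text.
    if llm_id:
        args.append(f"LLM_ID={llm_id}")
    return args


cmd = build_install_command(
    namespace="llama-stack-rag",
    llm_url="https://my-model-endpoint.example.com/v1",
    llm_api_token="my-api-token",
    llm_id="llm_model_id",
)
# shlex.join quotes any argument that would need it in a POSIX shell.
print(shlex.join(cmd))
```

Building the argument list first and quoting it with `shlex.join` keeps a token containing shell metacharacters from being split or expanded when the command is pasted into a terminal.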

When prompted, enter your **[Hugging Face Token](https://huggingface.co/settings/tokens)**.

Note: This process may take 10 to 30 minutes depending on the number and size of models to be downloaded.

- Note: Developers can also use a remote LLM via the helm chart (see [Remote LLM Deployment](#remote-llm-deployment-example)) or by modifying the `rag-values.yaml` file directly:

```yaml
global:
models:
remotellm:
id: meta-llama/Llama-3.3-70B-Instruct
url: https://llm-gateway.com/v1
apiToken: api-token
enabled: true
```
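
Before installing, the `remotellm` entry can be checked for the fields shown in the YAML above. The following is a rough sketch rather than anything shipped with the chart: `check_remote_llm` is a hypothetical function, the values are passed in as a plain dictionary (loading the YAML file is left out), and the rule that `id`, `url`, and `apiToken` must all be non-empty is an assumption drawn only from the example.

```python
# Keys that appear in the README example for global.models.remotellm.
REQUIRED_KEYS = ("id", "url", "apiToken")


def check_remote_llm(values):
    """Return a list of problems found in the remotellm entry (empty if none).

    Hypothetical validation helper. `values` is the parsed content of
    rag-values.yaml as nested dictionaries.
    """
    entry = values.get("global", {}).get("models", {}).get("remotellm")
    if not entry or not entry.get("enabled"):
        return ["global.models.remotellm is missing or not enabled"]

    problems = [
        f"remotellm.{key} is empty" for key in REQUIRED_KEYS if not entry.get(key)
    ]
    url = entry.get("url") or ""
    if url and not url.startswith(("http://", "https://")):
        problems.append("remotellm.url should be an http(s) URL")
    return problems


example = {
    "global": {
        "models": {
            "remotellm": {
                "id": "meta-llama/Llama-3.3-70B-Instruct",
                "url": "https://llm-gateway.com/v1",
                "apiToken": "api-token",
                "enabled": True,
            }
        }
    }
}
print(check_remote_llm(example))
```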

7. **Monitor Deployment**

```bash
2 changes: 1 addition & 1 deletion deploy/helm/Makefile
@@ -270,7 +270,7 @@ help: ## Show this help message
@echo -e " make install NAMESPACE=my-rag LLM=llama-3-2-3b-instruct LLM_TOLERATION=\"nvidia.com/gpu\""
@echo -e ""
@echo -e " $(BLUE)Option 3:$(NC) Using command-line parameters with remote LLM"
- @echo -e " make install NAMESPACE=my-rag LLM=remote-llm LLM_URL=https://<<llm-url>>/v1 LLM_API_TOKEN=<<llm-api-token>>"
+ @echo -e " make install NAMESPACE=my-rag LLM=remotellm LLM_URL=https://<<llm-url>>/v1 LLM_API_TOKEN=<<llm-api-token>>"

# Dependency checks
.PHONY: check-deps
2 changes: 1 addition & 1 deletion deploy/helm/rag-values.yaml.example
@@ -103,7 +103,7 @@ global:
# To configure LlamaStack with remote llm, replace the id,
# url and apiToken value and set enabled to true

- # remote-llm:
+ # remotellm:
# id: custom-model-id
# url: https://custom-server-url/v1
# apiToken: fake-token
@@ -99,8 +99,8 @@ def _get_model_type(model):
return meta.get("model_type")

model_list = [
- _get_model_id(model) for model in models
- if _get_model_type(model) == "llm" and _get_model_id(model) not in shields_set
+ model.id for model in models
+ if model.custom_metadata.get("model_type") == "llm" and model.id not in shields_set
]

# Fetch and categorize toolgroups
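
The new comprehension reads `model.id` and `model.custom_metadata` directly instead of going through the helper functions. A minimal sketch of the same filter follows, using a hypothetical `Model` dataclass as a stand-in for whatever object the client returns, and made-up model entries. It also guards against `custom_metadata` being `None`; whether that can happen in practice is an assumption, since the diff does not say.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    """Hypothetical stand-in for the model objects returned by the client."""

    id: str
    custom_metadata: Optional[dict] = field(default_factory=dict)


def list_llm_ids(models, shields_set):
    """Return ids of models typed "llm" that are not registered as shields."""
    return [
        m.id
        for m in models
        # `or {}` tolerates custom_metadata=None (assumed possible here).
        if (m.custom_metadata or {}).get("model_type") == "llm"
        and m.id not in shields_set
    ]


# Illustrative data only; these entries are not taken from the repository.
models = [
    Model("example/chat-model", {"model_type": "llm"}),
    Model("example/guard-model", {"model_type": "llm"}),
    Model("example/embedding-model", {"model_type": "embedding"}),
    Model("example/no-metadata", None),
]
shields = {"example/guard-model"}

print(list_llm_ids(models, shields))  # ['example/chat-model']
```

Without the `or {}` guard, a model whose `custom_metadata` is `None` would raise `AttributeError` on `.get`, which is the one behavioral difference worth checking against the removed helpers.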