3 changes: 2 additions & 1 deletion mint.json
"mode": "auto"
},
"navigation": [
{
{
"group": "Getting Started",
"pages": ["about", "installation", "quick-start"]
},
{
"group": "Tutorials",
"pages": [
"tutorials/deploying-llama-3-to-aws",
"tutorials/deploying-llama-3-to-aws-using-query-flag",
"tutorials/deploying-llama-3-to-gcp",
"tutorials/deploying-llama-3-to-azure"
]
60 changes: 36 additions & 24 deletions quick-start.mdx
From the dropdown, select `Delete a Model Endpoint` to see the list of model endpoints.
![Delete Endpoints](../Images/delete-1.png)


### Querying Models (Interactive)

From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response.

![Query Endpoints](../Images/query-1.png)


---

## YAML-based Deployment (Recommended)

For reproducible deployments, use YAML configuration:

```sh
magemaker --deploy .magemaker_config/llama3-deploy.yaml
```

<Note>
For gated models like Llama from Meta, you must (1) accept the model license on Hugging Face **and** (2) provide a valid `HUGGING_FACE_HUB_KEY` in your environment for the deployment to succeed.
</Note>


---

## YAML-based Querying (New)

Once an endpoint is deployed, you can issue batch or ad hoc queries directly from a YAML file without opening the interactive menu.

1. Create a query YAML (e.g. `.magemaker_config/llama3-query.yaml`):

```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint

query: !Query
  input: "What are the key differences between Llama 2 and Llama 3?"
```

2. Execute the query with the new `--query` flag:

```sh
magemaker --query .magemaker_config/llama3-query.yaml
```

This flag is now available for all three cloud providers (AWS, GCP, Azure) and mirrors the request/response you would get when using the SDKs directly.
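For comparison, here is a minimal sketch of the same request made directly against the endpoint with boto3 (assuming an AWS endpoint named `llama3-endpoint` as above; the payload shape follows the standard Hugging Face `inputs` format):

```python
import json
import boto3

# Low-level equivalent of the --query flag: call the SageMaker runtime API directly
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="llama3-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "What are the key differences between Llama 2 and Llama 3?"}),
)
print(json.loads(response["Body"].read()))
```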

<Warning>
Endpoints continue to accrue costs while running, so remember to delete them when you're done!
</Warning>


---

## Model Fine-tuning

Example training YAML:
```yaml
training: !Training
  per_device_train_batch_size: 32
  learning_rate: 2e-5
```
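Assuming training runs follow the same flag pattern as deployment and querying (hypothetical; confirm against `magemaker --help`), you would launch a fine-tuning job like so:

```sh
# Hypothetical flag, mirroring --deploy and --query; verify with magemaker --help
magemaker --train .magemaker_config/llama3-train.yaml
```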
{/*
### Recommended Models

<CardGroup>
<Card
title="google-bert/bert-base-uncased"
href="https://huggingface.co/google-bert/bert-base-uncased"
>
Fill Mask: tries to complete your sentence like Madlibs. Query format: text
string with [MASK] somewhere in it.
</Card>

<Card
title="sentence-transformers/all-MiniLM-L6-v2"
href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2"
>
Feature extraction: turns text into a 384d vector embedding for semantic
search / clustering. Query format: "type out a sentence like this one."
</Card>
</CardGroup> */}

<Warning>
Remember to deactivate unused endpoints to avoid unnecessary charges!
</Warning>

You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).


If anything doesn't make sense or you have suggestions, do point them out at [magemaker.featurebase.app](https://magemaker.featurebase.app/).

We'd love to hear from you and learn how we can make this more valuable for the community.
89 changes: 89 additions & 0 deletions tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
---
title: Deploying Llama 3 to SageMaker using the Query Flag
---

## Introduction
This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker **and** shows how to query it using the new `--query` flag. Ensure you have followed the [installation](installation) steps before proceeding.

## Step 1: Setting Up Magemaker for AWS
Run the following command to configure Magemaker for AWS SageMaker deployment:
```sh
magemaker --cloud aws
```
This initializes Magemaker with the necessary configurations for deploying models to SageMaker.
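Magemaker inherits the standard AWS credentials configured on your machine. As a quick sanity check (a plain boto3 sketch, not part of Magemaker itself), you can confirm which account those credentials resolve to:

```python
import boto3

# Verify the AWS identity Magemaker will pick up from your environment
identity = boto3.client("sts").get_caller_identity()
print(f"Account: {identity['Account']}")
print(f"Caller ARN: {identity['Arn']}")
```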

## Step 2: YAML-based Deployment
For reproducible deployments, use YAML configuration:
```sh
magemaker --deploy .magemaker_config/llama3-deploy.yaml
```

Example deployment YAML:
```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint
  instance_count: 1
  instance_type: ml.g5.2xlarge
  num_gpus: 1
  quantization: null
models:
- !Model
  id: meta-llama/Meta-Llama-3-8B-Instruct
  location: null
  predict: null
  source: huggingface
  task: text-generation
  version: null
```

<Note>
For gated models like Llama from Meta, you must (1) accept the model license on Hugging Face **and** (2) provide a valid `HUGGING_FACE_HUB_KEY` in your environment for the deployment to succeed.
</Note>

<Warning>
You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
</Warning>
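If you want to inspect your current limits programmatically, here is a sketch using the AWS Service Quotas API via boto3 (the quota-name filter is illustrative; match the exact names shown in your console):

```python
import boto3

# List SageMaker quotas and surface g5 endpoint-usage limits
client = boto3.client("service-quotas", region_name="us-east-1")
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        name = quota["QuotaName"].lower()
        if "g5" in name and "endpoint" in name:
            print(quota["QuotaName"], "->", quota["Value"])
```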

## Step 3: Querying with the `--query` Flag
After the deployment finishes, you can issue requests directly from the CLI without the interactive dropdown.

### 3.1 Create a Query YAML
Create `.magemaker_config/llama3-query.yaml`:
```yaml
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint

query: !Query
  input: "Explain the concept of quantum entanglement in simple terms."
```

### 3.2 Execute the Query
```sh
magemaker --query .magemaker_config/llama3-query.yaml
```

Sample Response:
```json
{
"generated_text": "Quantum entanglement is like having two magical coins…",
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"total_tokens": 95,
"generation_time": 1.3
}
```

## Step 4: Programmatic Query (Python)
You can also call the endpoint via the SageMaker SDK:
```python
from sagemaker.huggingface.model import HuggingFacePredictor
import sagemaker

# Attach to the running endpoint by name
predictor = HuggingFacePredictor(
    endpoint_name="llama3-endpoint",
    sagemaker_session=sagemaker.Session(),
)

# Send a prompt in the standard Hugging Face inference payload format
print(predictor.predict({"inputs": "What are you?"}))
```

## Conclusion
You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker’s new `--query` workflow. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).