diff --git a/mint.json b/mint.json
index ccb1843..c52d295 100644
--- a/mint.json
+++ b/mint.json
@@ -38,7 +38,7 @@
"mode": "auto"
},
"navigation": [
- {
+ {
"group": "Getting Started",
"pages": ["about", "installation", "quick-start"]
},
@@ -46,6 +46,7 @@
"group": "Tutorials",
"pages": [
"tutorials/deploying-llama-3-to-aws",
+ "tutorials/deploying-llama-3-to-aws-using-query-flag",
"tutorials/deploying-llama-3-to-gcp",
"tutorials/deploying-llama-3-to-azure"
]
diff --git a/quick-start.mdx b/quick-start.mdx
index 5853ef8..25a2308 100644
--- a/quick-start.mdx
+++ b/quick-start.mdx
@@ -35,14 +35,16 @@ From the dropdown, select `Delete a Model Endpoint` to see the list of models en

-### Querying Models
+### Querying Models (Interactive)
-From the dropdown, select `Query a Model Endpoint` to see the list of models endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response.
+From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press Space to select the endpoints you want to query. Enter the query in the text box and press Enter to get the response.

-### YAML-based Deployment (Recommended)
+---
+
+## YAML-based Deployment (Recommended)
For reproducible deployments, use YAML configuration:
@@ -128,6 +130,37 @@ models:
+---
+
+## YAML-based Querying (New)
+
+Once an endpoint is deployed, you can issue batch or ad-hoc queries directly from a YAML file without opening the interactive menu.
+
+1. Create a query YAML in `.magemaker_config/` (e.g. `llama3-query.yaml`):
+
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "What are the key differences between Llama 2 and Llama 3?"
+```
+
+2. Execute the query with the new `--query` flag:
+
+```sh
+magemaker --query .magemaker_config/llama3-query.yaml
+```
+
+This flag is now available for all three cloud providers (AWS, GCP, and Azure) and mirrors the request/response behavior you would get when calling the provider SDKs directly.
+
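+For example, on AWS the same request can be issued straight through the SDK. A minimal sketch using boto3 (assumed to be installed and configured with the same credentials Magemaker uses; the exact payload schema depends on the deployed container):
+
+```python
+import json
+
+import boto3
+
+# Call the SageMaker endpoint directly, bypassing the Magemaker CLI.
+runtime = boto3.client("sagemaker-runtime")
+response = runtime.invoke_endpoint(
+    EndpointName="llama3-endpoint",
+    ContentType="application/json",
+    Body=json.dumps({"inputs": "What are the key differences between Llama 2 and Llama 3?"}),
+)
+print(json.loads(response["Body"].read()))
+```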
+
+Endpoints continue to accrue costs while running, so remember to delete them when you're done!
+
+---
### Model Fine-tuning
@@ -150,26 +183,6 @@ training: !Training
per_device_train_batch_size: 32
learning_rate: 2e-5
```
-{/*
-### Recommended Models
-
-
-
- Fill Mask: tries to complete your sentence like Madlibs. Query format: text
- string with [MASK] somewhere in it.
-
-
-
- Feature extraction: turns text into a 384d vector embedding for semantic
- search / clustering. Query format: "type out a sentence like this one."
-
- */}
Remember to deactivate unused endpoints to avoid unnecessary charges!
@@ -180,7 +193,6 @@ training: !Training
You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).
-
If anything doesn't make sense or you have suggestions, do point them out at [magemaker.featurebase.app](https://magemaker.featurebase.app/).
We'd love to hear from you! We're excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.
diff --git a/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
new file mode 100644
index 0000000..9f500c1
--- /dev/null
+++ b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
@@ -0,0 +1,89 @@
+---
+title: Deploying Llama 3 to SageMaker using the Query Flag
+---
+
+## Introduction
+This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker **and** shows how to query it using the new `--query` flag. Ensure you have followed the [installation](installation) steps before proceeding.
+
+## Step 1: Setting Up Magemaker for AWS
+Run the following command to configure Magemaker for AWS SageMaker deployment:
+```sh
+magemaker --cloud aws
+```
+This initializes Magemaker with the necessary configurations for deploying models to SageMaker.
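+
+Before deploying, you can sanity-check that the AWS credentials Magemaker will use are valid. A minimal sketch with boto3 (assumed installed and configured; this check is not part of the Magemaker CLI):
+
+```python
+import boto3
+
+# Ask AWS STS who we are; this fails fast if credentials are missing or expired.
+identity = boto3.client("sts").get_caller_identity()
+print(f"Deploying as {identity['Arn']} (account {identity['Account']})")
+```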
+
+## Step 2: YAML-based Deployment
+For reproducible deployments, use YAML configuration:
+```sh
+magemaker --deploy .magemaker_config/llama3-deploy.yaml
+```
+
+Example deployment YAML:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+  instance_count: 1
+  instance_type: ml.g5.2xlarge
+  num_gpus: 1
+  quantization: null
+models:
+  - !Model
+    id: meta-llama/Meta-Llama-3-8B-Instruct
+    location: null
+    predict: null
+    source: huggingface
+    task: text-generation
+    version: null
+```
+
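+If you want to inspect or generate these config files programmatically, the custom `!Deployment`/`!Model` tags can be registered with PyYAML. A rough sketch that simply maps each tagged node to a plain dict (an illustration only; it makes no claims about Magemaker's own loader):
+
+```python
+import yaml
+
+class ConfigLoader(yaml.SafeLoader):
+    """SafeLoader extended to accept the custom Magemaker tags."""
+
+def _as_dict(loader, node):
+    # Treat a tagged node as an ordinary mapping.
+    return loader.construct_mapping(node, deep=True)
+
+for tag in ("!Deployment", "!Model", "!Query"):
+    ConfigLoader.add_constructor(tag, _as_dict)
+
+with open(".magemaker_config/llama3-deploy.yaml") as f:
+    config = yaml.load(f, Loader=ConfigLoader)
+print(config["deployment"]["endpoint_name"])  # llama3-endpoint
+```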
+
+For gated models like Llama from Meta, you must (1) accept the model license on Hugging Face **and** (2) provide a valid `HUGGING_FACE_HUB_KEY` in your environment for the deployment to succeed.
+
+You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
+
+## Step 3: Querying with the `--query` Flag
+After the deployment finishes, you can issue requests directly from the CLI without the interactive dropdown.
+
+### 3.1 Create a Query YAML
+Create `llama3-query.yaml` inside `.magemaker_config/`:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "Explain the concept of quantum entanglement in simple terms."
+```
+
+### 3.2 Execute the Query
+```sh
+magemaker --query .magemaker_config/llama3-query.yaml
+```
+
+Sample Response:
+```json
+{
+ "generated_text": "Quantum entanglement is like having two magical coins…",
+ "model": "meta-llama/Meta-Llama-3-8B-Instruct",
+ "total_tokens": 95,
+ "generation_time": 1.3
+}
+```
+
+## Step 4: Programmatic Query (Python)
+You can also call the endpoint via the SageMaker SDK:
+```python
+import sagemaker
+from sagemaker.huggingface.model import HuggingFacePredictor
+
+# Attach a predictor to the running endpoint; JSON (de)serialization is the default.
+predictor = HuggingFacePredictor(
+    endpoint_name="llama3-endpoint",
+    sagemaker_session=sagemaker.Session(),
+)
+
+# The Hugging Face inference container expects an {"inputs": ...} payload.
+print(predictor.predict({"inputs": "What are you?"}))
+```
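+
+`HuggingFacePredictor` serializes the dictionary to JSON by default, mirroring the request the `--query` flag sends on your behalf.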
+
+## Conclusion
+You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker’s new `--query` workflow. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).