From 27f0a8b83b2d4e1353060ccff1a4de9c21dccd4d Mon Sep 17 00:00:00 2001 From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com> Date: Tue, 16 Sep 2025 10:41:44 +0000 Subject: [PATCH 1/3] docs: sync mint.json with latest code --- mint.json | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mint.json b/mint.json index ccb1843..c52d295 100644 --- a/mint.json +++ b/mint.json @@ -38,7 +38,7 @@ "mode": "auto" }, "navigation": [ - { + { "group": "Getting Started", "pages": ["about", "installation", "quick-start"] }, @@ -46,6 +46,7 @@ "group": "Tutorials", "pages": [ "tutorials/deploying-llama-3-to-aws", + "tutorials/deploying-llama-3-to-aws-using-query-flag", "tutorials/deploying-llama-3-to-gcp", "tutorials/deploying-llama-3-to-azure" ] From 8b65e00f4a08b660cf790420f452c2608c6225cb Mon Sep 17 00:00:00 2001 From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com> Date: Tue, 16 Sep 2025 10:41:49 +0000 Subject: [PATCH 2/3] docs: sync quick-start.mdx with latest code --- quick-start.mdx | 60 +++++++++++++++++++++++++++++-------------------- 1 file changed, 36 insertions(+), 24 deletions(-) diff --git a/quick-start.mdx b/quick-start.mdx index 5853ef8..25a2308 100644 --- a/quick-start.mdx +++ b/quick-start.mdx @@ -35,14 +35,16 @@ From the dropdown, select `Delete a Model Endpoint` to see the list of models en ![Delete Endpoints](../Images/delete-1.png) -### Querying Models +### Querying Models (Interactive) -From the dropdown, select `Query a Model Endpoint` to see the list of models endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response. +From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response. 
![Query Endpoints](../Images/query-1.png)

-### YAML-based Deployment (Recommended)
+---
+
+## YAML-based Deployment (Recommended)

For reproducible deployments, use YAML configuration:

@@ -128,6 +130,37 @@ models:

+---
+
+## YAML-based Querying (New)
+
+Once an endpoint is deployed, you can issue batch or ad-hoc queries directly from a YAML file without opening the interactive menu.
+
+1. Create a query YAML (e.g. `.magemaker_config/llama3-query.yaml`):
+
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "What are the key differences between Llama 2 and Llama 3?"
+```
+
+2. Execute the query with the new `--query` flag:
+
+```sh
+magemaker --query .magemaker_config/llama3-query.yaml
+```
+
+This flag is now available for all three cloud providers (AWS, GCP, Azure) and mirrors the request/response you would get when using the SDKs directly.
+
+
+Endpoints continue to accrue costs while running, so remember to delete them when you're done!
+
+
+
+---

### Model Fine-tuning

@@ -150,26 +183,6 @@ training: !Training
   per_device_train_batch_size: 32
   learning_rate: 2e-5
 ```
-{/*
-### Recommended Models
-
-
-
-    Fill Mask: tries to complete your sentence like Madlibs. Query format: text
-    string with [MASK] somewhere in it.
-
-
-
-    Feature extraction: turns text into a 384d vector embedding for semantic
-    search / clustering. Query format: "type out a sentence like this one."
-
- */}

Remember to deactivate unused endpoints to avoid unnecessary charges!

@@ -180,7 +193,6 @@ training: !Training

You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).

-
If anything doesn't make sense or you have suggestions, do point them out at [magemaker.featurebase.app](https://magemaker.featurebase.app/). We'd love to hear from you!

We're excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.
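A note on the query YAML format above: the `!Deployment` and `!Query` tags are application-specific YAML tags that Magemaker resolves internally. If you want to inspect or validate such a file in your own scripts, you can teach PyYAML's `SafeLoader` to accept them. This is a minimal sketch assuming PyYAML is installed; the constructors below are illustrative and are not Magemaker's own code:

```python
import yaml

# Plain yaml.safe_load() rejects unknown tags like !Deployment, so we
# register a constructor per tag that maps the tagged node to a dict.
def _mapping_constructor(loader, node):
    return loader.construct_mapping(node, deep=True)

for tag in ("!Deployment", "!Query", "!Model", "!Training"):
    yaml.SafeLoader.add_constructor(tag, _mapping_constructor)

doc = yaml.safe_load("""
deployment: !Deployment
  destination: aws
  endpoint_name: llama3-endpoint

query: !Query
  input: "What are the key differences between Llama 2 and Llama 3?"
""")

print(doc["deployment"]["endpoint_name"])  # -> llama3-endpoint
print(doc["query"]["input"])
```

Loading the file this way makes it easy to check, for example, that `endpoint_name` matches an endpoint you have actually deployed before handing the file to `magemaker --query`.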
From 9209c9e8ddb6cc8478514a277ecfd4375603a039 Mon Sep 17 00:00:00 2001 From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com> Date: Tue, 16 Sep 2025 10:41:50 +0000 Subject: [PATCH 3/3] docs: create tutorials/deploying-llama-3-to-aws-using-query-flag.mdx --- ...loying-llama-3-to-aws-using-query-flag.mdx | 89 +++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 tutorials/deploying-llama-3-to-aws-using-query-flag.mdx diff --git a/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx new file mode 100644 index 0000000..9f500c1 --- /dev/null +++ b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx @@ -0,0 +1,89 @@ +--- +title: Deploying Llama 3 to SageMaker using the Query Flag +--- + +## Introduction +This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker **and** shows how to query it using the new `--query` flag. Ensure you have followed the [installation](installation) steps before proceeding. + +## Step 1: Setting Up Magemaker for AWS +Run the following command to configure Magemaker for AWS SageMaker deployment: +```sh +magemaker --cloud aws +``` +This initializes Magemaker with the necessary configurations for deploying models to SageMaker. 
+
+## Step 2: YAML-based Deployment
+For reproducible deployments, use YAML configuration:
+```sh
+magemaker --deploy .magemaker_config/llama3-deploy.yaml
+```
+
+Example deployment YAML:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+  instance_count: 1
+  instance_type: ml.g5.2xlarge
+  num_gpus: 1
+  quantization: null
+models:
+  - !Model
+    id: meta-llama/Meta-Llama-3-8B-Instruct
+    location: null
+    predict: null
+    source: huggingface
+    task: text-generation
+    version: null
+```
+
+
+  For gated models like Llama from Meta, you must (1) accept the model license on Hugging Face **and** (2) provide a valid `HUGGING_FACE_HUB_KEY` in your environment for the deployment to succeed.
+
+
+
+You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
+
+
+## Step 3: Querying with the `--query` Flag
+After the deployment finishes, you can issue requests directly from the CLI without the interactive dropdown.
+
+### 3.1 Create a Query YAML
+Create `.magemaker_config/llama3-query.yaml`:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "Explain the concept of quantum entanglement in simple terms."
+``` + +### 3.2 Execute the Query +```sh +magemaker --query .magemaker_config/llama3-query.yaml +``` + +Sample Response: +```json +{ + "generated_text": "Quantum entanglement is like having two magical coins…", + "model": "meta-llama/Meta-Llama-3-8B-Instruct", + "total_tokens": 95, + "generation_time": 1.3 +} +``` + +## Step 4: Programmatic Query (Python) +You can also call the endpoint via the SageMaker SDK: +```python +from sagemaker.huggingface.model import HuggingFacePredictor +import sagemaker + +predictor = HuggingFacePredictor(endpoint_name="llama3-endpoint", + sagemaker_session=sagemaker.Session()) +print(predictor.predict({"inputs": "What are you?"})) +``` + +## Conclusion +You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker’s new `--query` workflow. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).