diff --git a/mint.json b/mint.json
index ccb1843..c52d295 100644
--- a/mint.json
+++ b/mint.json
@@ -38,7 +38,7 @@
     "mode": "auto"
   },
   "navigation": [
-    {
+    {
      "group": "Getting Started",
      "pages": ["about", "installation", "quick-start"]
    },
@@ -46,6 +46,7 @@
      "group": "Tutorials",
      "pages": [
        "tutorials/deploying-llama-3-to-aws",
+        "tutorials/deploying-llama-3-to-aws-using-query-flag",
        "tutorials/deploying-llama-3-to-gcp",
        "tutorials/deploying-llama-3-to-azure"
      ]
diff --git a/quick-start.mdx b/quick-start.mdx
index 5853ef8..25a2308 100644
--- a/quick-start.mdx
+++ b/quick-start.mdx
@@ -35,14 +35,16 @@ From the dropdown, select `Delete a Model Endpoint` to see the list of models en
 
 ![Delete Endpoints](../Images/delete-1.png)
 
-### Querying Models
+### Querying Models (Interactive)
 
-From the dropdown, select `Query a Model Endpoint` to see the list of models endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response.
+From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response.
 
 ![Query Endpoints](../Images/query-1.png)
 
-### YAML-based Deployment (Recommended)
+---
+
+## YAML-based Deployment (Recommended)
 
 For reproducible deployments, use YAML configuration:
 
 
@@ -128,6 +130,37 @@ models:
 
 
+---
+
+## YAML-based Querying (New)
+
+Once an endpoint is deployed, you can issue batch or ad hoc queries directly from a YAML file without opening the interactive menu.
+
+1. Create a query YAML (e.g. `.magemaker_config/llama3-query.yaml`):
+
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "What are the key differences between Llama 2 and Llama 3?"
+```
+
+2. Execute the query with the new `--query` flag:
+
+```sh
+magemaker --query .magemaker_config/llama3-query.yaml
+```
+
+This flag is now available for all three cloud providers (AWS, GCP, and Azure) and mirrors the request/response behavior you would get when using the SDKs directly.
+
+
+Endpoints continue to accrue costs while running, so remember to delete them when you're done!
+
+
+
+---
 
 ### Model Fine-tuning
 
 
@@ -150,26 +183,6 @@ training: !Training
   per_device_train_batch_size: 32
   learning_rate: 2e-5
 ```
-{/*
-### Recommended Models
-
-
-
-    Fill Mask: tries to complete your sentence like Madlibs. Query format: text
-    string with [MASK] somewhere in it.
-
-
-
-    Feature extraction: turns text into a 384d vector embedding for semantic
-    search / clustering. Query format: "type out a sentence like this one."
-
- */}
 
 Remember to deactivate unused endpoints to avoid unnecessary charges!
 
@@ -180,7 +193,6 @@
 
 You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).
-
 If anything doesn't make sense or you have suggestions, do point them out at [magemaker.featurebase.app](https://magemaker.featurebase.app/). We'd love to hear from you!
 
 We're excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.
diff --git a/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
new file mode 100644
index 0000000..9f500c1
--- /dev/null
+++ b/tutorials/deploying-llama-3-to-aws-using-query-flag.mdx
@@ -0,0 +1,89 @@
+---
+title: Deploying Llama 3 to SageMaker using the Query Flag
+---
+
+## Introduction
+This tutorial guides you through deploying Llama 3 to AWS SageMaker using Magemaker **and** shows how to query it using the new `--query` flag. Ensure you have followed the [installation](installation) steps before proceeding.
+
+## Step 1: Setting Up Magemaker for AWS
+Run the following command to configure Magemaker for AWS SageMaker deployment:
+```sh
+magemaker --cloud aws
+```
+This initializes Magemaker with the necessary configurations for deploying models to SageMaker.
+
+## Step 2: YAML-based Deployment
+For reproducible deployments, use YAML configuration:
+```sh
+magemaker --deploy .magemaker_config/llama3-deploy.yaml
+```
+
+Example deployment YAML:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+  instance_count: 1
+  instance_type: ml.g5.2xlarge
+  num_gpus: 1
+  quantization: null
+models:
+- !Model
+  id: meta-llama/Meta-Llama-3-8B-Instruct
+  location: null
+  predict: null
+  source: huggingface
+  task: text-generation
+  version: null
+```
+
+For gated models like Llama from Meta, you must (1) accept the model license on Hugging Face **and** (2) provide a valid `HUGGING_FACE_HUB_KEY` in your environment for the deployment to succeed.
+
+You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your AWS quotas before proceeding.
+
+## Step 3: Querying with the `--query` Flag
+After the deployment finishes, you can issue requests directly from the CLI without the interactive dropdown.
+
+### 3.1 Create a Query YAML
+Create `.magemaker_config/llama3-query.yaml`:
+```yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: llama3-endpoint
+
+query: !Query
+  input: "Explain the concept of quantum entanglement in simple terms."
+```
+
+### 3.2 Execute the Query
+```sh
+magemaker --query .magemaker_config/llama3-query.yaml
+```
+
+Sample response:
+```json
+{
+  "generated_text": "Quantum entanglement is like having two magical coins…",
+  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
+  "total_tokens": 95,
+  "generation_time": 1.3
+}
+```
+
+## Step 4: Programmatic Query (Python)
+You can also call the endpoint via the SageMaker SDK:
+```python
+import sagemaker
+from sagemaker.huggingface.model import HuggingFacePredictor
+
+predictor = HuggingFacePredictor(
+    endpoint_name="llama3-endpoint",
+    sagemaker_session=sagemaker.Session(),
+)
+print(predictor.predict({"inputs": "What are you?"}))
+```
+
+## Conclusion
+You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker’s new `--query` workflow. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).
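For readers who want to see what the `--query` flag and Step 4 snippet are mirroring, here is a minimal sketch of the same request made through the low-level SageMaker runtime client instead of `HuggingFacePredictor`. It assumes the `llama3-endpoint` name from the tutorial, AWS credentials and a default region already configured in the environment, and that the endpoint accepts the standard Hugging Face `inputs`/`parameters` JSON payload; adjust these to your own deployment.

```python
# Sketch: query the tutorial's endpoint with the low-level SageMaker runtime
# client. Assumes the "llama3-endpoint" name from the tutorial and AWS
# credentials/region already configured in the environment.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Standard Hugging Face text-generation payload; the "parameters" values are
# illustrative and assume the container honors them.
payload = {
    "inputs": "What are the key differences between Llama 2 and Llama 3?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
}

response = runtime.invoke_endpoint(
    EndpointName="llama3-endpoint",   # endpoint created in Step 2
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read()))
```

Under the hood, `HuggingFacePredictor.predict` issues the same `invoke_endpoint` call, so this is interchangeable with the Step 4 snippet when you prefer to avoid the full SageMaker SDK dependency.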