From 7fe74f029b6c9a7d775a6416b1dc03773637a3dc Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:18 +0000 Subject: [PATCH 01/18] docs: sync CONTRIBUTING.md with latest code --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a70cce7..5a70e59 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -65,4 +65,4 @@ By contributing, you agree that your contributions will be licensed under the Ap ## Questions? -Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! \ No newline at end of file +Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! From 7af6dc922f8eb4ebbfdcd6f216980d9235dcafb4 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:20 +0000 Subject: [PATCH 02/18] docs: sync about.mdx with latest code --- about.mdx | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/about.mdx b/about.mdx index d9c04a4..28a8109 100644 --- a/about.mdx +++ b/about.mdx @@ -6,7 +6,11 @@ description: Deploy open source AI models to AWS, GCP, and Azure in minutes ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying open source AI models to your preferred cloud provider. Instead of spending hours digging through documentation, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning. +Magemaker is a Python tool that simplifies the process of deploying open-source AI models to your preferred cloud provider. Instead of spending hours digging through documentation, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning. + + + New in the latest release: Magemaker now ships with an optional FastAPI **OpenAI-compatible proxy server** (`server.py`). You can spin up the proxy to expose any deployed endpoint behind the familiar `/v1/chat/completions` interface—perfect for drop-in replacement of OpenAI keys in existing applications. See the dedicated tutorial for details. + ## What we're working on next @@ -22,10 +26,9 @@ Do submit your feature requests at https://magemaker.featurebase.app/ - Querying within Magemaker currently only works with text-based models - Deleting a model is not instant, it may show up briefly after deletion - Deploying the same model within the same minute will break -- Hugging-face models on Azure have different Ids than their Hugging-face counterparts. Follow the steps specified in the quick-start guide to find the relevant models -- For Azure deploying models other than Hugging-face is not supported yet. -- Python3.13 is not supported because of an open-issue by Azure. https://github.com/Azure/azure-sdk-for-python/issues/37600 - +- Hugging Face models on Azure have different IDs than their Hugging Face counterparts. Follow the steps specified in the quick-start guide to find the relevant models. +- For Azure, deploying models other than Hugging Face is not supported yet. +- Python 3.13 is not supported because of an open issue by Azure. 
https://github.com/Azure/azure-sdk-for-python/issues/37600 If there is anything we missed, do point them out at https://magemaker.featurebase.app/ From 83992060209a58875199ffe1a52e26464ee60144 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:22 +0000 Subject: [PATCH 03/18] docs: sync concepts/contributing.mdx with latest code --- concepts/contributing.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/concepts/contributing.mdx b/concepts/contributing.mdx index 8c61908..6779ff4 100644 --- a/concepts/contributing.mdx +++ b/concepts/contributing.mdx @@ -165,4 +165,4 @@ We are committed to providing a welcoming and inclusive experience for everyone. ## License -By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. \ No newline at end of file +By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. From 694126a602a92731f7f19ce7ecb6424e2a96f073 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:24 +0000 Subject: [PATCH 04/18] docs: sync concepts/deployment.mdx with latest code --- concepts/deployment.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/concepts/deployment.mdx b/concepts/deployment.mdx index 66ca7a9..be81156 100644 --- a/concepts/deployment.mdx +++ b/concepts/deployment.mdx @@ -62,7 +62,7 @@ deployment: !Deployment destination: gcp endpoint_name: opt-125m-gcp instance_count: 1 - machine_type: n1-standard-4 + instance_type: n1-standard-4 # machine type accelerator_type: NVIDIA_TESLA_T4 accelerator_count: 1 @@ -83,7 +83,7 @@ deployment: !Deployment models: - !Model - id: facebook-opt-125m + id: facebook-opt-125m # Azure uses different model IDs source: huggingface ``` @@ -112,7 +112,8 @@ deployment: !Deployment endpoint_name: test-llama3-8b instance_count: 1 instance_type: ml.g5.12xlarge - num_gpus: 4 + num_gpus: 4 # Optional – override default GPU count + quantization: bitsandbytes # Optional – 4/8-bit quantisation models: - !Model @@ -202,10 +203,9 @@ Choose your instance type based on your model's requirements: 4. Set up monitoring and alerting for your endpoints -Make sure you setup budget monitory and alerts to avoid unexpected charges. +Make sure you set up budget monitoring and alerts to avoid unexpected charges. - ## Troubleshooting Deployments Common issues and their solutions: @@ -225,4 +225,4 @@ Common issues and their solutions: - Verify model ID and version - Check instance memory requirements - Validate Hugging Face token if required - - Endpoing deployed but deployment failed. Check the logs, and do report this to us if you see this issue. + - Endpoint deployed but deployment failed. Check the logs and report the issue if it persists. 
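For AWS endpoints, a quick way to surface the status and failure reason mentioned in the list above without opening the console is to ask SageMaker directly. A minimal sketch using boto3 (boto3 is not shipped by these docs, so this assumes it is installed; the endpoint name is whatever you set in your deployment YAML):

```python
import boto3

# Uses the same AWS credentials Magemaker stores in your .env / CLI profile.
sagemaker = boto3.client("sagemaker")

desc = sagemaker.describe_endpoint(EndpointName="your-endpoint-name")
print(desc["EndpointStatus"])         # e.g. "Creating", "InService", "Failed"
print(desc.get("FailureReason", ""))  # only present when the deployment failed
```

When the status is `Failed`, `FailureReason` usually points at the quota, memory, or model-ID problem listed above.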
From f0ad5414aff519e48c25aa441e605329c4ff6692 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:25 +0000 Subject: [PATCH 05/18] docs: sync concepts/fine-tuning.mdx with latest code --- concepts/fine-tuning.mdx | 107 +++++++++++++++++++++------------------ 1 file changed, 59 insertions(+), 48 deletions(-) diff --git a/concepts/fine-tuning.mdx b/concepts/fine-tuning.mdx index 88835aa..17278eb 100644 --- a/concepts/fine-tuning.mdx +++ b/concepts/fine-tuning.mdx @@ -5,13 +5,16 @@ description: Guide to fine-tuning models with Magemaker ## Fine-tuning Overview -Fine-tuning allows you to adapt pre-trained models to your specific use case. Magemaker simplifies this process through YAML configuration. +Fine-tuning allows you to adapt pre-trained models to your specific use-case. +Magemaker currently supports **AWS SageMaker** fine-tuning (support for GCP & Azure is on the roadmap). +The process is fully driven by a YAML configuration file and a single CLI command. -### Basic Command +> **Basic Command** +> ```sh +> magemaker --train .magemaker_config/train-config.yaml +> ``` -```sh -magemaker --train .magemaker_config/train-config.yaml -``` +--- ## Configuration @@ -19,17 +22,24 @@ magemaker --train .magemaker_config/train-config.yaml ```yaml training: !Training - destination: aws - instance_type: ml.p3.2xlarge + destination: aws # Only “aws” is supported today + instance_type: ml.p3.2xlarge # GPU instance for training instance_count: 1 training_input_path: s3://your-bucket/training-data.csv models: - !Model - id: your-model-id + id: your-model-id # e.g. google-bert/bert-base-uncased source: huggingface ``` +• **destination** – cloud provider (must be `aws` for now) +• **training_input_path** – S3 URI pointing to your training dataset +• **instance_type / count** – SageMaker training cluster specs + +### Automatic Hyper-parameters (zero-config) +If **`hyperparameters`** are omitted, Magemaker will generate sensible defaults based on the model **task** (e.g. text-classification, text-generation) using the logic in `magemaker/sagemaker/fine_tune_model.py`. This is helpful for quick experiments. + ### Advanced Configuration ```yaml @@ -49,70 +59,65 @@ training: !Training save_steps: 1000 ``` +You can supply any Hugging Face training argument accepted by the Transformers library. Values can be: + +• **Scalars** (as above) +• **Ranges / Lists** for SageMaker Hyperparameter Tuning Jobs *(coming soon)* + +--- + ## Data Preparation ### Supported Formats - - - Simple tabular data - - Easy to prepare - - Good for classification tasks + + - Column-based datasets
+ - Good for classic NLP tasks
- - - Flexible data format - - Good for complex inputs - - Supports nested structures + - One JSON object per line
+ - Flexible structure for complex inputs
-### Data Upload +### Uploading Data - Format your data according to model requirements + Clean & format according to the model task (e.g. columns `text,label` for classification). - Use AWS CLI or console to upload data + ```bash + aws s3 cp local_file.csv s3://your-bucket/training-data.csv + ``` - - Specify S3 path in training configuration + + Point `training_input_path` to the S3 URI you just uploaded. -## Instance Selection +--- -### Training Instance Types +## Instance Selection -Choose based on: -- Dataset size -- Model size -- Training time requirements -- Cost constraints +Training can be expensive—choose wisely based on dataset & model size: -Popular choices: -- ml.p3.2xlarge (1 GPU) -- ml.p3.8xlarge (4 GPUs) -- ml.p3.16xlarge (8 GPUs) +• **ml.p3.2xlarge** – 1× V100 GPU (entry level) +• **ml.p3.8xlarge** – 4× V100 GPUs +• **ml.p3.16xlarge** – 8× V100 GPUs -## Hyperparameter Tuning +If you need A100 GPUs, use the p4 family (ensure you have service-quota approval). -### Basic Parameters +--- -```yaml -hyperparameters: !Hyperparameters - epochs: 3 - learning_rate: 2e-5 - batch_size: 32 -``` +## Hyperparameter Tuning (coming soon) -### Advanced Tuning +Magemaker will expose SageMaker HPO jobs. YAML will accept parameter ranges: ```yaml hyperparameters: !Hyperparameters - epochs: 3 - learning_rate: + learning_rate: min: 1e-5 max: 1e-4 scaling: log @@ -120,11 +125,17 @@ hyperparameters: !Hyperparameters values: [16, 32, 64] ``` +Stay tuned for this feature in an upcoming release. + +--- + ## Monitoring Training -### CloudWatch Metrics +Once the job starts, you can: + +1. Open the SageMaker console – Training Jobs –> **Logs** +2. View real-time metrics in **CloudWatch**: loss, lr, GPU utilisation -Available metrics: -- Loss -- Learning rate -- GPU utilization \ No newline at end of file + + Training jobs run until completion even if the terminal session closes. Make sure to stop any jobs you no longer need to avoid unnecessary charges. + From 4ed382347b8a24060c2b22bb992b0ef5f6b0f7f6 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:27 +0000 Subject: [PATCH 06/18] docs: sync concepts/models.mdx with latest code --- concepts/models.mdx | 73 ++++++++++++++++++++++++++++----------------- 1 file changed, 46 insertions(+), 27 deletions(-) diff --git a/concepts/models.mdx b/concepts/models.mdx index 0161380..86cf586 100644 --- a/concepts/models.mdx +++ b/concepts/models.mdx @@ -6,7 +6,12 @@ description: Guide to supported models and their requirements ## Supported Models -Currently, Magemaker supports deployment of Hugging Face models only. Support for cloud provider marketplace models is coming soon! +Magemaker currently supports two model sources: + +1. Hugging Face models (all clouds) +2. Amazon SageMaker JumpStart models (AWS only) + +Support for cloud-provider marketplace models on GCP and Azure is coming soon! ### Hugging Face Models @@ -26,15 +31,30 @@ Currently, Magemaker supports deployment of Hugging Face models only. Support fo -### Future Support - -We plan to add support for the following model sources: +### Amazon SageMaker JumpStart Models (AWS) - - Models from AWS Marketplace and SageMaker built-in algorithms + + - meta-textgeneration-llama-3-8b-instruct + - flan-t5-xxl + - gpt-neo-1_3b + + - bert-base-uncased + - distilbert-base-uncased-finetuned-sst-2 + + + + +When deploying a JumpStart model, set `source: sagemaker` and use the exact JumpStart model ID (e.g. 
`meta-textgeneration-llama-3-8b-instruct`). + + +### Future Support + +We plan to add support for the following additional model sources: + + Models from Vertex AI Model Garden and Foundation Models @@ -43,6 +63,7 @@ We plan to add support for the following model sources: Models from Azure ML Model Catalog and Azure OpenAI + ## Model Requirements ### Instance Type Recommendations by Cloud Provider @@ -94,9 +115,7 @@ We plan to add support for the following model sources: ## Example Deployments -### Example Hugging Face Model Deployment - -Deploy the same Hugging Face model to different cloud providers: +### Example Hugging Face Model Deployment (All Clouds) AWS SageMaker: ```yaml @@ -129,23 +148,23 @@ deployment: !Deployment ``` - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. - - To find the relevnt model id, follow the following steps - - - Find the workpsace in the Azure portal and click on the studio url provided. Click on the `Model Catalog` on the left side bar - ![Azure ML Creation](../Images/workspace-studio.png) - + The model IDs for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. See the [Quick Start](quick-start) for steps to locate the correct ID. + - - Select Hugging-Face from the collections list. The id of the model card is the id you need to use in the yaml file - ![Azure ML Creation](../Images/hugging-face.png) - +### Example SageMaker JumpStart Deployment (AWS) - - +```yaml +models: +- !Model + id: meta-textgeneration-llama-3-8b-instruct # JumpStart model ID + source: sagemaker +deployment: !Deployment + destination: aws + endpoint_name: llama3-jumpstart + instance_type: ml.g5.12xlarge + num_gpus: 4 +``` ## Model Configuration @@ -155,8 +174,8 @@ deployment: !Deployment models: - !Model id: your-model-id - source: huggingface|sagemaker # we don't support vertex and azure specific models yet - revision: latest # Optional: specify model version + source: huggingface | sagemaker # choose the appropriate source + revision: latest # Optional: specify model version ``` ### Advanced Parameters @@ -181,9 +200,9 @@ models: - Consider data residency requirements - Test latency from different regions -3. **Cost Management** +2. **Cost Management** - Compare instance pricing - - Make sure you set up the relevant alerting + - Set up cost alerts and budgets ## Troubleshooting From 99c78e26535d34d6c00f020482d919be53ee1cdd Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:29 +0000 Subject: [PATCH 07/18] docs: sync configuration/AWS.mdx with latest code --- configuration/AWS.mdx | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/configuration/AWS.mdx b/configuration/AWS.mdx index cdc4b9f..7a0f8ef 100644 --- a/configuration/AWS.mdx +++ b/configuration/AWS.mdx @@ -4,13 +4,12 @@ title: AWS ### AWS CLI -To install Azure SDK on MacOS, you need to have the latest OS and you need to use Rosetta terminal. Also, make sure you have the latest version of Xcode tools installed. +For Apple Silicon (M1/M2) Macs the official AWS CLI v2 universal installer usually works out-of-the-box. If you run into issues, try installing the CLI through Homebrew or launch your terminal with Rosetta. 
-Follow this guide to install the latest AWS CLI +Follow this guide to install the latest AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html - Once you have the CLI installed and working, follow these steps @@ -47,11 +46,12 @@ You should see the following screen after clicking IAM. ![Enter image alt description](../Images/E7x_Image_5.png) 2. Select "Attach policies directly". Under permission policies, search for and tick the boxes for: - - `AmazonSagemakerFullAccess` + - `AmazonSageMakerFullAccess` + - `AmazonS3FullAccess` - `IAMFullAccess` - `ServiceQuotasFullAccess` -Then click Next. +Then click **Next**. ![Enter image alt description](../Images/01X_Image_6.png) @@ -59,19 +59,19 @@ The final list should look like the following: ![Enter image alt description](../Images/Dfp_Image_7.png) -Click "Create user" on the following screen. +Click **Create user** on the following screen. 1. Click the name of the user you've just created (or one that already exists) -2. Go to "Security Credentials" tab -3. Scroll down to "Access Keys" section -4. Click "Create access key" -5. Select Command Line Interface then click next +2. Go to **Security Credentials** tab +3. Scroll down to **Access Keys** section +4. Click **Create access key** +5. Select **Command Line Interface** then click **Next** ![Enter image alt description](../Images/BPP_Image_8.png) -Enter a description (this is optional, can leave blank). Then click next. +Enter a description (optional) and click **Next**. ![Enter image alt description](../Images/gMD_Image_9.png) From 55aa23543d121cefc52cf53142d35478e1d58fff Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:31 +0000 Subject: [PATCH 08/18] docs: sync configuration/Azure.mdx with latest code --- configuration/Azure.mdx | 140 +++++++++++++++++++++++++--------------- 1 file changed, 87 insertions(+), 53 deletions(-) diff --git a/configuration/Azure.mdx b/configuration/Azure.mdx index 3c1104a..2437c81 100644 --- a/configuration/Azure.mdx +++ b/configuration/Azure.mdx @@ -1,30 +1,37 @@ --- title: Azure -description: Configure Magemaker for your cloud providers +description: Configure Magemaker for Azure Machine Learning --- -### Azure CLI + + • Python 3.13 is NOT supported by the Azure SDK at the moment (see + open issue).
• Please use Python 3.11 with Magemaker on Azure. +
-To install Azure SDK on MacOS, you need to have the latest OS and you need to use Rosetta terminal. Also, make sure you have the latest version of Xcode tools installed. +## Azure CLI + + Apple-Silicon (M-series) Macs must run the terminal under Rosetta when + interacting with the Azure CLI due to native dependency issues. Confirm the + architecture with `arch`; the output should be i386 in a Rosetta + shell. + -To install the latest Azure CLI, run: - -```bash -brew update && brew install azure-cli -``` - -Alternatively, follow this official guide from Azure -- [https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos) - -Once you have installed azure CLI, follow these steps - +1. Install (or upgrade) the Azure CLI: + ```bash + brew update && brew install azure-cli + ``` + Alternatively, follow the official guide: -### Azure Account -Step 1: Create azure cloud account +2. Install (or upgrade) the Azure ML CLI extension – **required** for Magemaker: + ```bash + az extension add --name ml -y # or: az extension update --name ml + ``` -- [https://azure.microsoft.com/en-ca](null) +--- +## Azure Account & Workspace Setup @@ -32,53 +39,80 @@ Step 1: Create azure cloud account az login ``` + ```bash az account set --subscription ``` - - From the terminal - ```bash - az group create --name --location - ``` + + From the terminal + ```bash + az group create --name --location + ``` + From the Azure Portal + ![Create RG](../Images/XzN_Image_12.png) + - From the Azure Portal - ![Enter image alt description](../Images/XzN_Image_12.png) + + From the terminal + ```bash + az ml workspace create -n -g + ``` + From the Azure Portal + 1. Search for Azure Machine Learning in the search bar. + 2. Click Create → New workspace. + ![Workspace](../Images/workspace_creation.png) + + + ```bash + az provider register --namespace Microsoft.MachineLearningServices + az provider register --namespace Microsoft.ContainerRegistry + az provider register --namespace Microsoft.KeyVault + az provider register --namespace Microsoft.Storage + az provider register --namespace Microsoft.Insights + az provider register --namespace Microsoft.ContainerService + az provider register --namespace Microsoft.PolicyInsights + az provider register --namespace Microsoft.Cdn + ``` + Registration can take up to 10 minutes. Verify with: + az provider show -n Microsoft.MachineLearningServices - - From the terminal - ```bash - az ml workspace create -n -g - ``` + -From the Azure portal -1. Search for `Azure Machine Learning` in the search bar. - ![Azure ML Creation](../Images/AzureML.png) +--- -2. Inside the `Azure Machine Learning` portal. Click on Create, and select `New Workspce` from the drop down - ![workspace creation](../Images/workspace_creation.png) +## Configure Magemaker -
- - ```bash - # Register all required providers: THIS STEP IS IMPORTANT - az provider register --namespace Microsoft.MachineLearningServices - az provider register --namespace Microsoft.ContainerRegistry - az provider register --namespace Microsoft.KeyVault - az provider register --namespace Microsoft.Storage - az provider register --namespace Microsoft.Insights - az provider register --namespace Microsoft.ContainerService - az provider register --namespace Microsoft.PolicyInsights - az provider register --namespace Microsoft.Cdn - ``` +Run the following command once the Azure CLI is configured: +```bash +magemaker --cloud azure +``` +The command will: +1. Validate your Azure credentials. +2. Prompt for any missing values. +3. Generate a .env file with the required environment variables. - - Registration can take up to 10 minutes. Check status with: ```bash az - provider show -n Microsoft.MachineLearningServices ``` - +### Required Environment Variables +The .env file must contain the following keys (generated automatically, but you can edit them later): +```bash +AZURE_SUBSCRIPTION_ID="" +AZURE_RESOURCE_GROUP="" +AZURE_WORKSPACE_NAME="" +AZURE_REGION="" # e.g. eastus, westeurope - - +# Optional – required only for gated Hugging Face models like Llama 3 +HUGGING_FACE_HUB_KEY="" +``` +Never commit your .env file to version control! + +--- + +## Next Steps + +• Deploy a model: see the [Quick Start](/quick-start) or the + [Llama 3 on Azure tutorial](/tutorials/deploying-llama-3-to-azure).
+• Need to expose the endpoint through an OpenAI-compatible API? Check out the + upcoming OpenAI Proxy tutorial. From a50eff080678ad90a9b4879b08878d6d2df1ee65 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:32 +0000 Subject: [PATCH 09/18] docs: sync configuration/Environment.mdx with latest code --- configuration/Environment.mdx | 72 +++++++++++++++++++++++++++-------- 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/configuration/Environment.mdx b/configuration/Environment.mdx index 0781ec3..9ae9eda 100644 --- a/configuration/Environment.mdx +++ b/configuration/Environment.mdx @@ -2,29 +2,71 @@ title: Environment Variables --- -### Required Config File -A `.env` file is automatically created when you run `magemaker --cloud `. This file contains the necessary environment variables for your cloud provider(s). +### Required `.env` File +A `.env` file is automatically created the first time you run -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: +```bash +magemaker --cloud +``` + +This file stores the credentials and configuration Magemaker needs to talk to +AWS, GCP, and/or Azure on your behalf. + +### Minimum Variables by Cloud Provider ```bash +# ----------------------- # AWS Configuration -AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS -SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS +# ----------------------- +AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS +AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS +SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS +AWS_REGION_NAME="us-east-1" # Optional – override default region +# ----------------------- # GCP Configuration -PROJECT_ID="your-project-id" # Required for GCP -GCLOUD_REGION="us-central1" # Required for GCP +# ----------------------- +PROJECT_ID="your-project-id" # Required for GCP +GCLOUD_REGION="us-central1" # Required for GCP +# ----------------------- # Azure Configuration -AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure -AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure -AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure -AZURE_REGION="eastus" # Required for Azure +# ----------------------- +AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure +AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure +AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure +AZURE_REGION="eastus" # Required for Azure + +# ----------------------- +# Optional / Advanced Settings +# ----------------------- +HUGGING_FACE_HUB_KEY="your-hf-token" # Needed for gated HF models (e.g. Llama-3) +CONFIG_DIR=".magemaker_config" # Where Magemaker stores YAML configs +``` + + +Never commit your `.env` file to version control! Use Git-ignore or a secrets +manager. + -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +### When Do I Need `AWS_REGION_NAME`? +The new FastAPI proxy server (`server.py`) sets `AWS_REGION_NAME` automatically +based on your current session. 
If you run the proxy on a separate machine or +inside a container where the automatic lookup fails, define the region +explicitly: + +```bash +export AWS_REGION_NAME="us-west-2" +``` + +### Custom Config Directory (`CONFIG_DIR`) +By default Magemaker reads and writes YAML deployment files to +`.magemaker_config/` in your project root. Override this location if you want +to keep configs elsewhere (e.g. inside `infra/` for IaC): + +```bash +export CONFIG_DIR="infra/magemaker-config" ``` -Never commit your .env file to version control! +All CLI commands, the FastAPI proxy, and helper scripts will automatically pick +up the new path. From d8c15bae6a2c70cbf61349d25c71b5b42d09b14a Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:34 +0000 Subject: [PATCH 10/18] docs: sync configuration/GCP.mdx with latest code --- configuration/GCP.mdx | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/configuration/GCP.mdx b/configuration/GCP.mdx index c9cd369..dbb06e6 100644 --- a/configuration/GCP.mdx +++ b/configuration/GCP.mdx @@ -1,6 +1,7 @@ --- title: GCP --- + Visit [Google Cloud Console](https://cloud.google.com/?hl=en) to create your account. @@ -35,4 +36,27 @@ Navigate to the APIs & Services on the dashboard and enable the Vertex AI API fo ![Enter image alt description](../Images/QrB_Image_11.png) - \ No newline at end of file + +Run the following command to let Magemaker set up the required environment variables and create a `.env` file for you: + +```bash +magemaker --cloud gcp +``` + +This step will ask for: +- `PROJECT_ID` – your GCP project ID +- `GCLOUD_REGION` – region where you plan to deploy Vertex AI models (e.g. `us-central1`) + +These values will be written to `.env` so subsequent CLI commands and the Python SDK can locate them automatically. + + + +If you prefer to use a service-account key instead of `gcloud auth application-default login`, create a JSON key for your service account and export: + +```bash +export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json +``` + +Magemaker will pick up the credentials automatically when deploying or querying endpoints. + + From c882c5e2f933f0596167987940c5edb8ef3a3016 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:36 +0000 Subject: [PATCH 11/18] docs: sync getting_started.md with latest code --- getting_started.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/getting_started.md b/getting_started.md index 0bc86fa..6c18f69 100644 --- a/getting_started.md +++ b/getting_started.md @@ -16,19 +16,19 @@ To get a local copy up and running follow these simple steps. ### Prerequisites -* Python 3.11 (3.13 is not supported because of azure) +* Python 3.11+ (Python 3.12 is currently not supported; Python 3.13 is not supported by Azure SDK) * Cloud Configuration * An account to your preferred cloud provider, AWS, GCP and Azure. 
* Each cloud requires slightly different accesses, Magemaker will guide you through getting the necessary credentials to the selected cloud provider * Here's a guide on how to configure AWS and get the credentials [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) * Quota approval for instances you require for the AI model - * By default, you get some free instances, example with AWS you are pre-approved for 2 ml.m5.xlarge instances with 16gb of RAM each + * By default, you get some free instances, example with AWS you are pre-approved for 2 `ml.m5.xlarge` instances with 16 GB of RAM each * An installation and configuration of your selected cloud CLI tool(s) * Magemaker will prompt you to install the CLI of the selected cloud provider, if not installed already. - * Magemaker will prompt you to add the necesssary credentials. + * Magemaker will prompt you to add the necessary credentials. -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) +* Certain Hugging Face models (e.g. Llama 2/3) require an access token ([HF docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) ## Installation @@ -45,9 +45,9 @@ To get a local copy up and running follow these simple steps. magemaker --cloud [aws|gcp|azure|all] ``` - If this is your first time running this command, It will configure the selected cloud so you’re ready to start deploying models. + If this is your first time running this command, it will configure the selected cloud so you’re ready to start deploying models. - In the case of AWS, it’ll prompt you to enter your Access Key and Secret. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. + In the case of AWS, it’ll prompt you to enter your Access Key and Secret. You can also specify your AWS region. The default is `us-east-1`. You only need to change this if your SageMaker instance quota is in a different region. Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. @@ -178,7 +178,6 @@ If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging
- ## Deactivating Models Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. From 4121be7121b013e58c5652cc2a3da0f04883810d Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:38 +0000 Subject: [PATCH 12/18] docs: sync installation.mdx with latest code --- installation.mdx | 51 ++++++++++++++++++++++++++---------------------- 1 file changed, 28 insertions(+), 23 deletions(-) diff --git a/installation.mdx b/installation.mdx index 1d843eb..08dc49b 100644 --- a/installation.mdx +++ b/installation.mdx @@ -3,24 +3,30 @@ title: Installation description: Configure Magemaker for your cloud provider --- - - For Macs, maxOS >= 13.6.6 is required. Apply Silicon devices (M1) must use Rosetta terminal. You can verify, your terminals architecture by running `arch`. It should print `i386` for Rosetta terminal. + • Python 3.11 is required (3.12 is currently not supported and 3.13 is blocked by an open Azure SDK issue). + • For Macs, macOS ≥ 13.6.6 is required. Apple-Silicon devices (M-series) must use a Rosetta terminal. Verify your terminal architecture by running `arch` — it should print `i386` when running under Rosetta. - Install via pip: ```sh pip install magemaker ``` +If you plan to run the optional OpenAI-compatible proxy server (`server.py`), make sure the extra dependencies are installed as well: + +```sh +pip install magemaker[proxy] +``` + +(See the proxy tutorial for details.) ## Cloud Account Setup ### AWS Configuration -- Follow this detailed guide for setting up AWS credentials: +- Follow this detailed guide for setting up AWS credentials: [AWS Setup Guide](/configuration/AWS) Once you have your AWS credentials, you can configure Magemaker by running: @@ -29,16 +35,14 @@ Once you have your AWS credentials, you can configure Magemaker by running: magemaker --cloud aws ``` -It will prompt you for aws credentials and set up the necessary configurations. - +It will prompt you for AWS credentials and set up the necessary configurations. 
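If you want to sanity-check the stored credentials before deploying anything, a minimal sketch with boto3 (assumed to be installed in the same environment; it reads the same keys the command above just configured):

```python
import boto3

# Prints the AWS account and IAM identity behind the configured access keys.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```

If this raises an authentication error, re-run `magemaker --cloud aws` and re-enter your keys.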
### GCP (Vertex AI) Configuration -- Follow this detailed guide for setting up GCP credentials: +- Follow this detailed guide for setting up GCP credentials: [GCP Setup Guide](/configuration/GCP) - -once you have your GCP credentials, you can configure Magemaker by running: +Once you have your GCP credentials, you can configure Magemaker by running: ```bash magemaker --cloud gcp @@ -46,9 +50,8 @@ magemaker --cloud gcp ### Azure Configuration -- Follow this detailed guide for setting up Azure credentials: - [GCP Setup Guide](/configuration/Azure) - +- Follow this detailed guide for setting up Azure credentials: + [Azure Setup Guide](/configuration/Azure) Once you have your Azure credentials, you can configure Magemaker by running: @@ -56,7 +59,6 @@ Once you have your Azure credentials, you can configure Magemaker by running: magemaker --cloud azure ``` - ### All three cloud providers If you have configured all three cloud providers, you can verify your configuration by running: @@ -65,15 +67,15 @@ If you have configured all three cloud providers, you can verify your configurat magemaker --cloud all ``` +--- ### Required Config File -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: - +By default, Magemaker will look for a `.env` file in your project root with the following variables (created automatically the first time you run `magemaker --cloud `): ```bash # AWS Configuration AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS +AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS # GCP Configuration @@ -87,13 +89,18 @@ AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure AZURE_REGION="eastus" # Required for Azure # Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like Llama +AWS_REGION_NAME="us-east-1" # Override default AWS region +CONFIG_DIR=".magemaker_config" # Custom path for YAML configs generated by Magemaker ``` -Never commit your .env file to version control! +Never commit your `.env` file to version control! - For gated models like llama-3.1 from Meta, you might have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. + For gated models such as Llama 3 from Meta, you must + 1. Accept the model licence on Hugging Face, and + 2. Add your Hugging Face token (`HUGGING_FACE_HUB_KEY`) to the `.env` file. + Deployments will fail without these steps. {/* ## Verification @@ -119,12 +126,10 @@ magemaker verify 3. **Security** - - Follow principle of least privilege + - Follow the principle of least privilege - Use service accounts where possible - Enable audit logging - - ## Troubleshooting Common configuration issues: @@ -142,7 +147,7 @@ Common configuration issues: - Confirm project ID 3. 
**Azure Issues** - - Check resource provider registration status: + - Check resource-provider registration status: ```bash az provider show -n Microsoft.MachineLearningServices az provider show -n Microsoft.ContainerRegistry From 4de6bd7e082c0c0a3fb0895c0691e492fe10ece5 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:40 +0000 Subject: [PATCH 13/18] docs: sync mint.json with latest code --- mint.json | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/mint.json b/mint.json index ccb1843..234a752 100644 --- a/mint.json +++ b/mint.json @@ -38,16 +38,21 @@ "mode": "auto" }, "navigation": [ - { + { "group": "Getting Started", - "pages": ["about", "installation", "quick-start"] + "pages": [ + "about", + "installation", + "quick-start" + ] }, { "group": "Tutorials", "pages": [ "tutorials/deploying-llama-3-to-aws", "tutorials/deploying-llama-3-to-gcp", - "tutorials/deploying-llama-3-to-azure" + "tutorials/deploying-llama-3-to-azure", + "tutorials/openai-compatible-proxy" ] }, { @@ -77,17 +82,29 @@ { "title": "Documentation", "links": [ - { "label": "Getting Started", "url": "/" }, - { "label": "Contributing", "url": "/contributing" } + { + "label": "Getting Started", + "url": "/" + }, + { + "label": "Contributing", + "url": "/contributing" + } ] }, { "title": "Resources", "links": [ - { "label": "GitHub", "url": "https://github.com/slashml/magemaker" }, - { "label": "Support", "url": "mailto:support@slashml.com" } + { + "label": "GitHub", + "url": "https://github.com/slashml/magemaker" + }, + { + "label": "Support", + "url": "mailto:support@slashml.com" + } ] } ] } -} \ No newline at end of file +} From c7423112de78fb6c630fc595345410c857b3b00d Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:42 +0000 Subject: [PATCH 14/18] docs: sync tutorials/deploying-llama-3-to-aws.mdx with latest code --- tutorials/deploying-llama-3-to-aws.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/tutorials/deploying-llama-3-to-aws.mdx b/tutorials/deploying-llama-3-to-aws.mdx index 46f0659..ac4d865 100644 --- a/tutorials/deploying-llama-3-to-aws.mdx +++ b/tutorials/deploying-llama-3-to-aws.mdx @@ -107,4 +107,3 @@ if __name__ == "__main__": ``` ## Conclusion You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). - From d90a7bf17afdcabe50dd5c0c1bc855d097a0c5b3 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:43 +0000 Subject: [PATCH 15/18] docs: sync tutorials/deploying-llama-3-to-azure.mdx with latest code --- tutorials/deploying-llama-3-to-azure.mdx | 177 ++++++++++++----------- 1 file changed, 91 insertions(+), 86 deletions(-) diff --git a/tutorials/deploying-llama-3-to-azure.mdx b/tutorials/deploying-llama-3-to-azure.mdx index 679ba23..e5cf30f 100644 --- a/tutorials/deploying-llama-3-to-azure.mdx +++ b/tutorials/deploying-llama-3-to-azure.mdx @@ -3,141 +3,146 @@ title: Deploying Llama 3 to Azure --- ## Introduction -This tutorial guides you through deploying Llama 3 to Azure ML platform using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. 
+This tutorial guides you through deploying Llama 3 to Azure Machine Learning (Azure ML) using Magemaker and querying it through the interactive dropdown menu. Make sure you have already completed the [installation](installation) steps and the Azure-specific setup in [Configuration → Azure](/configuration/Azure). - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your Azure quotas before proceeding. + + Llama 3 is a gated Hugging Face model. You must first accept the model’s terms of use on Hugging Face **and** set `HUGGING_FACE_HUB_KEY` in your `.env` file before deployment will succeed. - - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. + + Large-GPU VM sizes such as `Standard_NC24ads_A100_v4` are usually **not enabled by default**. Submit a quota-increase request before starting the deployment, otherwise Magemaker will fail with an Azure quota error. + - To find the relevnt model id, follow the steps in the [quick start](For Azure ML) + + Model IDs on Azure **differ** from their Hugging Face IDs. You can copy the exact ID from the Azure ML *Model Catalog*. Follow the steps in the [Quick Start → Azure ML](quick-start#for-azure-ml) section to locate the correct ID. -## Step 1: Setting Up Magemaker for Azure +--- -Run the following command to configure Magemaker for Azure deployment: +## Step 1 Configure Magemaker for Azure +Run the CLI once to generate the required `.env` file and validate your Azure credentials: -```sh +```bash magemaker --cloud azure ``` -This initializes Magemaker with the necessary configurations for deploying models to Azure ML Studio. +This command will: +1. Prompt for your **Subscription ID**, **Resource Group**, **Workspace Name**, and **Region** (if they’re not already in the environment). +2. Create or update `.env` with the variables Azure helpers in Magemaker expect: + ```bash + AZURE_SUBSCRIPTION_ID="..." + AZURE_RESOURCE_GROUP="..." + AZURE_WORKSPACE_NAME="..." + AZURE_REGION="..." + ``` +3. Verify that the Azure ML CLI extension is installed (`az extension add -n ml`) and all necessary resource providers are registered. -## Step 2: YAML-based Deployment +--- -For reproducible deployments, use YAML configuration: +## Step 2 Prepare a YAML Deployment File +For reproducible deployments we strongly recommend YAML. Place the file anywhere (e.g. `.magemaker_config/llama3-azure.yaml`) and pass its path to `--deploy`. -```sh -magemaker --deploy .magemaker_config/your-model.yaml -``` - -Example YAML for Azure deployment: +Example YAML for an 8-B Llama 3 deployment: ```yaml deployment: !Deployment destination: azure - endpoint_name: llama3-endpoint - instance_count: 1 - instance_type: Standard_NC24ads_A100_v4 + endpoint_name: llama3-endpoint # any unique name + instance_count: 1 # replicas + instance_type: Standard_NC24ads_A100_v4 # see note below models: - !Model - id: meta-llama-meta-llama-3-8b-instruct - location: null - predict: null + id: meta-textgeneration-llama-3-8b-instruct # Azure Model Catalog ID source: huggingface task: text-generation - version: null ``` - - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. 
- - -### Selecting an Appropriate Instance -For 8B parameter models, recommended instance types include: +Choosing an instance: +• `Standard_NC24ads_A100_v4` → fastest, A100 80 GB GPUs (recommended) +• `Standard_NC24s_v3` → cheaper, V100 16 GB GPUs (may OOM on long prompts) -- Standard_NC24ads_A100_v4 (optimal performance) -- Standard_NC24s_v3 (cost-effective option with V100) +If you need INT-8/Bits-and-Bytes quantization, add `quantization: bitsandbytes` under `deployment`. - -If you encounter quota issues, submit a quota increase request in the Azure console. In the search bar search for `Quotas` and select the subscription you are using. In the `provider` select `Machine Learning` and then select the relevant region for the quota increase - -![Azure Quota](../Images/quotas.png) - - -## Step 3: Querying the Deployed Model - -Once the deployment is complete, note down the endpoint id. +--- -You can use the interactive dropdown menu to quickly query the model. +## Step 3 Deploy with Magemaker -### Querying Models +```bash +magemaker --deploy .magemaker_config/llama3-azure.yaml +``` -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +Magemaker will: +1. Upload a curated inference image to your Azure Container Registry (if not already present) +2. Create an online endpoint and deploy the model +3. Stream progress in the terminal – deployment can take 10-25 minutes for large models +4. Output the **endpoint name** once the model is live -![Query Endpoints](../Images/query-1.png) +--- -Or you can use the following code +## Step 4 Query the Endpoint +### Option A Interactive Dropdown +Run: +```bash +magemaker --cloud azure +``` +Choose **Query a Model Endpoint**, select your newly created endpoint, enter a prompt, and view the response directly in the CLI. 
-```python +### Option B Python Script +If you prefer code, use the helper below (also found in `magemaker.azure.query_endpoint`): +```python +from dotenv import dotenv_values from azure.identity import DefaultAzureCredential from azure.ai.ml import MLClient -from azure.mgmt.resource import ResourceManagementClient +import json, os -from dotenv import dotenv_values -import os - - -def query_azure_endpoint(endpoint_name, query): - # Initialize the ML client - subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") - resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") - workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") +def query_azure_endpoint(endpoint_name: str, prompt: str): + cfg = dotenv_values('.env') credential = DefaultAzureCredential() ml_client = MLClient( - credential=credential, - subscription_id=subscription_id, - resource_group_name=resource_group, - workspace_name=workspace_name + credential = credential, + subscription_id = cfg["AZURE_SUBSCRIPTION_ID"], + resource_group_name = cfg["AZURE_RESOURCE_GROUP"], + workspace_name = cfg["AZURE_WORKSPACE_NAME"], ) - import json - - # Test data - test_data = { - "inputs": query - } - - # Save the test data to a temporary file - with open("test_request.json", "w") as f: - json.dump(test_data, f) + # write request body to a temp file because invoke() expects a file path + payload = {"inputs": prompt} + with open("tmp_req.json", "w") as f: + json.dump(payload, f) - # Get prediction - response = ml_client.online_endpoints.invoke( + resp = ml_client.online_endpoints.invoke( endpoint_name=endpoint_name, - request_file = 'test_request.json' + request_file="tmp_req.json", ) - print('Raw Response Content:', response) - # delete a file - os.remove("test_request.json") - return response - -endpoint_id = 'your-endpoint-id-here' - -input_text = 'What are you?' + os.remove("tmp_req.json") + print(resp) + return resp -resp = query_azure_endpoint(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +# Example usage +if __name__ == "__main__": + response = query_azure_endpoint("llama3-endpoint", "What are you?") +``` + +--- +## Cleanup +Large GPU endpoints are **expensive**. When you’re done, delete the endpoint: + +```bash +magemaker --cloud azure # choose "Delete a Model Endpoint" from the menu ``` -## Conclusion -You have successfully deployed and queried Llama 3 on Azure using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). +or programmatically with: +```bash +magemaker --delete-endpoint llama3-endpoint # shortcut flag +``` + +--- +## Conclusion +You have successfully deployed and queried Llama 3 on Azure ML using Magemaker. If you hit any issues, open an issue on GitHub or reach us at [support@slashml.com](mailto:support@slashml.com). 
From 6c88ee7a286f67b35a2a03b38f628fed845e004d Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:45 +0000 Subject: [PATCH 16/18] docs: sync tutorials/deploying-llama-3-to-gcp.mdx with latest code --- tutorials/deploying-llama-3-to-gcp.mdx | 185 +++++++++++-------------- 1 file changed, 83 insertions(+), 102 deletions(-) diff --git a/tutorials/deploying-llama-3-to-gcp.mdx b/tutorials/deploying-llama-3-to-gcp.mdx index a94d616..f5c2b25 100644 --- a/tutorials/deploying-llama-3-to-gcp.mdx +++ b/tutorials/deploying-llama-3-to-gcp.mdx @@ -3,144 +3,125 @@ title: Deploying Llama 3 to GCP --- ## Introduction -This tutorial guides you through deploying Llama 3 to Google Cloud Platform (GCP) Vertex AI using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. +This tutorial guides you through deploying **Llama-3** to Google Cloud Platform (GCP) **Vertex AI** using Magemaker, and then querying it with either the interactive dropdown menu or a short Python helper. Make sure you’ve completed the [installation](installation) steps and configured GCP with `magemaker --cloud gcp` before starting. - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding. + +You may need to request a **quota increase** for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding. -## Step 1: Setting Up Magemaker for GCP +--- -Run the following command to configure Magemaker for GCP Vertex AI deployment: +## Step 1 · Configure Magemaker for GCP +Run the following command to set up Magemaker for Vertex AI: -```sh +```bash magemaker --cloud gcp ``` -This initializes Magemaker with the necessary configurations for deploying models to Vertex AI. +This command will: +1. Verify that the **gcloud** CLI is installed and initialised. +2. Prompt you for your **Project ID** and preferred **region** (`GCLOUD_REGION`). +3. Create/append a `.env` file with the required keys: + ```bash + PROJECT_ID="your-project-id" + GCLOUD_REGION="us-central1" # or another region of your choice + ``` + +Once this step succeeds you are ready to deploy models. 
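Before moving on, you can confirm the generated values are readable the same way Magemaker reads them, via python-dotenv (the same loader the other tutorials in these docs use; a minimal sketch):

```python
from dotenv import dotenv_values

cfg = dotenv_values(".env")
# Both values must be present for Vertex AI deployments and queries to work.
print("PROJECT_ID:", cfg.get("PROJECT_ID"))
print("GCLOUD_REGION:", cfg.get("GCLOUD_REGION"))
```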
-## Step 2: YAML-based Deployment +--- -For reproducible deployments, use YAML configuration: +## Step 2 · YAML-based Deployment (recommended) +For reproducible deployments and CI/CD pipelines use a YAML configuration: -```sh -magemaker --deploy .magemaker_config/your-model.yaml +```bash +magemaker --deploy .magemaker_config/llama3-gcp.yaml ``` -Example YAML for GCP deployment: +Example YAML for deploying **Llama-3 8B-Instruct** to Vertex AI: ```yaml deployment: !Deployment destination: gcp endpoint_name: llama3-endpoint - accelerator_count: 1 - instance_type: n1-standard-8 - accelerator_type: NVIDIA_T4 - num_gpus: 1 - quantization: null + instance_type: n1-standard-8 # CPU host machine + accelerator_type: NVIDIA_T4 # GPU accelerator + accelerator_count: 1 # number of GPUs + # Optional auto-scaling settings (supported by Magemaker >= 0.3.0) + min_replica_count: 1 # default = 1 + max_replica_count: 3 # default = 1 (no scaling) models: - !Model id: meta-llama/Meta-Llama-3-8B-Instruct - location: null - predict: null source: huggingface - task: text-generation - version: null + task: text-generation # helps pre-configure the container ``` + - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. +For **gated models** (e.g. Llama-3) you must ① accept the model licence on Hugging Face and ② add your `HUGGING_FACE_HUB_KEY` to `.env` so Vertex AI can download the weights. +### Instance-Type Guidelines +For Llama 3, typical configurations are: +- **n1-standard-8 + NVIDIA_T4** → cost-effective for experimentation +- **a2-highgpu-1g** (A100) → higher throughput / lower latency -### Selecting an Appropriate Instance -For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements. +If you encounter quota errors, request additional **GPU quota** under **IAM & Admin → Quotas** in the Cloud Console. - -If you encounter quota issues, submit a quota increase request in the GCP console under "IAM & Admin > Quotas" for the specific GPU type in your deployment region. - +--- -## Step 3: Querying the Deployed Model +## Step 3 · Query the Deployed Endpoint +After deployment finishes, Vertex AI prints the **endpoint ID** (a numeric string). You can interactively query it from Magemaker’s dropdown *or* programmatically via Python. -Once the deployment is complete, note down the endpoint id. +### Option A · Interactive CLI +1. Run `magemaker --cloud gcp` again. +2. Choose **“Query a Model Endpoint”**. +3. Select the newly created `llama3-endpoint` and type your prompt. -You can use the interactive dropdown menu to quickly query the model. +![Query Endpoints](../Images/query-1.png) -### Querying Models +### Option B · Python Helper +Use the helper already shipped with Magemaker: -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +```python +from magemaker.gcp.query_endpoint import query_vertexai_endpoint_rest -![Query Endpoints](../Images/query-1.png) +endpoint_id = "YOUR_ENDPOINT_ID" # e.g. 1234567890123456789 +prompt = "What are you?" 
-Or you can use the following code: -```python -from google.cloud import aiplatform -from google.protobuf import json_format -from google.protobuf.struct_pb2 import Value -import json -from dotenv import dotenv_values - - -def query_vertexai_endpoint_rest( - endpoint_id: str, - input_text: str, - token_path: str = None -): - import google.auth - import google.auth.transport.requests - import requests - - # TODO: this will have to come from config files - project_id = dotenv_values('.env').get('PROJECT_ID') - location = dotenv_values('.env').get('GCLOUD_REGION') - - - # Get credentials - if token_path: - credentials, project = google.auth.load_credentials_from_file(token_path) - else: - credentials, project = google.auth.default() - - # Refresh token - auth_req = google.auth.transport.requests.Request() - credentials.refresh(auth_req) - - # Prepare headers and URL - headers = { - "Authorization": f"Bearer {credentials.token}", - "Content-Type": "application/json" - } - - url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}:predict" - - # Prepare payload - payload = { - "instances": [ - { - "inputs": input_text, - # TODO: this also needs to come from configs - "parameters": { - "max_new_tokens": 100, - "temperature": 0.7, - "top_p": 0.95 - } - } - ] - } - - # Make request - response = requests.post(url, headers=headers, json=payload) - print('Raw Response Content:', response.content.decode()) - - return response.json() - -endpoint_id="your-endpoint-id-here" - -input_text='What are you?"' -resp = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +response = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=prompt) +print(response) ``` -## Conclusion -You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). +Parameters: +- `endpoint_id` – numeric ID from the deployment log or Vertex AI console. +- `input_text` – your prompt. +- `token_path` *(optional)* – path to a **service-account JSON key file** if you prefer not to rely on `gcloud auth login`. + +The helper automatically reads `PROJECT_ID` and `GCLOUD_REGION` from `.env`, obtains an access token, and calls the Vertex AI REST API for you. + +--- +## Cleanup +Remember to shut down endpoints you no longer need: + +```bash +magemaker --cloud gcp # open interactive menu +# ➜ Delete a Model Endpoint +``` + +Or directly from code: + +```python +from magemaker.gcp.delete_model import delete_vertex_ai_model + +delete_vertex_ai_model(["llama3-endpoint"]) +``` + +Stopping unused endpoints avoids unexpected charges. + +--- + +## Conclusion +You have successfully deployed and queried **Llama 3** on GCP Vertex AI using Magemaker. Need help or have feedback? Reach us at [support@slashml.com](mailto:support@slashml.com) or join our Discord. Happy building! 
From f311e23c871e35a045aefcbdba99d4b63883b5e8 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:47 +0000 Subject: [PATCH 17/18] docs: sync updated_readme.md with latest code --- updated_readme.md | 229 +++++++++++++++++++--------------------------- 1 file changed, 95 insertions(+), 134 deletions(-) diff --git a/updated_readme.md b/updated_readme.md index bcfc60b..a38a33f 100644 --- a/updated_readme.md +++ b/updated_readme.md @@ -1,13 +1,12 @@ -
-

Magemaker v0.1, by SlashML

+

Magemaker by SlashML

- Deploy open source AI models to AWS in minutes. + Deploy open-source AI models to AWS, GCP, and Azure in minutes.

@@ -19,7 +18,7 @@ Table of Contents
  1. - About Magemaker + About Magemaker
  2. Getting Started @@ -28,9 +27,10 @@
  3. Installation
  4. -
  5. Using Magemaker
  6. -
  7. What we're working on next
  8. -
  9. Known issues
  10. +
  11. Using Magemaker
  12. +
  13. Advanced Features
  14. +
  15. Roadmap
  16. +
  17. Known Issues
  18. Contributing
  19. License
  20. Contact
  21. @@ -39,93 +39,80 @@ ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line. -Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes. +Magemaker is a Python CLI and SDK that lets you deploy Hugging Face (and SageMaker JumpStart) models to **AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning** with a single command or YAML file. Skip hours of provider-specific setup—Magemaker provisions infrastructure, uploads your model, and returns a ready-to-query endpoint in minutes. - -
+---

 ## Getting Started

-Magemaker works with AWS. Azure and GCP support are coming soon!
-
-To get a local copy up and running follow these simple steps.
+Magemaker supports **all three major cloud providers**. You can configure one, two, or all three at any time.

 ### Prerequisites

-* Python
-* An AWS account
-* Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free)
-* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user))
-
-### Configuration
-
-**Step 1: Set up AWS and SageMaker**
-
-To get started, you’ll need an AWS account which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker.
-
-We wrote up the steps in [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) as well.
-
-### Installing the package
+* Python 3.11 (Python 3.12 is currently unsupported; Python 3.13 is unsupported for Azure, see the open Azure SDK issue)
+* At least one cloud account:
+  * AWS account + AWS CLI (for SageMaker)
+  * GCP project + Google Cloud SDK (for Vertex AI)
+  * Azure subscription + Azure CLI (for Azure ML)
+* Sufficient quotas for the instance types/GPU types you plan to use
+* (Optional) Hugging Face token for gated models such as Llama 3

+### Installation

-**Step 1**

-```sh
+```bash
 pip install magemaker
 ```

-**Step 2: Running magemaker**
+### First-Time Configuration

-Run it by simply doing the following:
+Run the CLI with your desired provider; Magemaker will guide you through credential setup and generate a `.env` file automatically:

-```sh
-magemaker
+```bash
+# Configure one provider
+magemaker --cloud aws   # or gcp | azure
+
+# Or configure everything at once
+magemaker --cloud all
 ```

-If this is your first time running this command. It will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region.
+The generated `.env` contains keys such as `AWS_ACCESS_KEY_ID`, `PROJECT_ID`, `AZURE_SUBSCRIPTION_ID`, etc. **Never commit this file to version control.**

-Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one.
+If you have a Hugging Face token, add it here as well:

-```sh
-HUGGING_FACE_HUB_KEY="KeyValueHere"
+```dotenv
+HUGGING_FACE_HUB_KEY="hf_..."
 ```
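+
+A representative `.env` after configuring all three providers might look like the sketch below. Values are placeholders, and `AWS_SECRET_ACCESS_KEY` / `GCLOUD_REGION` are assumptions based on the credential prompts and the GCP query helper; check your generated file for the authoritative key names:
+
+```dotenv
+# AWS (SageMaker)
+AWS_ACCESS_KEY_ID="AKIA..."
+AWS_SECRET_ACCESS_KEY="..."
+
+# GCP (Vertex AI)
+PROJECT_ID="my-gcp-project"
+GCLOUD_REGION="us-central1"
+
+# Azure (Azure ML)
+AZURE_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
+
+# Optional: required for gated Hugging Face models
+HUGGING_FACE_HUB_KEY="hf_..."
+```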

    (back to top)

    - - - -
    +--- ## Using Magemaker -### Deploying models from dropdown +### Interactive Menu -When you run `magemaker` comamnd it will give you an interactive menu to deploy models. You can choose from a dropdown of models to deploy. - -#### Deploying Hugging Face models -If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face. For example, `google-bert/bert-base-uncased`. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model. +```bash +magemaker --cloud [aws|gcp|azure|all] +``` -#### Deploying Sagemaker models -If you are deploying a Sagemaker model, select a framework and search from a model. If you a deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the `CONFIG_DIR=.magemaker_config` folder. You can modify the path to this folder by setting the `CONFIG_DIR` environment variable. +The interactive menu lets you: +* Deploy new endpoints +* List active endpoints +* Query or delete endpoints -#### Deploy using a yaml file -We recommend deploying through a yaml file for reproducability and IAC. From the cli, you can deploy a model without going through all the menus. You can even integrate us with your Github Actions to deploy on PR merge. Deploy via YAML files simply by passing the `--deploy` option with local path like so: +### YAML-Driven Deployment (CI/CD Friendly) -``` -magemaker --deploy .magemaker_config/bert-base-uncased.yaml +```bash +magemaker --deploy .magemaker_config/bert-uncased.yaml ``` -Following is a sample yaml file for deploying a model the same google bert model mentioned above: +Example AWS deployment file: ```yaml deployment: !Deployment destination: aws - # Endpoint name matches model_id for querying atm. 
- endpoint_name: test-bert-uncased + endpoint_name: bert-uncased-aws instance_count: 1 instance_type: ml.m5.xlarge @@ -135,122 +122,96 @@ models: source: huggingface ``` -Following is a yaml file for deploying a llama model from HF: +GCP example: + ```yaml deployment: !Deployment - destination: aws - endpoint_name: test-llama2-7b - instance_count: 1 - instance_type: ml.g5.12xlarge - num_gpus: 4 - # quantization: bitsandbytes + destination: gcp + endpoint_name: bert-uncased-gcp + instance_type: n1-standard-8 + accelerator_type: NVIDIA_TESLA_T4 + accelerator_count: 1 models: - !Model - id: meta-llama/Meta-Llama-3-8B-Instruct + id: google-bert/bert-base-uncased source: huggingface - predict: - temperature: 0.9 - top_p: 0.9 - top_k: 20 - max_new_tokens: 250 ``` -#### Fine-tuning a model using a yaml file - -You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file - -` -magemaker --train .magemaker_config/train-bert.yaml -` - -Here is an example yaml file for fine-tuning a hugging-face model: +Azure example: ```yaml -training: !Training - destination: aws - instance_type: ml.p3.2xlarge +deployment: !Deployment + destination: azure + endpoint_name: bert-uncased-azure instance_count: 1 - training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv - hyperparameters: !Hyperparameters - epochs: 1 - per_device_train_batch_size: 32 - learning_rate: 0.01 + instance_type: Standard_DS3_v2 models: - !Model - id: meta-textgeneration-llama-3-8b-instruct + id: google-bert-bert-base-uncased # Azure IDs differ—check Azure Model Catalog source: huggingface ``` +### Fine-Tuning -
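+
+Because a deployment is just a YAML file plus one CLI call, the YAML-driven flow above drops straight into CI. A hypothetical GitHub Actions job (the workflow name, trigger, and secret names are illustrative assumptions, and it presumes Magemaker picks AWS credentials up from the environment):
+
+```yaml
+# .github/workflows/deploy.yml (sketch only)
+name: deploy-model
+on:
+  push:
+    branches: [main]
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - run: pip install magemaker
+      - run: magemaker --deploy .magemaker_config/bert-uncased.yaml
+        env:
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+```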
    -
    - -If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great: -
    -
    - -**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** - -- **Type:** Fill Mask: tries to complete your sentence like Madlibs -- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill -- -
    -
    - -**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)** +```bash +magemaker --train .magemaker_config/train-bert.yaml +``` +(See docs/fine-tuning for full YAML schema.) -- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering -- **Query format:** "*type out a sentence like this one.*" +### Cleaning Up -
    -
    +Endpoints accrue cloud charges until deleted. Use the interactive menu or: +```bash +magemaker --cloud aws # select "Delete a Model Endpoint" +``` -### Deactivating models +

    (back to top)

    -Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. +--- +## Advanced Features -

    (back to top)

    +* **Multi-Cloud YAML** – Deploy the same model to multiple clouds from one config +* **SageMaker JumpStart Support** – Use `source: sagemaker` to deploy marketplace models +* **OpenAI-Compatible Proxy** – Run `python server.py` to expose any SageMaker endpoint via the OpenAI chat/completions API (see docs/tutorials/openai-compatible-proxy) +* **Config Directory Override** – Set `CONFIG_DIR` env var to change where Magemaker writes YAMLs +--- - -
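+
+To make the Advanced Features bullets above concrete, here is a sketch of a JumpStart deployment config paired with a `CONFIG_DIR` override. The endpoint name is illustrative, and the model ID is the JumpStart-style ID used elsewhere in these docs; check the SageMaker catalog for current IDs:
+
+```yaml
+# deploy_configs/jumpstart-llama3-8b.yaml
+deployment: !Deployment
+  destination: aws
+  endpoint_name: jumpstart-llama3-8b
+  instance_count: 1
+  instance_type: ml.g5.12xlarge
+
+models:
+- !Model
+  id: meta-textgeneration-llama-3-8b-instruct
+  source: sagemaker
+```
+
+```bash
+# Read/write configs from a custom folder instead of .magemaker_config
+CONFIG_DIR=deploy_configs magemaker --deploy deploy_configs/jumpstart-llama3-8b.yaml
+```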
    +## Roadmap -## What we're working on next -- [ ] More robust error handling for various edge cases -- [ ] Verbose logging -- [ ] Enabling / disabling autoscaling -- [ ] Deployment to Azure and GCP +- [ ] Enhanced error handling and verbose logging +- [ ] Autoscaling enable/disable flags +- [ ] Additional quantization options +- [ ] Expanded fine-tuning support on GCP & Azure -

    (back to top)

    +--- +## Known Issues - -
    +- Query utility currently supports text-based models only +- Deleting an endpoint may take a few minutes to propagate +- Deploying the exact same model within one minute can fail due to name collisions -## Known issues -- [ ] Querying within Magemaker currently only works with text-based model - doesn’t work with multimodal, image generation, etc. -- [ ] Deleting a model is not instant, it may show up briefly after it was queued for deletion -- [ ] Deploying the same model within the same minute will break +--- -

    (back to top)

    +## Contributing +We love contributions! See the [CONTRIBUTING.md](CONTRIBUTING.md) guide for details. - -
    +--- ## License -Distributed under the Apache 2.0 License. See `LICENSE` for more information. +Distributed under the Apache 2.0 License. See `LICENSE` for details. - -
    +--- ## Contact -You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com). +Questions or feedback? [support@slashml.com](mailto:support@slashml.com) or join our [Discord](https://discord.gg/SBQsD63d). -We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions. +

    (back to top)

    \ No newline at end of file From 7a67f4b1392edf7e68991cb67396f0303beecdbe Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Sat, 27 Sep 2025 17:09:48 +0000 Subject: [PATCH 18/18] docs: create tutorials/openai-compatible-proxy.mdx --- tutorials/openai-compatible-proxy.mdx | 121 ++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 tutorials/openai-compatible-proxy.mdx diff --git a/tutorials/openai-compatible-proxy.mdx b/tutorials/openai-compatible-proxy.mdx new file mode 100644 index 0000000..f5aeb2a --- /dev/null +++ b/tutorials/openai-compatible-proxy.mdx @@ -0,0 +1,121 @@ +--- +title: OpenAI-Compatible Proxy Server +description: Serve your deployed endpoints behind an OpenAI-style REST interface +--- + +## Overview + +Magemaker ships with a lightweight FastAPI server (`server.py`) that allows you to expose any deployed endpoint (AWS, GCP or Azure) through an **OpenAI-compatible** REST interface. +This is handy when you want to: + +- Re-use existing tooling that expects the OpenAI API (LangChain, Llama-Index, etc.) +- Avoid writing cloud-specific SDK calls in your application code +- Run the server locally, in a container, or as a cloud function + +Once running, you can simply point your OpenAI client at `http://localhost:8000/v1/chat/completions` (or the host/port you specify) and interact with your private model as if it were the official OpenAI service. + + +The proxy is **stateless**—every request is forwarded to the underlying cloud endpoint you specify. Billing still happens on the cloud provider side. + + +--- + +## Quick Start + +1. Deploy a model with Magemaker (see the other tutorials for details) and copy the **endpoint name** that Magemaker prints when the deployment is finished. +2. Ensure the relevant cloud credentials are available in your `.env` file (same file Magemaker uses). +3. Start the proxy server: + +```bash +# From repository root +python server.py --endpoint YOUR_ENDPOINT_NAME --port 8000 +``` + +The server boots with Uvicorn and begins listening on the specified port (default: `8000`). + +--- + +## Making Requests + +Below is a minimal Python example using the official OpenAI client: + +```python +import openai + +# Point the client to your local proxy +openai.api_base = "http://localhost:8000/v1" +openai.api_key = "not-needed-but-required-by-sdk" + +response = openai.ChatCompletion.create( + model="YOUR_ENDPOINT_NAME", # This must match the endpoint you passed to the server + messages=[{"role": "user", "content": "Hello!"}] +) + +print(response.choices[0].message.content) +``` + +Any SDK or tool that speaks the official OpenAI REST spec should work (LangChain, llama-cpp-python, etc.). + +--- + +## Command-line Options + +```text +python server.py --help + +--endpoint Name of the deployed endpoint to proxy (required) +--port Port to bind the HTTP server (default: 8000) +--host Host interface (default: 0.0.0.0) +``` + +You can also set the endpoint via an environment variable: + +```bash +export MAGEMAKER_ENDPOINT_NAME=YOUR_ENDPOINT_NAME +python server.py +``` + +If neither the flag nor the environment variable is provided the server will raise an error at startup. + +--- + +## How It Works + +1. The FastAPI route `/v1/chat/completions` (and `/v1/completions` for legacy clients) parses the incoming JSON payload. +2. The request is converted into a `magemaker.schemas.Query` object. +3. 
Based on the destination cloud provider (auto-detected from the endpoint name) the appropriate query helper is executed: + - AWS → `magemaker.sagemaker.query_endpoint` + - GCP → `magemaker.gcp.query_endpoint` + - Azure → `magemaker.azure.query_endpoint` +4. The helper returns the raw model response, which the server then marshals into the OpenAI JSON schema before returning it to the caller. + +--- + +## Deployment Tips + +- **Docker** – The server has no system dependencies beyond Python. A minimal Dockerfile would simply install `magemaker[server]` and run `python server.py`. +- **Authentication** – Add a reverse proxy (e.g., Nginx, Cloud Run "auth" middleware, API Gateway) in front of the server if you need API keys or OAuth. +- **Scaling** – Because the proxy is stateless you can run multiple replicas behind a load balancer as long as they share the same `.env` credentials. + + +Do **not** expose the proxy publicly without proper authentication—anyone with URL access can consume your cloud resources. + + +--- + +## Troubleshooting + +1. **401 / Permission errors** + Make sure the `.env` file used by the server contains valid cloud credentials. +2. **Invalid model / endpoint** + Confirm the `--endpoint` value matches exactly the name returned by Magemaker after deployment. +3. **CORS errors** + The FastAPI instance is started without CORS headers. If you are calling the proxy from a browser, enable `CORSMiddleware` in `server.py`. + +--- + +## Next Steps + +- Integrate the proxy in your LangChain or Llama-Index pipelines. +- Containerize and deploy to AWS ECS, Cloud Run, or Azure Container Apps. +- Contribute! If you need additional OpenAI endpoints (embeddings, images, etc.) open an issue or PR.
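+
+As a follow-up to the CORS item under Troubleshooting: enabling cross-origin requests in FastAPI is a few lines of middleware. A minimal sketch (the `app` variable name inside `server.py` is an assumption):
+
+```python
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+
+app = FastAPI()
+
+# Restrict allow_origins to your frontend's origin before going to production
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["http://localhost:3000"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+```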