From 2eb3b1cebce68278b941ed2a07607b216f6c5243 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:42 +0000 Subject: [PATCH 01/18] docs: sync CONTRIBUTING.md with latest code --- CONTRIBUTING.md | 95 ++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 74 insertions(+), 21 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a70cce7..6205796 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,5 +1,5 @@ --- -title: CONTRIBUTING.md +title: CONTRIBUTING description: How to contribute to Magemaker "og:title": "Contributing to Magemaker" --- @@ -9,7 +9,7 @@ description: How to contribute to Magemaker We love your input! We want to make contributing to Magemaker as easy and transparent as possible, whether it's: - Reporting a bug -- Discussing the current state of the code +- Discussing the current state of the code or documentation - Submitting a fix - Proposing new features - Becoming a maintainer @@ -21,8 +21,8 @@ We love your input! We want to make contributing to Magemaker as easy and transp If you encounter any bugs or have feature requests: 1. Go to our GitHub repository -2. Click on "Issues" -3. Click "New Issue" +2. Click on **Issues** +3. Click **New Issue** 4. Choose the appropriate template (Bug Report or Feature Request) 5. Fill out the template with as much detail as possible @@ -33,31 +33,84 @@ If you encounter any bugs or have feature requests: ### 2. Submit Pull Requests 1. Fork the repo and create your branch from `main` -2. If you've added code that should be tested, add tests -3. If you've changed APIs, update the documentation -4. Ensure the test suite passes -5. Make sure your code lints -6. Issue that pull request! +2. If you've added code that should be tested, add tests (`pytest`) +3. If you've changed APIs or behaviour, update the documentation (`.mdx` files) +4. Ensure the test suite passes: `pytest` +5. Make sure your code lints: `black . && isort . && flake8` +6. Open that pull request! Create a Pull Request to propose and collaborate on changes to a repository. -## Development Process - -1. Fork the repo -2. Create a new branch: `git checkout -b my-feature-branch` -3. Make your changes -4. Push to your fork and submit a pull request -5. Wait for a review and address any comments +## Local Development Workflow + + + + ```bash + git clone https://github.com/slashml/magemaker.git + cd magemaker + # Install core + dev dependencies + pip install -e ".[dev]" + ``` + + + ```bash + pytest -q + ``` + + + ```bash + uvicorn server:app --reload --port 8000 + ``` + The server exposes: + - `GET /endpoint/{endpoint_name}` – Returns SageMaker endpoint metadata + - `POST /endpoint/{endpoint_name}/query` – Sends an inference request + - `POST /chat/completions` – OpenAI-compatible chat completion proxy + + + Install Mintlify CLI once: + ```bash + npm i -g mintlify + ``` + Then run at repo root: + ```bash + mintlify dev + ``` + + ## Pull Request Guidelines -- Update documentation as needed -- Add tests if applicable -- Follow the existing code style - Keep PRs small and focused -- Write clear commit messages +- Follow existing code style (`black`, `isort`, `flake8`) +- Include or update tests +- Update or create documentation pages for any new feature (see *Docs Style Guide* below) +- Write clear commit messages (Conventional Commits preferred) + +### Docs Style Guide + +1. All docs live in `docs/` or a top-level `.mdx` file. +2. 
Use MDX components already present in other pages (e.g. ``, ``). +3. For a **brand-new feature**, create a new page under an appropriate folder (e.g. `concepts/`, `tutorials/`) and reference it in `mint.json` (navigation update will be reviewed during PR). +4. Screenshots should be placed in `docs/Images/` and referenced with a relative path. + +## Commit Message Convention + +We follow [Conventional Commits](https://www.conventionalcommits.org/): + +- `feat:` New feature +- `fix:` Bug fix +- `docs:` Documentation only +- `style:` Formatting, missing semicolons, etc. +- `refactor:` Code change that neither fixes a bug nor adds a feature +- `test:` Adding or correcting tests +- `chore:` Build process or auxiliary tools changes + +Example: +```bash +feat(api): add chat completions proxy endpoint +``` ## License @@ -65,4 +118,4 @@ By contributing, you agree that your contributions will be licensed under the Ap ## Questions? -Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! \ No newline at end of file +Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) or join our [Discord](https://discord.gg/SBQsD63d) if you have any questions about contributing! From 8aabb84ef7596949b6932f355151b4900aeeabd1 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:43 +0000 Subject: [PATCH 02/18] docs: sync README.md with latest code --- README.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 57 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 862eb3c..2dc6b7a 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,70 @@ -### These are docs for the [Magemaker-Docs](https://magemaker.slashml.com) documentation site. +### Magemaker Documentation Repo -The source code of magemaker is located at [Magemaker](https://github.com/slashml/magemaker) +This repository contains **only the documentation** for the Magemaker project. +The actual source code for Magemaker lives in the main repo: +[github.com/slashml/magemaker](https://github.com/slashml/magemaker). -### Development +--- +## Local Docs Development -Install the [Mintlify CLI](https://www.npmjs.com/package/mintlify) to preview the documentation changes locally. To install, use the following command +The docs are written in **Mintlify** (`.mdx` & `.md` files). +Follow the steps below to preview changes locally. -``` +### 1. Install the Mintlify CLI + +```bash npm i -g mintlify ``` -you need to have Node.js installed to use npm. +> You need Node.js ≥ 18.x for the command above to work. -Run the following command at the root of your documentation (where mint.json is) +### 2. Start the Docs Dev-Server -``` +From the root of the documentation folder (where `mint.json` lives): + +```bash mintlify dev ``` -#### Troubleshooting +Mintlify will build the site and serve it at `http://localhost:3000` with hot-reload. + +### 3. Troubleshooting + +• **`mintlify dev` isn’t running** – run `mintlify install` to re-install deps. +• **404 after reload** – ensure you’re inside the folder containing `mint.json`. +• **Port already in use** – pass a different port: `mintlify dev --port 4000`. + +--- +## Contributing to the Docs + +1. Fork the repo & create your branch: `git checkout -b docs/my-update` +2. Make your edits / add new pages (follow existing style & front-matter). +3. Preview with `mintlify dev`. +4. Open a Pull Request against `main`. 
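Putting those steps together, a typical docs change looks roughly like this (the fork URL and branch name below are placeholders, substitute your own):

```bash
# Clone your fork of this docs repo
git clone https://github.com/<your-username>/<your-docs-fork>.git
cd <your-docs-fork>

# Create a branch for your change
git checkout -b docs/my-update

# ...edit or add .mdx pages...

# Preview locally at http://localhost:3000
mintlify dev

# Commit and push, then open a Pull Request against main on GitHub
git commit -am "docs: describe my update"
git push origin docs/my-update
```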
+ +### Writing Style + +• Use American English. +• Keep headings ≤ H3 where possible. +• Use ``, ``, ``, and `` components when helpful. +• All code blocks MUST be fenced with the correct language for syntax-highlighting. + +### When You Add New Features to Code + +If the implementation introduces a **new public-facing capability** (CLI flag, API route, env-var, etc.) you **must**: + +1. Create a new `.mdx` page that documents it. +2. Update `mint.json` navigation if needed. +3. Link to the new page from at least one existing doc (Quick-Start, Concepts, or Tutorials). + +--- +## NEW – OpenAI-Compatible Proxy Docs + +Magemaker now ships a **FastAPI server** (`server.py`) that exposes OpenAI-style chat completions on top of your SageMaker endpoints. +The full usage guide lives in the new page: +`/concepts/openai-proxy` – please read it and keep it updated if you touch `server.py`. + +--- +## Need Help? -- Mintlify dev isn't running - Run `mintlify install` it'll re-install dependencies. -- Page loads as a 404 - Make sure you are running in a folder with `mint.json` \ No newline at end of file +Open a discussion on GitHub or email us at **support@slashml.com**. From 5d30d62a8f7d0eab229d6cdc907d7e6e40384066 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:45 +0000 Subject: [PATCH 03/18] docs: sync concepts/contributing.mdx with latest code --- concepts/contributing.mdx | 44 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/concepts/contributing.mdx b/concepts/contributing.mdx index 8c61908..f4a8381 100644 --- a/concepts/contributing.mdx +++ b/concepts/contributing.mdx @@ -45,6 +45,41 @@ We're excited that you're interested in contributing to Magemaker! This document +### Running the Local API Server (OpenAI-Compatible Proxy) + +The repository includes `server.py`, a FastAPI implementation that exposes an OpenAI-compatible `/v1/chat/completions` endpoint. + +```bash +# From the project root +uvicorn server:app --reload +``` + +Environment variables required for the proxy live in `.env` (created by Magemaker on first run). At minimum you need: + +```bash +AWS_ACCESS_KEY_ID=... +AWS_SECRET_ACCESS_KEY=... +# Optional +HUGGING_FACE_HUB_KEY=... +``` + +You can now hit `http://localhost:8000/v1/chat/completions` with any OpenAI SDK that supports a custom base-url. + + + Detailed usage, parameters, and troubleshooting are documented in OpenAI Proxy. + + +### Previewing Documentation + +We use [Mintlify](https://mintlify.com) for the docs site. Install the CLI to preview docs locally: + +```bash +npm i -g mintlify +mintlify dev +``` + +Edits to `.mdx` or `.md` files will hot-reload in the browser. + ## Development Guidelines ### Code Style @@ -72,13 +107,16 @@ Run tests locally: pytest tests/ ``` +If you add or modify the FastAPI proxy, include unit tests under `tests/` to cover the new behaviour. + ### Documentation When adding new features, please update the relevant documentation: -1. Update the README.md if needed +1. Update `README.md` if needed 2. Add/update docstrings for new functions/classes -3. Create/update relevant .mdx files in the docs directory +3. Create/update relevant `.mdx` files in the docs directory +4. 
If you introduce a brand-new component (e.g., a new cloud provider module), also create a dedicated docs page following the structure in `concepts/` or `tutorials/` ## Pull Request Process @@ -165,4 +203,4 @@ We are committed to providing a welcoming and inclusive experience for everyone. ## License -By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. \ No newline at end of file +By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. From 2b599d2cc312bccb2b55fd6fd8a02f622a20573b Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:46 +0000 Subject: [PATCH 04/18] docs: sync concepts/deployment.mdx with latest code --- concepts/deployment.mdx | 99 ++++++++++++++++++++--------------------- 1 file changed, 48 insertions(+), 51 deletions(-) diff --git a/concepts/deployment.mdx b/concepts/deployment.mdx index 66ca7a9..0335b6a 100644 --- a/concepts/deployment.mdx +++ b/concepts/deployment.mdx @@ -36,9 +36,12 @@ This is recommended for: - Infrastructure as Code (IaC) - Team collaborations +> **Tip** +> All examples below use the `!Deployment` and `!Model` tags that Magemaker registers with PyYAML. If you copy an example make sure those tags are present. + ## Multi-Cloud Deployment -Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here's how to deploy the same model (facebook/opt-125m) to different cloud providers: +Magemaker supports deployment to AWS SageMaker, GCP Vertex AI, and Azure ML. Here's how to deploy the same model (`facebook/opt-125m`) to different cloud providers. ### AWS (SageMaker) @@ -62,7 +65,7 @@ deployment: !Deployment destination: gcp endpoint_name: opt-125m-gcp instance_count: 1 - machine_type: n1-standard-4 + instance_type: n1-standard-4 # previously `machine_type` accelerator_type: NVIDIA_TESLA_T4 accelerator_count: 1 @@ -83,7 +86,7 @@ deployment: !Deployment models: - !Model - id: facebook-opt-125m + id: facebook--opt-125m # note the double hyphen in Azure IDs source: huggingface ``` @@ -112,7 +115,8 @@ deployment: !Deployment endpoint_name: test-llama3-8b instance_count: 1 instance_type: ml.g5.12xlarge - num_gpus: 4 + num_gpus: 4 # optional, only for GPU instances + quantization: null # optional: bitsandbytes | awq | null models: - !Model @@ -125,6 +129,10 @@ models: max_new_tokens: 250 ``` + +Both `num_gpus` and `quantization` are optional. Use them when deploying very large models that need multi-GPU inference or quantised weights. + + ## Cloud-Specific Instance Types ### AWS SageMaker Types @@ -133,39 +141,38 @@ Choose your instance type based on your model's requirements: - Good for smaller models like BERT-base - - 4 vCPU - - 16 GB Memory - - Available in free tier + Good for smaller models like BERT-base + • 4 vCPU + • 16 GB Memory + • Available in free tier - Required for larger models like LLaMA - - 48 vCPU - - 192 GB Memory - - 4 NVIDIA A10G GPUs + Required for larger models like LLaMA-3-8B + • 48 vCPU + • 192 GB Memory + • 4 NVIDIA A10G GPUs - Remember to deactivate unused endpoints to avoid unnecessary charges! +Remember to deactivate unused endpoints to avoid unnecessary charges! 
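If you want to double-check outside of Magemaker that nothing is left running, the AWS CLI can list the SageMaker endpoints in your current region (a quick sketch, assuming the AWS CLI is installed and configured):

```bash
# Show the names of all SageMaker endpoints that are still running
aws sagemaker list-endpoints --query "Endpoints[].EndpointName" --output table
```

Anything that appears here is still billable, so delete endpoints you no longer need from the Magemaker menu or the SageMaker console.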
### GCP Vertex AI Types - Good for smaller models - - 4 vCPU - - 15 GB Memory - - Cost-effective option + Good for smaller models + • 4 vCPU + • 15 GB Memory - For larger models - - 12 vCPU - - 85 GB Memory - - 1 NVIDIA A100 GPU + For larger models + • 12 vCPU + • 85 GB Memory + • 1 NVIDIA A100 GPU @@ -173,56 +180,46 @@ Choose your instance type based on your model's requirements: - Good for smaller models - - 4 vCPU - - 14 GB Memory - - Balanced performance + Good for smaller models + • 4 vCPU + • 14 GB Memory - For GPU workloads - - 6 vCPU - - 112 GB Memory - - 1 NVIDIA V100 GPU + For GPU workloads + • 6 vCPU + • 112 GB Memory + • 1 NVIDIA V100 GPU ## Deployment Best Practices 1. Use meaningful endpoint names that include: - - Model name/version - Environment (dev/staging/prod) - Team identifier - 2. Start with smaller instance types and scale up as needed - 3. Always version your YAML configurations - 4. Set up monitoring and alerting for your endpoints -Make sure you setup budget monitory and alerts to avoid unexpected charges. +Make sure you set up budget monitoring and alerts to avoid unexpected charges. - ## Troubleshooting Deployments Common issues and their solutions: -1. **Deployment Timeout** - - - Check instance quota limits - - Verify network connectivity - -2. **Instance Not Available** - - - Try a different region - - Request quota increase - - Use an alternative instance type - -3. **Model Loading Failure** - - Verify model ID and version - - Check instance memory requirements - - Validate Hugging Face token if required - - Endpoing deployed but deployment failed. Check the logs, and do report this to us if you see this issue. +1. **Deployment Timeout** + • Check instance quota limits + • Verify network connectivity +2. **Instance Not Available** + • Try a different region + • Request quota increase + • Use an alternative instance type +3. **Model Loading Failure** + • Verify model ID and version + • Check instance memory requirements + • Validate Hugging Face token if required + • Endpoint deployed but model failed to load—check logs and report the issue if it persists. From 3f378fd7284b9386e5e5fe30abdc7fb5b64ee454 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:47 +0000 Subject: [PATCH 05/18] docs: sync concepts/fine-tuning.mdx with latest code --- concepts/fine-tuning.mdx | 54 ++++++++++++++++++++++++++-------------- 1 file changed, 35 insertions(+), 19 deletions(-) diff --git a/concepts/fine-tuning.mdx b/concepts/fine-tuning.mdx index 88835aa..33e40a0 100644 --- a/concepts/fine-tuning.mdx +++ b/concepts/fine-tuning.mdx @@ -5,7 +5,7 @@ description: Guide to fine-tuning models with Magemaker ## Fine-tuning Overview -Fine-tuning allows you to adapt pre-trained models to your specific use case. Magemaker simplifies this process through YAML configuration. +Fine-tuning allows you to adapt pre-trained models to your specific use case. **At the moment, Magemaker supports fine-tuning _only on AWS SageMaker_** – GCP and Azure support are on the roadmap. ### Basic Command @@ -13,13 +13,17 @@ Fine-tuning allows you to adapt pre-trained models to your specific use case. Ma magemaker --train .magemaker_config/train-config.yaml ``` + + The `destination` field must be set to `aws` for now. Attempts to use `gcp` or `azure` will raise a validation error. 
+ + ## Configuration ### Basic Training Configuration ```yaml training: !Training - destination: aws + destination: aws # currently only "aws" is supported instance_type: ml.p3.2xlarge instance_count: 1 training_input_path: s3://your-bucket/training-data.csv @@ -32,6 +36,8 @@ models: ### Advanced Configuration +The `Training` schema accepts an optional `hyperparameters` block. Any key/value pairs you specify here will be forwarded directly to the Hugging Face training script that Magemaker launches inside SageMaker. + ```yaml training: !Training destination: aws @@ -49,20 +55,24 @@ training: !Training save_steps: 1000 ``` + +If you omit the `hyperparameters` block, Magemaker will attempt to infer sensible defaults based on the model type (see `magemaker/sagemaker/fine_tune_model.py#get_hyperparameters_for_model`). + + ## Data Preparation ### Supported Formats - - Simple tabular data - - Easy to prepare + - Simple tabular data
+ - Easy to prepare
- Good for classification tasks
- - Flexible data format - - Good for complex inputs + - Flexible data format
+ - Good for complex inputs
- Supports nested structures
@@ -77,24 +87,22 @@ training: !Training Use AWS CLI or console to upload data - Specify S3 path in training configuration + Specify the S3 URI in `training_input_path`
## Instance Selection -### Training Instance Types - -Choose based on: -- Dataset size -- Model size -- Training time requirements -- Cost constraints +Choosing the right training instance affects both cost and training time. Popular choices: -- ml.p3.2xlarge (1 GPU) -- ml.p3.8xlarge (4 GPUs) -- ml.p3.16xlarge (8 GPUs) +- **ml.p3.2xlarge** (1 × NVIDIA V100 GPU) +- **ml.p3.8xlarge** (4 × NVIDIA V100 GPUs) +- **ml.p3.16xlarge** (8 × NVIDIA V100 GPUs) + + +You may need a quota increase for the larger `p3` instance families. Check your AWS Service Quotas before launching a job. + ## Hyperparameter Tuning @@ -124,7 +132,15 @@ hyperparameters: !Hyperparameters ### CloudWatch Metrics -Available metrics: +When the training job starts, Magemaker streams logs to Amazon CloudWatch. Key metrics include: - Loss - Learning rate -- GPU utilization \ No newline at end of file +- GPU utilization + +You can access these metrics in the SageMaker console under **Training jobs → Metrics**. + +--- + +### Cleaning Up + +Training jobs automatically shut down when they finish, but the output model artefacts remain in the S3 bucket that SageMaker created for you. Delete them if you no longer need the checkpoint files to avoid storage charges. From 8f0a57086b79d5b3581a9473ecc3101df8c6a456 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:48 +0000 Subject: [PATCH 06/18] docs: sync concepts/models.mdx with latest code --- concepts/models.mdx | 235 +++++++++++++++++--------------------------- 1 file changed, 92 insertions(+), 143 deletions(-) diff --git a/concepts/models.mdx b/concepts/models.mdx index 0161380..6ea3ae1 100644 --- a/concepts/models.mdx +++ b/concepts/models.mdx @@ -3,203 +3,152 @@ title: Models description: Guide to supported models and their requirements --- -## Supported Models +## Supported Model Sources - -Currently, Magemaker supports deployment of Hugging Face models only. Support for cloud provider marketplace models is coming soon! - - -### Hugging Face Models +Magemaker can deploy models from two primary sources today: - - - LLaMA - - BERT - - GPT-2 - - T5 + + The full public Hugging Face Hub (text, vision, audio, & multimodal models). - - - - Sentence Transformers - - CLIP - - DPR + + Pre-built & Marketplace models that ship with SageMaker (BERT, Stable Diffusion, Llama-2, etc.). -### Future Support + +GCP Model Garden and Azure ML Model Catalog support is **under active development**. Follow the progress or up-vote the request on our +Featurebase board. + -We plan to add support for the following model sources: +### Hugging Face Models - - Models from AWS Marketplace and SageMaker built-in algorithms - - - - Models from Vertex AI Model Garden and Foundation Models + + • LLaMA 2 / 3
+ • GPT-2 / GPT-J
+ • T5 / FLAN-T5
- - Models from Azure ML Model Catalog and Azure OpenAI + + • Sentence-Transformers
+ • CLIP
+ • DPR / Contriever
-## Model Requirements - -### Instance Type Recommendations by Cloud Provider - -#### AWS SageMaker -1. **Small Models** (ml.m5.xlarge) - ```yaml - instance_type: ml.m5.xlarge - ``` -2. **Medium Models** (ml.g4dn.xlarge) - ```yaml - instance_type: ml.g4dn.xlarge - ``` -3. **Large Models** (ml.g5.12xlarge) - ```yaml - instance_type: ml.g5.12xlarge - num_gpus: 4 - ``` - -#### GCP Vertex AI -1. **Small Models** (n1-standard-4) - ```yaml - machine_type: n1-standard-4 - ``` -2. **Medium Models** (n1-standard-8 + GPU) - ```yaml - machine_type: n1-standard-8 - accelerator_type: NVIDIA_TESLA_T4 - accelerator_count: 1 - ``` -3. **Large Models** (a2-highgpu-1g) - ```yaml - machine_type: a2-highgpu-1g - ``` - -#### Azure ML -1. **Small Models** (Standard_DS3_v2) - ```yaml - instance_type: Standard_DS3_v2 - ``` -2. **Medium Models** (Standard_NC6s_v3) - ```yaml - instance_type: Standard_NC6s_v3 - ``` -3. **Large Models** (Standard_ND40rs_v2) - ```yaml - instance_type: Standard_ND40rs_v2 - ``` -## Example Deployments - -### Example Hugging Face Model Deployment +### SageMaker JumpStart Models (New!) -Deploy the same Hugging Face model to different cloud providers: +You can now deploy any JumpStart model by selecting **“SageMaker JumpStart → Search Models”** in the interactive CLI or by specifying `source: sagemaker` in a YAML file. -AWS SageMaker: ```yaml models: - !Model - id: facebook/opt-125m - source: huggingface + id: tensorflow-ic-imagenet-inception-v3-classification-4 # JumpStart model ID + source: sagemaker + deployment: !Deployment destination: aws + endpoint_name: inception-v3-demo + instance_type: ml.g4dn.xlarge ``` -GCP Vertex AI: +See the dedicated page ➜ [JumpStart Model Search](/concepts/jumpstart-models) for a full walkthrough. + +--- + +## Instance-Type Recommendations + +### AWS SageMaker +1. **Small Models** – `ml.m5.xlarge` +2. **GPU Text/Embeddings** – `ml.g4dn.xlarge` +3. **Large LLMs (8 B+)** – `ml.g5.12xlarge` (`num_gpus: 4`) + +### GCP Vertex AI *(Hugging Face only today)* +1. **CPU Only** – `n1-standard-4` +2. **Small GPU** – `n1-standard-8 + NVIDIA_TESLA_T4` +3. **Large GPU** – `a2-highgpu-1g` + +### Azure ML *(Hugging Face only today)* +1. `Standard_DS3_v2` +2. `Standard_NC6s_v3` +3. `Standard_ND40rs_v2` + +--- + +## Example Deployments + +### Deploy a Hugging Face Model to GCP ```yaml models: - !Model id: facebook/opt-125m source: huggingface + deployment: !Deployment destination: gcp + endpoint_name: opt-125m-demo ``` -Azure ML: +### Deploy a JumpStart Model to AWS ```yaml models: - !Model - id: facebook-opt-125m - source: huggingface + id: pytorch-inference-stable-diffusion-2-1-base + source: sagemaker + deployment: !Deployment - destination: azure + destination: aws + instance_type: ml.g4dn.xlarge ``` - - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. - - To find the relevnt model id, follow the following steps - - - Find the workpsace in the Azure portal and click on the studio url provided. Click on the `Model Catalog` on the left side bar - ![Azure ML Creation](../Images/workspace-studio.png) - - - - Select Hugging-Face from the collections list. 
The id of the model card is the id you need to use in the yaml file - ![Azure ML Creation](../Images/hugging-face.png) - - - - - - -## Model Configuration - -### Basic Parameters - +### Deploy a Hugging Face Model to Azure ```yaml models: - !Model - id: your-model-id - source: huggingface|sagemaker # we don't support vertex and azure specific models yet - revision: latest # Optional: specify model version + id: facebook-opt-125m # Use Azure-specific model IDs! + source: huggingface + +deployment: !Deployment + destination: azure + instance_type: Standard_DS3_v2 ``` -### Advanced Parameters + +Azure model IDs differ from Hugging Face IDs. Use the **Model Catalog** in Azure ML Studio to copy the correct ID. + + +--- + +## Model Schema Reference ```yaml models: - !Model - id: your-model-id - source: huggingface - predict: + id: your-model-id # Required + source: huggingface|sagemaker # Required + revision: latest # Optional – model version / git SHA + predict: # Optional – generation parameters temperature: 0.7 top_p: 0.9 - top_k: 50 - max_new_tokens: 500 - do_sample: true + max_new_tokens: 256 ``` -## Best Practices - -1. **Model Selection** - - Compare pricing across cloud providers - - Consider data residency requirements - - Test latency from different regions - -3. **Cost Management** - - Compare instance pricing - - Make sure you set up the relevant alerting +--- -## Troubleshooting +## Best Practices +1. Benchmark cost vs latency across clouds before committing. +2. Start small (`ml.m5.xlarge`, `n1-standard-4`, `Standard_DS3_v2`) then scale up. +3. Always shut down endpoints when not in use to avoid surprises. -Common model-related issues: +--- -1. **Cloud-Specific Issues** - - Check quota limits - - Verify regional availability - - Review cloud-specific logs +## Troubleshooting Checklist -2. **Performance Issues** - - Compare cross-cloud latencies - - Check network connectivity - - Monitor resource utilization +1. **Quota / Capacity Errors** – request a quota increase or switch regions. +2. **Model Load Failure** – verify model ID + revision; ensure enough RAM/GPU. +3. **Auth Errors** – check cloud credentials & Hugging Face access tokens. -3. **Authentication Issues** - - Verify cloud credentials - - Check model access permissions - - Validate API keys \ No newline at end of file + +Stuck? Reach us on Discord or open a GitHub issue – we’re happy to help. 
+ From bcaa1991c2b9a80cd659402f8b197fcb6af4b843 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:49 +0000 Subject: [PATCH 07/18] docs: sync configuration/Azure.mdx with latest code --- configuration/Azure.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configuration/Azure.mdx b/configuration/Azure.mdx index 3c1104a..85531e8 100644 --- a/configuration/Azure.mdx +++ b/configuration/Azure.mdx @@ -81,4 +81,4 @@ From the Azure portal - + \ No newline at end of file From 55730904044a52e2e2a8c7fde16f88f9dc022876 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:50 +0000 Subject: [PATCH 08/18] docs: sync configuration/Environment.mdx with latest code --- configuration/Environment.mdx | 80 +++++++++++++++++++++++++++++------ 1 file changed, 67 insertions(+), 13 deletions(-) diff --git a/configuration/Environment.mdx b/configuration/Environment.mdx index 0781ec3..b6c32d5 100644 --- a/configuration/Environment.mdx +++ b/configuration/Environment.mdx @@ -3,28 +3,82 @@ title: Environment Variables --- ### Required Config File -A `.env` file is automatically created when you run `magemaker --cloud `. This file contains the necessary environment variables for your cloud provider(s). +A `.env` file is automatically created when you run `magemaker --cloud `. This file contains the necessary environment variables for your cloud provider(s) **and for optional components such as the OpenAI-compatible proxy server**. -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: +By default, Magemaker will look for a `.env` file in your project root with the following variables. Only set the variables that apply to the parts of the product you are using—everything else can be omitted. ```bash +# ──────────────────────────── # AWS Configuration -AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS +# ──────────────────────────── +AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS -SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS +SAGEMAKER_ROLE="arn:aws:iam::..." 
# Required for AWS +# (Optional) Override the AWS region used by the CLI and the proxy server +AWS_REGION_NAME="us-east-1" +# ──────────────────────────── # GCP Configuration -PROJECT_ID="your-project-id" # Required for GCP -GCLOUD_REGION="us-central1" # Required for GCP +# ──────────────────────────── +PROJECT_ID="your-project-id" # Required for GCP +GCLOUD_REGION="us-central1" # Required for GCP +# Service-account key file for local auth (Vertex AI & tests) +GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" # Optional but recommended +# ──────────────────────────── # Azure Configuration -AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure -AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure -AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure -AZURE_REGION="eastus" # Required for Azure +# ──────────────────────────── +AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure +AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure +AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure +AZURE_REGION="eastus" # Required for Azure -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +# ──────────────────────────── +# Hugging Face +# ──────────────────────────── +HUGGING_FACE_HUB_KEY="your-hf-token" # Needed for gated models (e.g. Llama-3) + +# ──────────────────────────── +# Magemaker Internals +# ──────────────────────────── +# Where Magemaker writes generated YAML configs +CONFIG_DIR=".magemaker_configs" # Optional – default shown + +# ──────────────────────────── +# OpenAI-Compatible Proxy Server (server.py) +# ──────────────────────────── +# Port that FastAPI/uvicorn should listen on (defaults to 8000 if unset) +PROXY_SERVER_PORT="8000" # Optional ``` -Never commit your .env file to version control! +Never commit your `.env` file or any credential files to version control! + +### How Environment Variables Are Consumed + + + + Credentials and region information are read at **runtime** when you deploy + or query a model. + + + `server.py` bootstraps a FastAPI server that proxies OpenAI-style requests + to your SageMaker endpoints. It relies on the same `.env` file for cloud + credentials plus `PROXY_SERVER_PORT` for binding. + + + Several tests load variables like `GOOGLE_APPLICATION_CREDENTIALS` or the + Azure workspace settings. Ensure these are present before running + `pytest`. + + + All YAML files generated by the interactive CLI are written to + `CONFIG_DIR`. Keep this folder under version control if you want your + deployments treated as Infrastructure-as-Code (IaC). + + + + + If you change **`CONFIG_DIR`** after you have already deployed endpoints, move + the existing YAML files to the new directory so that Magemaker can still find + them when you query or delete endpoints. + From 26a89e155bd7ad3f31526eb04bfc9cf94dac8a83 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:51 +0000 Subject: [PATCH 09/18] docs: sync configuration/GCP.mdx with latest code --- configuration/GCP.mdx | 105 +++++++++++++++++++++++++++++++----------- 1 file changed, 79 insertions(+), 26 deletions(-) diff --git a/configuration/GCP.mdx b/configuration/GCP.mdx index c9cd369..57a1b3b 100644 --- a/configuration/GCP.mdx +++ b/configuration/GCP.mdx @@ -1,38 +1,91 @@ --- title: GCP +description: Configure Magemaker for Google Cloud Platform --- + + + Before proceeding, make sure you have followed the general installation guide and have Python 3.11+ installed. 
+ + - -Visit [Google Cloud Console](https://cloud.google.com/?hl=en) to create your account. - + + Visit the Google Cloud Console to create your account. + + + + Once signed in, create a new project. If this is your first time, the default project is "My First Project". Click this dropdown and select New Project. + + ![Create GCP Project](../Images/google_new_project.png) + + + + 1. Follow the official Google Cloud SDK installation guide for your OS.
+ 2. Initialize the SDK and set the default project: + + ```bash + gcloud init + ``` + + 3. Verify Application-Default credentials (ADC): + + ```bash + gcloud auth application-default login + ``` + + This creates/refreshes an OAuth token that Magemaker will use via ADC when interacting with Vertex AI. +
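To confirm that ADC is picking up the expected project before running Magemaker, a quick check from Python looks like this (a minimal sketch, assuming the `google-auth` package is installed):

```python
import google.auth

# Loads whatever Application Default Credentials are configured
# (via `gcloud auth application-default login` or GOOGLE_APPLICATION_CREDENTIALS)
credentials, project = google.auth.default()
print("ADC project:", project)
```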
+ + + In the Google Cloud Console, navigate to APIs & Services → Library and enable the Vertex AI API for your project. + + ![Enable Vertex AI](../Images/QrB_Image_11.png) + + + + If you prefer using a Service Account instead of ADC: + + 1. Create a Service Account with the roles Vertex AI Admin and Storage Admin (or least-privilege variants). + 2. Generate a JSON key file. + 3. Point the GOOGLE_APPLICATION_CREDENTIALS environment variable to that file: + + ```bash + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + ``` + + + + Run the setup command to generate a .env file and store required variables: - - Once you have created your account, create a new project. If this is your first time the default project is "My First Project". You can create a new project by clicking this button and then selecting "New Project". + ```bash + magemaker --cloud gcp + ``` - ![Enter image alt description](../Images/google_new_project.png) + You will be prompted for: + - PROJECT_ID – your GCP project ID + - GCLOUD_REGION – default Vertex AI region (e.g., us-central1) - + The resulting .env should contain at least: - -1. Follow the installation guide at [Google Cloud SDK Installation Documentation](https://cloud.google.com/sdk/docs/install-sdk) -2. Initialize the SDK by running: - ```bash - gcloud init - ``` - + ```bash + PROJECT_ID="your-project-id" + GCLOUD_REGION="us-central1" + # Optional – used when supplying a Service Account key file + GOOGLE_APPLICATION_CREDENTIALS="/absolute/path/to/key.json" + ``` -3. During initialization: - - Create login credentials when prompted - - Create a new project or select an existing one - To make sure the initialization worked, run: - ```bash - gcloud auth application-default login - ``` + Never commit your .env file to version control! + - -Navigate to the APIs & Services on the dashboard and enable the Vertex AI API for your project. + + List currently deployed Vertex AI endpoints to verify connectivity: -![Enter image alt description](../Images/QrB_Image_11.png) - + ```bash + magemaker --cloud gcp # open interactive menu + # Choose "Show Active Models" → you should see an empty list or existing endpoints + ``` + +
- \ No newline at end of file + + You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy large models. Do this in IAM & Admin → Quotas. + From 157bd7049fbc841ebb34dac1da5816b8522142c6 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:52 +0000 Subject: [PATCH 10/18] docs: sync getting_started.md with latest code --- getting_started.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/getting_started.md b/getting_started.md index 0bc86fa..c74682e 100644 --- a/getting_started.md +++ b/getting_started.md @@ -16,7 +16,7 @@ To get a local copy up and running follow these simple steps. ### Prerequisites -* Python 3.11 (3.13 is not supported because of azure) +* Python 3.11+ (3.12 is not supported) * Cloud Configuration * An account to your preferred cloud provider, AWS, GCP and Azure. * Each cloud requires slightly different accesses, Magemaker will guide you through getting the necessary credentials to the selected cloud provider @@ -132,18 +132,18 @@ models: #### Fine-tuning a model using a yaml file -You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file +Fine-tuning is currently supported **only on AWS SageMaker**. You can fine-tune a model using a YAML file by passing the `train` option: -` +``` magemaker --train .magemaker_config/train-bert.yaml -` +``` -Here is an example yaml file for fine-tuning a hugging-face model: +Example YAML for fine-tuning a Hugging Face model on SageMaker: ```yaml training: !Training - destination: aws # or gcp, azure - instance_type: ml.p3.2xlarge # varies by cloud provider + destination: aws # Fine-tuning currently supports AWS only + instance_type: ml.p3.2xlarge instance_count: 1 training_input_path: s3://your-bucket/data.csv hyperparameters: !Hyperparameters @@ -151,6 +151,10 @@ training: !Training per_device_train_batch_size: 32 learning_rate: 2e-5 +models: +- !Model + id: google-bert/bert-base-uncased + source: huggingface ``` @@ -178,7 +182,6 @@ If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging
- ## Deactivating Models Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. From d0770f4966e76bc989cdafb4ba7c9f4631163915 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:54 +0000 Subject: [PATCH 11/18] docs: sync installation.mdx with latest code --- installation.mdx | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/installation.mdx b/installation.mdx index 1d843eb..ced491b 100644 --- a/installation.mdx +++ b/installation.mdx @@ -47,7 +47,7 @@ magemaker --cloud gcp ### Azure Configuration - Follow this detailed guide for setting up Azure credentials: - [GCP Setup Guide](/configuration/Azure) + [Azure Setup Guide](/configuration/Azure) Once you have your Azure credentials, you can configure Magemaker by running: @@ -75,10 +75,12 @@ By default, Magemaker will look for a `.env` file in your project root with the AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS +AWS_REGION_NAME="us-east-1" # Optional: override default AWS region # GCP Configuration PROJECT_ID="your-project-id" # Required for GCP GCLOUD_REGION="us-central1" # Required for GCP +GOOGLE_APPLICATION_CREDENTIALS="/path/to/creds.json" # Optional: service-account json key # Azure Configuration AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure @@ -86,8 +88,10 @@ AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure AZURE_REGION="eastus" # Required for Azure -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +# Optional Magemaker configurations +HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +CONFIG_DIR=".magemaker_config" # Optional: change default config directory +PROXY_SERVER_PORT="8000" # Optional: port for the OpenAI-compatible proxy server ``` Never commit your .env file to version control! 
From 03fad728ee6458c32ced7ebfe77d32435ea87be8 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:56 +0000 Subject: [PATCH 12/18] docs: sync mint.json with latest code --- mint.json | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/mint.json b/mint.json index ccb1843..020dde7 100644 --- a/mint.json +++ b/mint.json @@ -38,9 +38,13 @@ "mode": "auto" }, "navigation": [ - { + { "group": "Getting Started", - "pages": ["about", "installation", "quick-start"] + "pages": [ + "about", + "installation", + "quick-start" + ] }, { "group": "Tutorials", @@ -64,6 +68,8 @@ "pages": [ "concepts/deployment", "concepts/models", + "concepts/jumpstart-models", + "concepts/openai-proxy", "concepts/contributing" ] } @@ -77,15 +83,27 @@ { "title": "Documentation", "links": [ - { "label": "Getting Started", "url": "/" }, - { "label": "Contributing", "url": "/contributing" } + { + "label": "Getting Started", + "url": "/" + }, + { + "label": "Contributing", + "url": "/contributing" + } ] }, { "title": "Resources", "links": [ - { "label": "GitHub", "url": "https://github.com/slashml/magemaker" }, - { "label": "Support", "url": "mailto:support@slashml.com" } + { + "label": "GitHub", + "url": "https://github.com/slashml/magemaker" + }, + { + "label": "Support", + "url": "mailto:support@slashml.com" + } ] } ] From f3c61b79b6dffc5cc79133b14850996060454f7d Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:57 +0000 Subject: [PATCH 13/18] docs: sync tutorials/deploying-llama-3-to-aws.mdx with latest code --- tutorials/deploying-llama-3-to-aws.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/deploying-llama-3-to-aws.mdx b/tutorials/deploying-llama-3-to-aws.mdx index 46f0659..9dfce79 100644 --- a/tutorials/deploying-llama-3-to-aws.mdx +++ b/tutorials/deploying-llama-3-to-aws.mdx @@ -70,6 +70,7 @@ Or you can use the following code: from sagemaker.huggingface.model import HuggingFacePredictor import sagemaker + def query_huggingface_model(endpoint_name: str, query: str): # Initialize a SageMaker session sagemaker_session = sagemaker.Session() @@ -107,4 +108,3 @@ if __name__ == "__main__": ``` ## Conclusion You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). 
- From ecbea15e65ac11e51c7e1111d3779cdeca69bc8d Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:52:58 +0000 Subject: [PATCH 14/18] docs: sync tutorials/deploying-llama-3-to-azure.mdx with latest code --- tutorials/deploying-llama-3-to-azure.mdx | 49 +++++++++++------------- 1 file changed, 22 insertions(+), 27 deletions(-) diff --git a/tutorials/deploying-llama-3-to-azure.mdx b/tutorials/deploying-llama-3-to-azure.mdx index 679ba23..4d49452 100644 --- a/tutorials/deploying-llama-3-to-azure.mdx +++ b/tutorials/deploying-llama-3-to-azure.mdx @@ -86,17 +86,17 @@ Or you can use the following code from azure.identity import DefaultAzureCredential from azure.ai.ml import MLClient -from azure.mgmt.resource import ResourceManagementClient - from dotenv import dotenv_values import os +import json def query_azure_endpoint(endpoint_name, query): - # Initialize the ML client - subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") - resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") - workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") + """Query an Azure ML online endpoint created by Magemaker.""" + + subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") + resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") + workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") credential = DefaultAzureCredential() ml_client = MLClient( @@ -106,38 +106,33 @@ def query_azure_endpoint(endpoint_name, query): workspace_name=workspace_name ) - import json - - # Test data - test_data = { - "inputs": query - } + # Prepare the payload that the endpoint expects + payload = {"inputs": query} - # Save the test data to a temporary file - with open("test_request.json", "w") as f: - json.dump(test_data, f) + # Save payload to a temporary file (required by MLClient invoke API) + with open("tmp_request.json", "w") as f: + json.dump(payload, f) - # Get prediction + # Invoke the endpoint response = ml_client.online_endpoints.invoke( endpoint_name=endpoint_name, - request_file = 'test_request.json' + request_file="tmp_request.json" ) - print('Raw Response Content:', response) - # delete a file - os.remove("test_request.json") + print("Raw Response Content:", response) + + # Clean-up temp file + os.remove("tmp_request.json") return response - -endpoint_id = 'your-endpoint-id-here' - -input_text = 'What are you?' -resp = query_azure_endpoint(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +if __name__ == "__main__": + endpoint_id = "your-endpoint-id-here" + input_text = "What are you?" + resp = query_azure_endpoint(endpoint_name=endpoint_id, query=input_text) + print(resp) ``` ## Conclusion You have successfully deployed and queried Llama 3 on Azure using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). 
- From a9cfb64491290f0861c303b48efcdf51ce754604 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:53:00 +0000 Subject: [PATCH 15/18] docs: sync tutorials/deploying-llama-3-to-gcp.mdx with latest code --- tutorials/deploying-llama-3-to-gcp.mdx | 30 +++++++++++--------------- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/tutorials/deploying-llama-3-to-gcp.mdx b/tutorials/deploying-llama-3-to-gcp.mdx index a94d616..23b51f3 100644 --- a/tutorials/deploying-llama-3-to-gcp.mdx +++ b/tutorials/deploying-llama-3-to-gcp.mdx @@ -35,7 +35,7 @@ deployment: !Deployment endpoint_name: llama3-endpoint accelerator_count: 1 instance_type: n1-standard-8 - accelerator_type: NVIDIA_T4 + accelerator_type: NVIDIA_TESLA_T4 # Updated to match Vertex AI enum num_gpus: 1 quantization: null @@ -49,12 +49,12 @@ models: version: null ``` - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. + For gated models like Llama from Meta, you must accept the terms of use on Hugging Face and add your Hugging Face token to the environment for deployment to succeed. ### Selecting an Appropriate Instance -For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements. +For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_TESLA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements. If you encounter quota issues, submit a quota increase request in the GCP console under "IAM & Admin > Quotas" for the specific GPU type in your deployment region. @@ -90,35 +90,33 @@ def query_vertexai_endpoint_rest( import google.auth.transport.requests import requests - # TODO: this will have to come from config files + # Project configuration from your .env file project_id = dotenv_values('.env').get('PROJECT_ID') location = dotenv_values('.env').get('GCLOUD_REGION') - # Get credentials if token_path: - credentials, project = google.auth.load_credentials_from_file(token_path) + credentials, _ = google.auth.load_credentials_from_file(token_path) else: - credentials, project = google.auth.default() - + credentials, _ = google.auth.default() + # Refresh token auth_req = google.auth.transport.requests.Request() credentials.refresh(auth_req) - + # Prepare headers and URL headers = { "Authorization": f"Bearer {credentials.token}", "Content-Type": "application/json" } - + url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}:predict" - + # Prepare payload payload = { "instances": [ { "inputs": input_text, - # TODO: this also needs to come from configs "parameters": { "max_new_tokens": 100, "temperature": 0.7, @@ -127,20 +125,18 @@ def query_vertexai_endpoint_rest( } ] } - + # Make request response = requests.post(url, headers=headers, json=payload) print('Raw Response Content:', response.content.decode()) return response.json() -endpoint_id="your-endpoint-id-here" - -input_text='What are you?"' +endpoint_id = "your-endpoint-id-here" +input_text = "What are you?" 
resp = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=input_text) print(resp) ``` ## Conclusion You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). - From 36c2b8234c0c95ba2c0597e3c763d3844d7bb675 Mon Sep 17 00:00:00 2001 From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com> Date: Thu, 25 Sep 2025 18:53:01 +0000 Subject: [PATCH 16/18] docs: sync updated_readme.md with latest code --- updated_readme.md | 246 +++++++++++++++++----------------------------- 1 file changed, 89 insertions(+), 157 deletions(-) diff --git a/updated_readme.md b/updated_readme.md index bcfc60b..20b9d2a 100644 --- a/updated_readme.md +++ b/updated_readme.md @@ -1,26 +1,23 @@ -
-

Magemaker v0.1, by SlashML

+

Magemaker by SlashML

- Deploy open source AI models to AWS in minutes. + Deploy open-source AI models to AWS, GCP, and Azure in minutes.
+ 📚 Full Documentation »

-
Table of Contents
    -
  1. - About Magemaker -
  2. +
  3. About Magemaker
  4. Getting Started
  5. -
  6. Using Magemaker
  7. -
  8. What we're working on next
  9. -
  10. Known issues
  11. +
  12. Using Magemaker
  13. +
  14. Fine-tuning
  15. +
  16. OpenAI-compatible Proxy
  17. +
  18. Roadmap
  19. +
  20. Known Issues
  21. Contributing
  22. License
  23. Contact
  24. @@ -39,218 +38,151 @@ ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line. +Magemaker is a Python CLI and SDK that removes the pain of deploying open-source AI models to your own cloud. -Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes. +• Zero-to-production in minutes on **AWS SageMaker**, **GCP Vertex AI**, or **Azure ML** +• Supports Hugging Face models and AWS **JumpStart** models +• Interactive TUI for one-off deployments & management +• YAML workflow for reproducible IaC-style deployments +• Optional FastAPI **OpenAI-compatible proxy** for drop-in integration with existing OpenAI tooling - -
    - -## Getting Started +

    (back to top)

    -Magemaker works with AWS. Azure and GCP support are coming soon! +--- -To get a local copy up and running follow these simple steps. +## Getting Started +Magemaker works on macOS/Linux and supports Python 3.11 (3.12 currently blocked by Azure SDK). ### Prerequisites +* Python 3.11+ +* At least one cloud account (AWS, GCP, or Azure) +* Corresponding cloud CLI installed (`aws`, `gcloud`, or `az`) +* For gated Hugging Face models (e.g. Llama 3) a Hugging Face access token -* Python -* An AWS account -* Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free) -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) - -### Configuration - -**Step 1: Set up AWS and SageMaker** - -To get started, you’ll need an AWS account which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker. - -We wrote up the steps in [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) as well. - - - -### Installing the package - -**Step 1** - -```sh +### Installation +```bash pip install magemaker ``` -**Step 2: Running magemaker** - -Run it by simply doing the following: - -```sh -magemaker -``` - -If this is your first time running this command. It will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. - -Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. - -```sh -HUGGING_FACE_HUB_KEY="KeyValueHere" +Run the CLI and pick your cloud: +```bash +magemaker --cloud [aws|gcp|azure|all] ``` +On first run, Magemaker walks you through credential setup and writes a `.env` file with the required variables (see [Environment Vars](https://magemaker.slashml.com/configuration/Environment)).

    (back to top)

    - - - -
    +--- ## Using Magemaker - -### Deploying models from dropdown - -When you run `magemaker` comamnd it will give you an interactive menu to deploy models. You can choose from a dropdown of models to deploy. - -#### Deploying Hugging Face models -If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face. For example, `google-bert/bert-base-uncased`. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model. - -#### Deploying Sagemaker models -If you are deploying a Sagemaker model, select a framework and search from a model. If you a deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the `CONFIG_DIR=.magemaker_config` folder. You can modify the path to this folder by setting the `CONFIG_DIR` environment variable. - -#### Deploy using a yaml file -We recommend deploying through a yaml file for reproducability and IAC. From the cli, you can deploy a model without going through all the menus. You can even integrate us with your Github Actions to deploy on PR merge. Deploy via YAML files simply by passing the `--deploy` option with local path like so: - +### 1. Interactive TUI +```bash +magemaker --cloud aws # or gcp / azure / all ``` +• Deploy models from dropdown +• List / query / delete endpoints +• Works cross-cloud + +### 2. YAML-based Deployment (CI-friendly) +```bash magemaker --deploy .magemaker_config/bert-base-uncased.yaml ``` - -Following is a sample yaml file for deploying a model the same google bert model mentioned above: - +Example YAML for AWS SageMaker: ```yaml deployment: !Deployment destination: aws - # Endpoint name matches model_id for querying atm. - endpoint_name: test-bert-uncased - instance_count: 1 + endpoint_name: bert-uncased-demo instance_type: ml.m5.xlarge - + instance_count: 1 models: - !Model id: google-bert/bert-base-uncased source: huggingface ``` +GCP & Azure use the same schema; just change `destination`, `instance_type`, and optional GPU fields (`accelerator_type`, `accelerator_count`, `num_gpus`). -Following is a yaml file for deploying a llama model from HF: -```yaml -deployment: !Deployment - destination: aws - endpoint_name: test-llama2-7b - instance_count: 1 - instance_type: ml.g5.12xlarge - num_gpus: 4 - # quantization: bitsandbytes +### 3. Deploying JumpStart Models +JumpStart models set `source: sagemaker` and can be discovered via the TUI. See the [JumpStart guide](https://magemaker.slashml.com/concepts/jumpstart-models). -models: -- !Model - id: meta-llama/Meta-Llama-3-8B-Instruct - source: huggingface - predict: - temperature: 0.9 - top_p: 0.9 - top_k: 20 - max_new_tokens: 250 +### 4. Deactivate / Delete +Endpoints accrue cost until deleted. Use the TUI option **Delete a Model Endpoint** or: +```bash +magemaker --delete my-endpoint-name ``` -#### Fine-tuning a model using a yaml file +

    (back to top)

-You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file
+---
-`
+## Fine-tuning
+Fine-tuning is currently supported on **AWS SageMaker**.
+```bash
 magemaker --train .magemaker_config/train-bert.yaml
-`
-
-Here is an example yaml file for fine-tuning a hugging-face model:
-
+```
+YAML snippet:
 ```yaml
 training: !Training
   destination: aws
   instance_type: ml.p3.2xlarge
   instance_count: 1
-  training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv
+  training_input_path: s3://your-bucket/data.csv
   hyperparameters: !Hyperparameters
-    epochs: 1
-    per_device_train_batch_size: 32
-    learning_rate: 0.01
-
+    epochs: 3
+    learning_rate: 2e-5
 models:
 - !Model
-  id: meta-textgeneration-llama-3-8b-instruct
+  id: google-bert/bert-base-uncased
   source: huggingface
 ```
+
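+If your training data is still local, one way to stage it at the `training_input_path` shown above is a standard AWS CLI copy (the bucket name is a placeholder):
+```bash
+# Upload the CSV referenced by training_input_path (illustrative bucket name)
+aws s3 cp data.csv s3://your-bucket/data.csv
+```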

    (back to top)

-
-
-
-If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great:
-
-
-
-**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)**
-
-- **Type:** Fill Mask: tries to complete your sentence like Madlibs
-- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill
-
-
-
-**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)**
-
-- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering
-- **Query format:** "*type out a sentence like this one.*"
-
-
-
+---
+## OpenAI-compatible Proxy
+Spin up a local FastAPI server that forwards `/v1/chat/completions` to any Magemaker endpoint, enabling seamless use with the OpenAI Python SDK.
+```bash
+python server.py   # default port 8000
+```
+Set `PROXY_SERVER_PORT` in `.env` to override the port. Full docs: [OpenAI Proxy](https://magemaker.slashml.com/concepts/openai-proxy).
-### Deactivating models
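+Once the server is running, any OpenAI-style client can point at it. For example, a quick smoke test with curl, mirroring the request format shown in the OpenAI Proxy guide, might look like this (assumes a Llama 3 endpoint is already deployed):
+```bash
+# Hypothetical smoke test against the local proxy
+curl -X POST http://localhost:8000/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
+```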

    (back to top)

-Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance.
+---
+## Roadmap
+- ✔️ Multi-cloud deployment (AWS, GCP, Azure)
+- ✔️ JumpStart model support
+- ⏳ Autoscaling controls
+- ⏳ Enhanced logging & error handling
+- ⏳ One-click cost-monitoring dashboards

    (back to top)

+---
-
-
-
-## What we're working on next
-- [ ] More robust error handling for various edge cases
-- [ ] Verbose logging
-- [ ] Enabling / disabling autoscaling
-- [ ] Deployment to Azure and GCP
+## Known Issues
+- Query helper functions currently text-only (no image/multimodal support)
+- Endpoint deletion can take several minutes to reflect in the AWS console
+- Deploying the same endpoint name within a single minute may cause conflicts

    (back to top)

+---
-
-
-
-## Known issues
-- [ ] Querying within Magemaker currently only works with text-based model - doesn’t work with multimodal, image generation, etc.
-- [ ] Deleting a model is not instant, it may show up briefly after it was queued for deletion
-- [ ] Deploying the same model within the same minute will break
+## Contributing
+We 💜 contributions! Please read our [Contributing Guide](https://magemaker.slashml.com/concepts/contributing) for setup, coding standards, and docs preview with Mintlify.

    (back to top)

-
-
-
+---
 ## License
+Distributed under the Apache 2.0 License. See `LICENSE` for details.
-Distributed under the Apache 2.0 License. See `LICENSE` for more information.
-
-
+---
 ## Contact
+Questions or feedback? Reach us at [support@slashml.com](mailto:support@slashml.com) or join our [Discord](https://discord.gg/SBQsD63d).
-You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).
-
-We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.
-

    (back to top)

\ No newline at end of file

From ac8e7d889acd652fe7e8ae8c9a5f304378d091fe Mon Sep 17 00:00:00 2001
From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com>
Date: Thu, 25 Sep 2025 18:53:02 +0000
Subject: [PATCH 17/18] docs: create concepts/openai-proxy.mdx

---
 concepts/openai-proxy.mdx | 125 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 concepts/openai-proxy.mdx

diff --git a/concepts/openai-proxy.mdx b/concepts/openai-proxy.mdx
new file mode 100644
index 0000000..81cd3b2
--- /dev/null
+++ b/concepts/openai-proxy.mdx
@@ -0,0 +1,125 @@
+---
+title: OpenAI-Compatible API Proxy
+description: Use Magemaker as a drop-in replacement for the OpenAI Chat Completions API
+---
+
+## Overview
+
+Magemaker ships with a lightweight FastAPI server (`server.py`) that exposes three REST endpoints:
+
+| Method | Path | Description |
+| ------ | ---- | ----------- |
+| `GET` | `/endpoint/{endpoint_name}` | Returns metadata for an existing SageMaker endpoint |
+| `POST` | `/endpoint/{endpoint_name}/query` | Sends an inference request to the specified endpoint |
+| `POST` | `/chat/completions` | OpenAI-compatible **Chat Completions** endpoint powered by SageMaker + [LiteLLM](https://github.com/BerriAI/litellm) |
+
+Using these endpoints you can self-host an LLM and interact with it using the familiar OpenAI client libraries (or any tool expecting the official API).
+
+
+The server currently supports **text-based** models deployed on SageMaker only. Multi-model endpoints are supported—Magemaker will pick the first match unless you pass a specific endpoint name.
+
+
+## Quick Start
+
+
+
+  ```bash
+  # Example: Deploy Meta-Llama-3-8B to SageMaker
+  magemaker --deploy .magemaker_config/llama3.yaml
+  ```
+
+
+  ```bash
+  export AWS_REGION_NAME=us-east-1   # or your region
+  ```
+
+
+  ```bash
+  uvicorn server:app --reload --port 8000
+  ```
+  The server is now listening on `http://localhost:8000`.
+
+
+  ```python
+  import openai
+
+  openai.api_base = "http://localhost:8000"
+  openai.api_key = "not-needed"  # kept for SDK parity
+
+  chat_completion = openai.ChatCompletion.create(
+      model="meta-llama/Meta-Llama-3-8B-Instruct",  # HF model id
+      messages=[{"role": "user", "content": "Hello!"}]
+  )
+
+  print(chat_completion.choices[0].message.content)
+  ```
+
+
+
+## Request / Response Examples
+
+### `POST /chat/completions`
+
+```bash
+curl -X POST http://localhost:8000/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
+    "messages": [{"role": "user", "content": "Who are you?"}],
+    "temperature": 0.9
+  }'
+```
+
+A successful response mirrors the OpenAI format:
+
+```json
+{
+  "id": "chatcmpl-abcd1234",
+  "object": "chat.completion",
+  "created": 1713456789,
+  "model": "sagemaker/llama3-endpoint",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "I am an open-source language model hosted on SageMaker."
+      },
+      "finish_reason": "stop"
+    }
+  ]
+}
+```
+
+### Error Handling
+
+If the requested model is not deployed you will receive a `404` with the body `{"detail": "NotDeployedException"}`.
+
+## Environment Variables
+
+| Variable | Required | Description |
+| -------- | -------- | ----------- |
+| `AWS_REGION_NAME` | ✅ | AWS region where your SageMaker endpoints live |
+| All others | | The server relies on the same `.env` file generated by `magemaker --cloud aws` |
+
+## Extending the Server
+
+1. **Additional Providers** – Add new query adapters (e.g. Vertex AI) in `server.py` and update the routing logic.
+2. **Streaming Responses** – FastAPI supports async generators; wrap the SageMaker runtime call accordingly.
+3. **Authentication** – Add an `APIKeyHeader` dependency to protect your endpoints in production.
+
+## Troubleshooting
+
+- **CORS Issues** – Use `FastAPI`'s `CORSMiddleware` if calling from a browser.
+- **Model Not Found** – Ensure the `model` field in the request exactly matches the `id` used in your YAML deployment file.
+- **Large Payloads** – Set an appropriate timeout or split long prompts into multiple requests.
+
+## Roadmap
+
+- Multi-cloud support (GCP Vertex AI & Azure ML)
+- Streaming chat completions (`stream=True`)
+- Built-in auth (API key / OAuth)
+
+
+Remember to delete unused SageMaker endpoints to avoid unexpected charges.
+

From d0f9882c107a109f0410dca401b98e9d61508a87 Mon Sep 17 00:00:00 2001
From: "docsalot-app[bot]" <207601912+docsalot-app[bot]@users.noreply.github.com>
Date: Thu, 25 Sep 2025 18:53:03 +0000
Subject: [PATCH 18/18] docs: create concepts/jumpstart-models.mdx

---
 concepts/jumpstart-models.mdx | 89 +++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 concepts/jumpstart-models.mdx

diff --git a/concepts/jumpstart-models.mdx b/concepts/jumpstart-models.mdx
new file mode 100644
index 0000000..68367f3
--- /dev/null
+++ b/concepts/jumpstart-models.mdx
@@ -0,0 +1,89 @@
+---
+title: Searching SageMaker JumpStart Models
+description: How to discover and deploy AWS JumpStart models with Magemaker
+---
+
+## Overview
+
+AWS SageMaker JumpStart ships hundreds of pre-built, fine-tuned models (text, vision, audio, tabular) that can be deployed in one click. Magemaker exposes these models directly in the CLI so you can:
+
+1. **Search** by framework (TensorFlow, PyTorch, Hugging Face, etc.)
+2. **Preview** model IDs & recommended instance types
+3. **Generate** a ready-to-use YAML file for reproducible deployments
+
+
+JumpStart support is **AWS-only** for now. The rest of the Magemaker workflow (YAML deployment, endpoint management, querying) is identical to Hugging Face models.
+
+
+---
+
+## Interactive Search Workflow
+
+```bash
+magemaker --cloud aws
+```
+
+1. Choose **“Deploy a Model”** ➜ **“SageMaker JumpStart”**.
+2. Pick a framework (Hugging Face, TensorFlow, PyTorch, etc.).
+3. Use arrow keys to browse or start typing to filter.
+4. Press **Enter** to select a model and follow the prompts.
+
+![JumpStart Search](../Images/jumpstart-search.png)
+
+At the end of the wizard a file like `.magemaker_config/jumpstart-inception-v3.yaml` is created so you can redeploy the exact same model via CI/CD:
+
+```bash
+magemaker --deploy .magemaker_config/jumpstart-inception-v3.yaml
+```
+
+---
+
+## YAML Schema
+
+JumpStart models use the same `!Model` schema but with `source: sagemaker` and an **AWS-provided model ID**:
+
+```yaml
+models:
+- !Model
+  id: tensorflow-ic-imagenet-inception-v3-classification-4   # ← JumpStart ID
+  source: sagemaker
+
+deployment: !Deployment
+  destination: aws
+  endpoint_name: inception-v3-demo
+  instance_type: ml.g4dn.xlarge
+```
+
+
+Need GPUs? JumpStart docs list the minimum compatible instance for each model. If deployment fails with `CapacityError`, try a larger GPU instance or a different region.
+
+
+---
+
+## Programmatic Access
+
+If you are building your own UI on top of Magemaker you can import the utility directly:
+
+```python
+from magemaker.sagemaker.search_jumpstart_models import search_sagemaker_jumpstart_model
+
+models = search_sagemaker_jumpstart_model()
+print(models[:5])  # preview first five IDs
+```
+
+---
+
+## Common Errors & Fixes
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `Model not found` | Wrong ID copied | Copy the **Model ID** (not the display name) from JumpStart docs. |
+| `CapacityError` | Instance type unavailable in region | Switch regions or request a quota increase. |
+| `ClientError: ValidationException` | Instance too small | Use the **recommended** instance class shown in JumpStart UI. |
+
+---
+
+## Next Steps
+
+• [Model Deployment Guide](/concepts/deployment) – deep-dive into YAML options
+• [Supported Models](/concepts/models) – full compatibility matrix