diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a70cce7..5a70e59 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -65,4 +65,4 @@ By contributing, you agree that your contributions will be licensed under the Ap ## Questions? -Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! \ No newline at end of file +Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing! diff --git a/about.mdx b/about.mdx index d9c04a4..29ba392 100644 --- a/about.mdx +++ b/about.mdx @@ -6,7 +6,13 @@ description: Deploy open source AI models to AWS, GCP, and Azure in minutes ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying open source AI models to your preferred cloud provider. Instead of spending hours digging through documentation, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning. +Magemaker is a Python tool that simplifies the process of deploying open-source AI models to your preferred cloud provider. Instead of spending hours digging through documentation, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning—​all from a single CLI. + +In addition to model deployment, Magemaker now offers: + +- **JumpStart model search & deploy** – interactively search the AWS SageMaker JumpStart catalog and deploy with one click. +- **OpenAI-compatible proxy server** – expose any deployed endpoint through an `/v1/chat/completions` API so existing OpenAI clients just work. +- **YAML-first workflows** – version-controlled deployment, querying, and fine-tuning configurations. ## What we're working on next @@ -20,15 +26,13 @@ Do submit your feature requests at https://magemaker.featurebase.app/ ## Known issues - Querying within Magemaker currently only works with text-based models -- Deleting a model is not instant, it may show up briefly after deletion +- Deleting a model is not instant; it may show up briefly after deletion - Deploying the same model within the same minute will break -- Hugging-face models on Azure have different Ids than their Hugging-face counterparts. Follow the steps specified in the quick-start guide to find the relevant models -- For Azure deploying models other than Hugging-face is not supported yet. -- Python3.13 is not supported because of an open-issue by Azure. https://github.com/Azure/azure-sdk-for-python/issues/37600 - - -If there is anything we missed, do point them out at https://magemaker.featurebase.app/ +- Hugging Face models on Azure have different IDs than their Hugging Face counterparts. Follow the steps in the quick-start guide to find the relevant models. +- For Azure, deploying models other than Hugging Face is not supported yet. +- Python 3.13 is not supported because of an open issue with Azure (see https://github.com/Azure/azure-sdk-for-python/issues/37600). +If there is anything we missed, do point it out at https://magemaker.featurebase.app/ ## License @@ -36,7 +40,7 @@ Distributed under the Apache 2.0 License. See `LICENSE` for more information. ## Contact -You can reach us, faizan & jneid, at [faizan|jneid@slashml.com](mailto:support@slashml.com). +You can reach us, Faizan & Jneid, at [support@slashml.com](mailto:support@slashml.com). 
You can give feedback at https://magemaker.featurebase.app/ diff --git a/concepts/cli-reference.mdx b/concepts/cli-reference.mdx new file mode 100644 index 0000000..2308fe3 --- /dev/null +++ b/concepts/cli-reference.mdx @@ -0,0 +1,38 @@ +--- +title: CLI Reference +--- + +## Global syntax +```bash +magemaker [OPTIONS] +``` + +| Flag | Description | +|------|-------------| +| `--cloud [aws\|gcp\|azure\|all]` | Configure or use a specific cloud provider. **Required on first run**. | +| `--deploy ` | Deploy a model defined in a YAML file. | +| `--train ` | Fine-tune a model using a YAML training config. | +| `--instance ` | Override the default instance type when using quick-deploy flags. | +| `--verbose` | Enable debug-level logging. | +| `--version` | Display Magemaker version and exit. | + +### Deprecated / Removed Flags +- `--query` — superseded by the interactive *Query* menu and Python SDK. + +## Examples +Deploy a BERT model to SageMaker: +```bash +magemaker --deploy configs/bert-uncased.yaml +``` + +Configure all three clouds in one go: +```bash +magemaker --cloud all +``` + +Run with detailed logs: +```bash +magemaker --cloud aws --verbose +``` + +Refer to individual tutorials for provider-specific YAML examples. diff --git a/concepts/contributing.mdx b/concepts/contributing.mdx index 8c61908..d09e8ae 100644 --- a/concepts/contributing.mdx +++ b/concepts/contributing.mdx @@ -3,9 +3,11 @@ title: Contributing description: Guide to contributing to Magemaker --- -## Welcome to Magemaker Contributing Guide +## Welcome to the Magemaker Contributing Guide -We're excited that you're interested in contributing to Magemaker! This document will guide you through the process of contributing to the project. +We're excited that you're interested in contributing to Magemaker! This document explains **code**, **tests**, and **documentation** workflows so you can get productive quickly. + +--- ## Ways to Contribute @@ -17,13 +19,15 @@ We're excited that you're interested in contributing to Magemaker! This document Suggest new features or improvements - Help improve our documentation + Help improve our documentation (yes, even fixing a typo counts!) - Submit pull requests with bug fixes or new features + Submit pull-requests with bug-fixes or new features +--- + ## Development Setup @@ -38,23 +42,25 @@ We're excited that you're interested in contributing to Magemaker! This document pip install -e ".[dev]" ``` - + ```bash - git checkout -b feature/your-feature-name + git checkout -b feat/your-feature-name ``` +--- + ## Development Guidelines -### Code Style +### Code Style & Linting -We use the following tools to maintain code quality: -- Black for Python code formatting -- isort for import sorting -- flake8 for style guide enforcement +We enforce a consistent style-guide via: +- **Black** for auto-formatting +- **isort** for import ordering +- **flake8** for static analysis -Run the following before committing: +Run before committing: ```bash black . isort . @@ -63,106 +69,126 @@ flake8 ### Testing - -All new features should include tests. We use pytest for our test suite. - +All new features **must** include unit tests. We use **pytest** across the repo & CI. -Run tests locally: +Run the full suite locally: ```bash -pytest tests/ +pytest -q ``` +For integration tests requiring cloud credentials you may mark them with +`@pytest.mark.integration` and gate them behind relevant environment variables so +CI can skip them safely. 
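As a concrete illustration of that pattern, a gated integration test might look like the sketch below (the file name, test body, and the `AWS_ACCESS_KEY_ID` check are only illustrative):

```python
# tests/test_deploy_integration.py — minimal sketch of an env-gated integration test
import os
import pytest


@pytest.mark.integration
@pytest.mark.skipif(
    not os.environ.get("AWS_ACCESS_KEY_ID"),
    reason="AWS credentials not configured; skipping integration test",
)
def test_deploy_endpoint_smoke():
    # exercise a real deployment here, e.g. deploy a tiny model, assert the
    # endpoint becomes available, then tear it down
    assert True
```

Remember to register the `integration` marker (for example in `pytest.ini` or `pyproject.toml`) so pytest does not warn about unknown markers, and run `pytest -m "not integration"` in CI to skip the gated tests.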
+ ### Documentation -When adding new features, please update the relevant documentation: +1. Update/extend code-level docstrings. +2. Add or update `.mdx` files under the `docs/` folder. +3. **Always** run the documentation site locally to verify links & formatting. + +#### Running Docs Locally + +We use [Mintlify](https://mintlify.com) for the docs site. +```bash +npm i -g mintlify +mintlify dev # runs on http://localhost:3000 +``` -1. Update the README.md if needed -2. Add/update docstrings for new functions/classes -3. Create/update relevant .mdx files in the docs directory +If you introduce a brand-new page remember to: +1. Add the file under `docs/` (e.g. `docs/tutorials/your-topic.mdx`). +2. Add the page to `mint.json -> navigation` so it appears in the sidebar. -## Pull Request Process +See the new pages **OpenAI-compatible Proxy**, **JumpStart Search**, and **CLI Reference** for examples. + +--- + +## Pull-Request Process - Create a new branch for your changes: ```bash - git checkout -b feature/your-feature + git checkout -b feat/my-change ``` - - Make your changes and commit them with clear commit messages: + ```bash git add . - git commit -m "feat: add new deployment option" + git commit -m "feat: add " ``` - - Push your changes to your fork: + ```bash - git push origin feature/your-feature + git push origin feat/my-change ``` - - Open a Pull Request against the main repository + + Open a Pull-Request against the **main** branch. Our GitHub Action will + automatically stage preview docs and comment with a live URL. -### Pull Request Guidelines +### PR Checklist - Provide a clear description of your changes + Explain **what** & **why** – screenshots/GIFs encouraged - Include relevant tests for new features + Add unit/integration tests for all new behavior - Update documentation as needed + Update docs + `mint.json` navigation when adding user-facing features - Keep commits focused and clean + Rebase / squash to keep commits focused -## Commit Message Convention +--- -We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification: +## Commit Message Convention -- `feat:` New feature -- `fix:` Bug fix -- `docs:` Documentation changes -- `style:` Code style changes -- `refactor:` Code refactoring -- `test:` Adding missing tests -- `chore:` Maintenance tasks +We follow [Conventional Commits](https://www.conventionalcommits.org/): +- **feat:** new feature +- **fix:** bug fix +- **docs:** documentation only changes +- **style:** formatting, missing semi-colons, etc. +- **refactor:** code change that neither fixes a bug nor adds a feature +- **test:** adding or correcting tests +- **chore:** maintenance tasks Example: ```bash -feat(deployment): add support for custom docker images +feat(proxy): add OpenAI-compatible /chat/completions route ``` -## Getting Help +--- -If you need help with your contribution: +## Getting Help - - Join our Discord server for real-time discussions + + Real-time chat with maintainers & community - - Start a discussion in our GitHub repository + Long-form Q&A and design proposals - - - Contact us at support@slashml.com + + support@slashml.com +--- + ## Code of Conduct -We are committed to providing a welcoming and inclusive experience for everyone. Please read our [Code of Conduct](https://github.com/slashml/magemaker/CODE_OF_CONDUCT.md) before participating. +We are committed to a welcoming, inclusive environment. Please read our +[Code of Conduct](https://github.com/slashml/magemaker/CODE_OF_CONDUCT.md) +before participating. 
+ +--- ## License -By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License. \ No newline at end of file +By contributing you agree your work is licensed under the +Apache 2.0 License. diff --git a/concepts/deployment.mdx b/concepts/deployment.mdx index 66ca7a9..02f1159 100644 --- a/concepts/deployment.mdx +++ b/concepts/deployment.mdx @@ -62,7 +62,7 @@ deployment: !Deployment destination: gcp endpoint_name: opt-125m-gcp instance_count: 1 - machine_type: n1-standard-4 + instance_type: n1-standard-4 accelerator_type: NVIDIA_TESLA_T4 accelerator_count: 1 @@ -83,7 +83,7 @@ deployment: !Deployment models: - !Model - id: facebook-opt-125m + id: facebook--opt-125m source: huggingface ``` @@ -202,7 +202,7 @@ Choose your instance type based on your model's requirements: 4. Set up monitoring and alerting for your endpoints -Make sure you setup budget monitory and alerts to avoid unexpected charges. +Make sure you setup budget monitoring and alerts to avoid unexpected charges. @@ -225,4 +225,4 @@ Common issues and their solutions: - Verify model ID and version - Check instance memory requirements - Validate Hugging Face token if required - - Endpoing deployed but deployment failed. Check the logs, and do report this to us if you see this issue. + - Endpoint deployed but deployment failed. Check the logs, and do report this to us if you see this issue. diff --git a/concepts/fine-tuning.mdx b/concepts/fine-tuning.mdx index 88835aa..96c5a26 100644 --- a/concepts/fine-tuning.mdx +++ b/concepts/fine-tuning.mdx @@ -5,7 +5,13 @@ description: Guide to fine-tuning models with Magemaker ## Fine-tuning Overview -Fine-tuning allows you to adapt pre-trained models to your specific use case. Magemaker simplifies this process through YAML configuration. +Fine-tuning allows you to adapt pre-trained models to your specific use case. +Magemaker currently supports **fine-tuning on AWS SageMaker** (GCP/Azure support is on the roadmap). +Everything is driven by a YAML configuration that describes the training job and the model you want to fine-tune. + + +If you don’t specify any hyper-parameters, Magemaker will automatically choose sensible defaults based on the model task (e.g. `text-generation`, `text-classification`). You can always override any of these values explicitly in your YAML file. + ### Basic Command @@ -13,20 +19,27 @@ Fine-tuning allows you to adapt pre-trained models to your specific use case. Ma magemaker --train .magemaker_config/train-config.yaml ``` +Running the command will: +1. Upload your training data to S3 (if it is on a local path) +2. Spin up a SageMaker Training Job with the instance type you request +3. Stream training logs to your terminal +4. 
Output the S3 URI of the fine-tuned model artefact when training finishes + +--- ## Configuration ### Basic Training Configuration ```yaml training: !Training - destination: aws - instance_type: ml.p3.2xlarge + destination: aws # Only aws is supported for now + instance_type: ml.p3.2xlarge # GPU instances are recommended instance_count: 1 training_input_path: s3://your-bucket/training-data.csv models: - !Model - id: your-model-id + id: google-bert/bert-base-uncased source: huggingface ``` @@ -35,7 +48,7 @@ models: ```yaml training: !Training destination: aws - instance_type: ml.p3.2xlarge + instance_type: ml.p3.8xlarge # 4×V100 GPUs instance_count: 1 training_input_path: s3://your-bucket/data.csv hyperparameters: !Hyperparameters @@ -49,56 +62,74 @@ training: !Training save_steps: 1000 ``` +### Optional & Auto-generated Fields + +| Field | Description | Default | +|-------|-------------|---------| +| `hyperparameters` | Fine-tuning hyper-params | Auto-generated based on model task | +| `output_path` | S3 URI to store model artefacts | `s3:///magemaker/` | +| `metric_definitions` | Regexes for CloudWatch metrics | Provided automatically | + +--- ## Data Preparation ### Supported Formats - - Simple tabular data - - Easy to prepare - - Good for classification tasks + • Simple tabular data + • Easy to prepare + • Good for classification tasks - - Flexible data format - - Good for complex inputs - - Supports nested structures + • Flexible data format + • Good for generative tasks + • Supports nested structures -### Data Upload +### Uploading Your Data - Format your data according to model requirements + Ensure the data columns match the input format expected by your model (e.g. `text`, `label`). - Use AWS CLI or console to upload data + ```bash + aws s3 cp ./data/train.csv s3://your-bucket/train.csv + ``` + Magemaker will upload local files automatically, but pre-uploading can speed up jobs. - - Specify S3 path in training configuration + + Set `training_input_path` to the S3 URI in your YAML file. +--- ## Instance Selection -### Training Instance Types - -Choose based on: -- Dataset size -- Model size -- Training time requirements -- Cost constraints +Choose instance types based on: +• Dataset size +• Model size +• Time-to-train vs. cost trade-off Popular choices: -- ml.p3.2xlarge (1 GPU) -- ml.p3.8xlarge (4 GPUs) -- ml.p3.16xlarge (8 GPUs) +- **ml.p3.2xlarge** (1 × V100 GPU) +- **ml.p3.8xlarge** (4 × V100 GPUs) +- **ml.p3.16xlarge** (8 × V100 GPUs) -## Hyperparameter Tuning + +Remember to delete training jobs and models you no longer need to avoid ongoing storage charges in S3. + -### Basic Parameters +--- +## Hyper-parameter Tuning + +You can provide fixed values **or** ranges/sets for automated tuning (coming soon). +Below are two common patterns: + +### Fixed Parameters ```yaml hyperparameters: !Hyperparameters @@ -107,24 +138,39 @@ hyperparameters: !Hyperparameters batch_size: 32 ``` -### Advanced Tuning +### Ranges / Sets (Experimental) ```yaml hyperparameters: !Hyperparameters - epochs: 3 - learning_rate: + learning_rate: min: 1e-5 - max: 1e-4 + max: 5e-5 scaling: log batch_size: values: [16, 32, 64] ``` +--- ## Monitoring Training -### CloudWatch Metrics +During training Magemaker streams the CloudWatch logs to your terminal. 
Key metrics include: + +- `loss` (training / evaluation) +- `learning_rate` +- `eval_accuracy` / `eval_f1` (task dependent) +- GPU utilisation (viewable in the SageMaker console) -Available metrics: -- Loss -- Learning rate -- GPU utilization \ No newline at end of file +You can also open the SageMaker Training Job in the AWS Console for a full dashboard view. + +--- +## Next Steps + +1. Deploy the fine-tuned model using the artefact S3 URI: + ```yaml + models: + - !Model + id: s3:///model.tar.gz + source: sagemaker + ``` +2. Enable automatic hyper-parameter tuning (multi-job search) – feature coming soon. +3. Share feedback or request features at [magemaker.featurebase.app](https://magemaker.featurebase.app). diff --git a/concepts/models.mdx b/concepts/models.mdx index 0161380..4d2d9c1 100644 --- a/concepts/models.mdx +++ b/concepts/models.mdx @@ -3,203 +3,159 @@ title: Models description: Guide to supported models and their requirements --- -## Supported Models +## Supported Model Sources - -Currently, Magemaker supports deployment of Hugging Face models only. Support for cloud provider marketplace models is coming soon! - - -### Hugging Face Models +Magemaker can deploy models from more than one source. Specify the source in the `!Model` block with the `source` field. - - - LLaMA - - BERT - - GPT-2 - - T5 + + - Text-generation, feature-extraction, etc.
+ - Works across **AWS**, **GCP**, and **Azure** (IDs differ on Azure!)
- - - - Sentence Transformers - - CLIP - - DPR + + - Foundation models curated by AWS
+ - Deployable **only on SageMaker** for now
+ - Includes proprietary & open-source models such as Falcon-40B-Instruct, Mistral-7B, etc.
+ +Add more sources by extending the `source` field (e.g. `sagemaker`, `huggingface`). Support for GCP Model Garden and Azure OpenAI is on the roadmap. + + +### Example `!Model` blocks + +```yaml +# Hugging Face model +- !Model + id: meta-llama/Meta-Llama-3-8B-Instruct + source: huggingface +``` + +```yaml +# AWS JumpStart model (ID is the JumpStart model ID) +- !Model + id: huggingface-llama-2-7b-f + source: sagemaker # JumpStart models use the "sagemaker" source +``` + ### Future Support -We plan to add support for the following model sources: +We plan to add support for these additional sources: - - Models from AWS Marketplace and SageMaker built-in algorithms - - - - Models from Vertex AI Model Garden and Foundation Models - - - - Models from Azure ML Model Catalog and Azure OpenAI - + Vertex AI foundation models + Azure-hosted proprietary models -## Model Requirements -### Instance Type Recommendations by Cloud Provider +--- + +## Instance-Type Recommendations (by Cloud) #### AWS SageMaker -1. **Small Models** (ml.m5.xlarge) - ```yaml - instance_type: ml.m5.xlarge - ``` -2. **Medium Models** (ml.g4dn.xlarge) - ```yaml - instance_type: ml.g4dn.xlarge - ``` -3. **Large Models** (ml.g5.12xlarge) - ```yaml - instance_type: ml.g5.12xlarge - num_gpus: 4 - ``` +1. **Small** – `ml.m5.xlarge` +2. **Medium** – `ml.g4dn.xlarge` +3. **Large** – `ml.g5.12xlarge` (4× A10G GPU) #### GCP Vertex AI -1. **Small Models** (n1-standard-4) - ```yaml - machine_type: n1-standard-4 - ``` -2. **Medium Models** (n1-standard-8 + GPU) - ```yaml - machine_type: n1-standard-8 - accelerator_type: NVIDIA_TESLA_T4 - accelerator_count: 1 - ``` -3. **Large Models** (a2-highgpu-1g) - ```yaml - machine_type: a2-highgpu-1g - ``` +1. **Small** – `n1-standard-4` +2. **Medium** – `n1-standard-8` + `NVIDIA_TESLA_T4` +3. **Large** – `a2-highgpu-1g` #### Azure ML -1. **Small Models** (Standard_DS3_v2) - ```yaml - instance_type: Standard_DS3_v2 - ``` -2. **Medium Models** (Standard_NC6s_v3) - ```yaml - instance_type: Standard_NC6s_v3 - ``` -3. **Large Models** (Standard_ND40rs_v2) - ```yaml - instance_type: Standard_ND40rs_v2 - ``` +1. **Small** – `Standard_DS3_v2` +2. **Medium** – `Standard_NC6s_v3` +3. **Large** – `Standard_ND40rs_v2` -## Example Deployments - -### Example Hugging Face Model Deployment +--- -Deploy the same Hugging Face model to different cloud providers: +## Example Deployments -AWS SageMaker: ```yaml +# SageMaker – Hugging Face model models: - !Model id: facebook/opt-125m source: huggingface + +deployment: !Deployment + destination: aws +``` + +```yaml +# SageMaker – JumpStart model +models: +- !Model + id: huggingface-llama-2-7b-f # JumpStart model ID + source: sagemaker + deployment: !Deployment destination: aws ``` -GCP Vertex AI: ```yaml +# Vertex AI – Hugging Face model models: - !Model id: facebook/opt-125m source: huggingface + deployment: !Deployment destination: gcp ``` -Azure ML: ```yaml +# Azure ML – Hugging Face model (note different ID format) models: - !Model id: facebook-opt-125m source: huggingface + deployment: !Deployment destination: azure ``` - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. - - To find the relevnt model id, follow the following steps - - - Find the workpsace in the Azure portal and click on the studio url provided. Click on the `Model Catalog` on the left side bar - ![Azure ML Creation](../Images/workspace-studio.png) - - - - Select Hugging-Face from the collections list. 
The id of the model card is the id you need to use in the yaml file - ![Azure ML Creation](../Images/hugging-face.png) - - - +Azure uses its own model-catalog IDs. Follow the steps in the **Quick-Start** guide to locate the correct ID. +--- -## Model Configuration +## Model Configuration Reference -### Basic Parameters +### Basic ```yaml -models: - !Model - id: your-model-id - source: huggingface|sagemaker # we don't support vertex and azure specific models yet - revision: latest # Optional: specify model version + id: + source: huggingface | sagemaker ``` -### Advanced Parameters +### Advanced ```yaml -models: - !Model - id: your-model-id + id: source: huggingface - predict: + predict: # Optional runtime parameters temperature: 0.7 top_p: 0.9 top_k: 50 max_new_tokens: 500 - do_sample: true ``` +--- + ## Best Practices -1. **Model Selection** - - Compare pricing across cloud providers - - Consider data residency requirements - - Test latency from different regions +1. **Model Selection** – Evaluate cost & latency per cloud. +2. **Cost Management** – Tear down unused endpoints and set up budget alerts. +3. **Performance Tuning** – Adjust instance/GPU sizes; monitor logs & metrics. -3. **Cost Management** - - Compare instance pricing - - Make sure you set up the relevant alerting +--- ## Troubleshooting -Common model-related issues: - -1. **Cloud-Specific Issues** - - Check quota limits - - Verify regional availability - - Review cloud-specific logs - -2. **Performance Issues** - - Compare cross-cloud latencies - - Check network connectivity - - Monitor resource utilization - -3. **Authentication Issues** - - Verify cloud credentials - - Check model access permissions - - Validate API keys \ No newline at end of file +1. **Quota Errors** – Request increases for GPU/instance quotas. +2. **Auth Errors** – Verify cloud credentials & HF tokens. +3. **Slow Start-up** – Large models may take >15 min; watch endpoint logs for progress. diff --git a/configuration/AWS.mdx b/configuration/AWS.mdx index cdc4b9f..3b91769 100644 --- a/configuration/AWS.mdx +++ b/configuration/AWS.mdx @@ -4,15 +4,15 @@ title: AWS ### AWS CLI -To install Azure SDK on MacOS, you need to have the latest OS and you need to use Rosetta terminal. Also, make sure you have the latest version of Xcode tools installed. +To install the AWS CLI on macOS ≥ 13.6.6 running on Apple Silicon you *must* use a Rosetta terminal. Verify by running `arch` – it should print `i386` for a Rosetta session. Make sure you have the latest version of Xcode Command-Line Tools installed. -Follow this guide to install the latest AWS CLI +Follow the official guide to install the latest AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html +Once the CLI is installed and working, continue with the steps below. -Once you have the CLI installed and working, follow these steps - +--- ### AWS Account @@ -22,61 +22,104 @@ Register for an [AWS account](https://aws.amazon.com/) and sign-in to the [conso -From the console, use the Search bar to find and select IAM (***do not use IAM Identity Center***, which is confusingly similar but a totally different system). +From the console, use the search bar to find and select **IAM** (*do **not** use IAM Identity Center, which is different*). -![Enter image alt description](../Images/muJ_Image_1.png) +![IAM Search](../Images/muJ_Image_1.png) -You should see the following screen after clicking IAM. 
+After clicking **IAM** you should see a screen similar to this: -![Enter image alt description](../Images/ldC_Image_2.png) +![IAM Home](../Images/ldC_Image_2.png) -1. Select `Users` in the side panel - -![Enter image alt description](../Images/QX4_Image_3.png) +1. Select **Users** in the side-panel + ![Users](../Images/QX4_Image_3.png) -2. Create a user if you don't already have one - -![Enter image alt description](../Images/ly3_Image_4.png) +2. Create a user if you don't already have one + ![Create User](../Images/ly3_Image_4.png) -1. Click on "Add permissions" - -![Enter image alt description](../Images/E7x_Image_5.png) +1. Click **Add permissions** + ![Add Permissions](../Images/E7x_Image_5.png) -2. Select "Attach policies directly". Under permission policies, search for and tick the boxes for: - - `AmazonSagemakerFullAccess` +2. Choose **Attach policies directly** and tick the following policies: + - `AmazonSageMakerFullAccess` - `IAMFullAccess` - `ServiceQuotasFullAccess` -Then click Next. + Then click **Next**. -![Enter image alt description](../Images/01X_Image_6.png) +![Policies](../Images/01X_Image_6.png) -The final list should look like the following: +The selected policies should look like this: -![Enter image alt description](../Images/Dfp_Image_7.png) +![Policies List](../Images/Dfp_Image_7.png) -Click "Create user" on the following screen. +Click **Create user** to finish. -1. Click the name of the user you've just created (or one that already exists) -2. Go to "Security Credentials" tab -3. Scroll down to "Access Keys" section -4. Click "Create access key" -5. Select Command Line Interface then click next +1. Click the name of the user you just created (or an existing user). +2. Go to the **Security credentials** tab. +3. Scroll to the **Access keys** section and click **Create access key**. +4. Select **Command Line Interface** and click **Next**. + +![Create Access Key](../Images/BPP_Image_8.png) -![Enter image alt description](../Images/BPP_Image_8.png) +Add an optional description, then **Next**. -Enter a description (this is optional, can leave blank). Then click next. +![Key Description](../Images/gMD_Image_9.png) -![Enter image alt description](../Images/gMD_Image_9.png) +**Save BOTH the Access Key ID and the Secret Access Key** – you will not be able to see the secret again after closing the dialog. -**Store BOTH the Access Key and the Secret access key for the next step. Once you've saved both keys, click Done.** +![Save Keys](../Images/Gjw_Image_10.png) + -![Enter image alt description](../Images/Gjw_Image_10.png) + +Magemaker requires an execution role so that SageMaker can access Amazon S3 and other services on your behalf. You have two options: + +**Option A – Automatic (recommended)** +```bash +# Clone the repo or download the script first +bash magemaker/scripts/setup_role.sh +``` +The script creates a role called `MagemakerExecutionRole` with the necessary trust relationship and policies (`AmazonSageMakerFullAccess`, `AmazonS3FullAccess`). Copy the generated **Role ARN** printed by the script. + +**Option B – Manual** +1. In **IAM → Roles** click **Create role**. +2. Select **SageMaker** as the trusted entity type. +3. Attach the policies: + - `AmazonSageMakerFullAccess` + - `AmazonS3FullAccess` +4. Name the role (e.g. `MagemakerExecutionRole`) and create it. +5. Copy the role's **ARN** for the next step. - \ No newline at end of file + + +Magemaker stores credentials in a local `.env` file. 
Create (or update) `.env` in your project root: + +```bash +AWS_ACCESS_KEY_ID="" +AWS_SECRET_ACCESS_KEY="" +AWS_DEFAULT_REGION="us-east-1" # or your preferred region +SAGEMAKER_ROLE="arn:aws:iam:::role/MagemakerExecutionRole" + +# Optional – required for gated Hugging Face models like Llama 3 +HUGGING_FACE_HUB_KEY="" +``` + +Never commit your `.env` file to version control! + + + +--- + +## Next Step – Verify Configuration +Run the following command to verify everything works: + +```bash +magemaker --cloud aws +``` + +The first run will test your credentials, create any missing resources, and confirm the role. After that, you are ready to deploy models. diff --git a/configuration/Environment.mdx b/configuration/Environment.mdx index 0781ec3..4c6bf1f 100644 --- a/configuration/Environment.mdx +++ b/configuration/Environment.mdx @@ -3,28 +3,80 @@ title: Environment Variables --- ### Required Config File -A `.env` file is automatically created when you run `magemaker --cloud `. This file contains the necessary environment variables for your cloud provider(s). +A `.env` file is automatically created the first time you run one of the following commands: -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: +```bash +magemaker --cloud +# or +python server.py # launches the OpenAI-compatible proxy server +``` + +This file stores the credentials and run-time settings Magemaker needs for every component—including the optional FastAPI **OpenAI-compatible proxy server** introduced in v0.4. + +Below is the complete list of recognised environment variables, grouped by feature. Only the variables that correspond to the cloud provider(s) and features you actually use are required. ```bash -# AWS Configuration -AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS -SAGEMAKER_ROLE="arn:aws:iam::..." # Required for AWS +################################################# +# Core Magemaker Settings +################################################# +# Change the default folder where YAML configs are written/read +CONFIG_DIR=".magemaker_config" # Optional – defaults to .magemaker_config + +################################################# +# AWS Configuration (SageMaker, JumpStart, Proxy) +################################################# +AWS_ACCESS_KEY_ID="" # Required for AWS +AWS_SECRET_ACCESS_KEY="" # Required for AWS +SAGEMAKER_ROLE="arn:aws:iam::..." 
# Required for AWS deployments & training +AWS_REGION_NAME="us-east-1" # Optional – auto-detected if omitted -# GCP Configuration -PROJECT_ID="your-project-id" # Required for GCP -GCLOUD_REGION="us-central1" # Required for GCP +################################################# +# GCP Configuration (Vertex AI) +################################################# +PROJECT_ID="" # Required for GCP +GCLOUD_REGION="us-central1" # Required for GCP -# Azure Configuration -AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure +################################################# +# Azure Configuration (Azure ML) +################################################# +AZURE_SUBSCRIPTION_ID="" # Required for Azure AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure -AZURE_REGION="eastus" # Required for Azure +AZURE_REGION="eastus" # Required for Azure + +################################################# +# Hugging Face +################################################# +HUGGING_FACE_HUB_KEY="" # Required for gated models (e.g. Llama-3) -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +################################################# +# FastAPI OpenAI-Compatible Proxy (server.py) +################################################# +# These are OPTIONAL and only used when you run `python server.py` +# ➜ http://localhost:8000/chat/completions (OpenAI format) + +# Network +SERVER_HOST="0.0.0.0" # Optional – default 0.0.0.0 +SERVER_PORT="8000" # Optional – default 8000 + +# Rate-limiting / auth +OPENAI_API_KEY="" # Optional – litellm requires a key value even for local SageMaker models. Any non-empty string works. + +# Advanced litellm / logging flags can also be set (see litellm docs) ``` -Never commit your .env file to version control! + +Never commit your `.env` file to version control! + + +#### How Variable Resolution Works +1. Magemaker reads **system environment variables** first. +2. It then merges values from `.env` (if present) using [python-dotenv](https://github.com/theskumar/python-dotenv). +3. Finally, Magemaker may override certain variables at run-time. For example, `AWS_REGION_NAME` is automatically set to the region configured in your AWS session if you don’t specify it. + +If you change any values in `.env`, restart your shell (or run `source .env`) before invoking Magemaker again. + +#### Troubleshooting Tips +- **Missing credentials** – Run `magemaker --cloud ` again; this will prompt you for any values that are still undefined and append them to `.env`. +- **Wrong region** – Set `AWS_REGION_NAME` or `GCLOUD_REGION` explicitly and delete any cached sessions in `~/.aws/` or `~/.config/gcloud/`. +- **Proxy server won’t start** – Check that the chosen `SERVER_PORT` isn’t already in use and that `OPENAI_API_KEY` is not empty. diff --git a/getting_started.md b/getting_started.md index 0bc86fa..792817f 100644 --- a/getting_started.md +++ b/getting_started.md @@ -6,7 +6,7 @@ Deploy from an interactive menu in the terminal or from a simple YAML file. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy Hugging Face models directly to AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning, from the command line or a simple YAML file. -Choose a model from Hugging Face, and Magemaker will spin up an instance with a ready-to-query endpoint of the model in minutes. 
+Choose a model from Hugging Face (or search AWS JumpStart directly in-tool), and Magemaker will spin up an instance with a ready-to-query endpoint of the model in minutes. ## Getting Started @@ -16,19 +16,19 @@ To get a local copy up and running follow these simple steps. ### Prerequisites -* Python 3.11 (3.13 is not supported because of azure) +* Python 3.11+ (3.12 is not supported; 3.13 is not yet supported due to an open Azure SDK issue) * Cloud Configuration * An account to your preferred cloud provider, AWS, GCP and Azure. * Each cloud requires slightly different accesses, Magemaker will guide you through getting the necessary credentials to the selected cloud provider * Here's a guide on how to configure AWS and get the credentials [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) * Quota approval for instances you require for the AI model - * By default, you get some free instances, example with AWS you are pre-approved for 2 ml.m5.xlarge instances with 16gb of RAM each + * By default, you get some free instances, example with AWS you are pre-approved for 2 ml.m5.xlarge instances with 16 GB of RAM each * An installation and configuration of your selected cloud CLI tool(s) * Magemaker will prompt you to install the CLI of the selected cloud provider, if not installed already. - * Magemaker will prompt you to add the necesssary credentials. + * Magemaker will prompt you to add the necessary credentials. -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) +* Certain Hugging Face models (e.g. Llama 2 / Llama 3) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) ## Installation @@ -45,9 +45,9 @@ To get a local copy up and running follow these simple steps. magemaker --cloud [aws|gcp|azure|all] ``` - If this is your first time running this command, It will configure the selected cloud so you’re ready to start deploying models. + If this is your first time running this command, it will configure the selected cloud so you’re ready to start deploying models. - In the case of AWS, it’ll prompt you to enter your Access Key and Secret. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. + In the case of AWS, it’ll prompt you to enter your Access Key and Secret. You can also specify your AWS region. The default is `us-east-1`. You only need to change this if your SageMaker instance quota is in a different region. Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. @@ -65,7 +65,7 @@ To get a local copy up and running follow these simple steps. 
Run `magemaker --cloud [gcp|azure|aws|all]` to access an interactive menu where you can: * Choose your cloud provider -* Select from available models +* Search and deploy Hugging Face or **AWS JumpStart** models * Configure deployment settings * Monitor deployment progress @@ -76,7 +76,7 @@ For reproducible deployments, use YAML configuration: magemaker --deploy .magemaker_config/bert-base-uncased.yaml ``` -Following is a sample yaml file for deploying a model the same google bert model mentioned above to AWS: +Following is a sample yaml file for deploying a model (the same Google-BERT model mentioned above) to AWS: ```yaml deployment: !Deployment @@ -92,7 +92,7 @@ models: source: huggingface ``` -Following is a yaml file for deploying a facebook model to GCP Vertex AI: +Following is a yaml file for deploying a Facebook model to GCP Vertex AI: ```yaml deployment: !Deployment destination: gcp @@ -134,15 +134,15 @@ models: You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file -` +``` magemaker --train .magemaker_config/train-bert.yaml -` +``` -Here is an example yaml file for fine-tuning a hugging-face model: +Here is an example yaml file for fine-tuning a Hugging Face model: ```yaml training: !Training - destination: aws # or gcp, azure + destination: aws # or gcp, azure (currently AWS-only) instance_type: ml.p3.2xlarge # varies by cloud provider instance_count: 1 training_input_path: s3://your-bucket/data.csv @@ -163,7 +163,7 @@ If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging **Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** -- **Type:** Fill Mask: tries to complete your sentence like Madlibs +- **Type:** Fill Mask – tries to complete your sentence like Mad Libs - **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill -
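If you'd rather script the query than use the interactive menu, a minimal sketch with `boto3` could look like this (the endpoint name below is a placeholder for whatever name you chose at deployment):

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="bert-base-uncased-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "Paris is the [MASK] of France."}),
)
print(json.loads(response["Body"].read()))
```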
@@ -171,14 +171,13 @@ If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging **Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)** -- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering +- **Type:** Feature extraction – turns text into a 384-dim embedding for semantic search / clustering - **Query format:** "*type out a sentence like this one.*"

- ## Deactivating Models -Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. +Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker (or other cloud) instances. diff --git a/installation.mdx b/installation.mdx index 1d843eb..32cc568 100644 --- a/installation.mdx +++ b/installation.mdx @@ -5,10 +5,9 @@ description: Configure Magemaker for your cloud provider - For Macs, maxOS >= 13.6.6 is required. Apply Silicon devices (M1) must use Rosetta terminal. You can verify, your terminals architecture by running `arch`. It should print `i386` for Rosetta terminal. + For Macs, macOS ≥ 13.6.6 is required. Apple-Silicon devices (M-series) must run Magemaker from a Rosetta terminal. You can verify your terminal’s architecture with `arch`; it should print `i386` when running under Rosetta. - Install via pip: ```sh @@ -29,7 +28,7 @@ Once you have your AWS credentials, you can configure Magemaker by running: magemaker --cloud aws ``` -It will prompt you for aws credentials and set up the necessary configurations. +It will prompt you for AWS credentials and set up the necessary configurations. ### GCP (Vertex AI) Configuration @@ -38,7 +37,7 @@ It will prompt you for aws credentials and set up the necessary configurations. [GCP Setup Guide](/configuration/GCP) -once you have your GCP credentials, you can configure Magemaker by running: +Once you have your GCP credentials, you can configure Magemaker by running: ```bash magemaker --cloud gcp @@ -47,7 +46,7 @@ magemaker --cloud gcp ### Azure Configuration - Follow this detailed guide for setting up Azure credentials: - [GCP Setup Guide](/configuration/Azure) + [Azure Setup Guide](/configuration/Azure) Once you have your Azure credentials, you can configure Magemaker by running: @@ -67,33 +66,41 @@ magemaker --cloud all ### Required Config File -By default, Magemaker will look for a `.env` file in your project root with the following variables based on which cloud provider(s) you plan to use: - +A `.env` file is automatically created when you first run `magemaker --cloud `. Magemaker will look for this file in your project root and consume the following variables (by provider): ```bash -# AWS Configuration +# ------------------ AWS ------------------ AWS_ACCESS_KEY_ID="your-access-key" # Required for AWS -AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS +AWS_SECRET_ACCESS_KEY="your-secret-key" # Required for AWS SAGEMAKER_ROLE="arn:aws:iam::..." 
# Required for AWS +AWS_REGION_NAME="us-east-1" # Optional: override the default region -# GCP Configuration +# ------------------ GCP ------------------ PROJECT_ID="your-project-id" # Required for GCP GCLOUD_REGION="us-central1" # Required for GCP -# Azure Configuration +# ----------------- Azure ----------------- AZURE_SUBSCRIPTION_ID="your-sub-id" # Required for Azure AZURE_RESOURCE_GROUP="ml-resources" # Required for Azure AZURE_WORKSPACE_NAME="ml-workspace" # Required for Azure AZURE_REGION="eastus" # Required for Azure -# Optional configurations -HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like llama +# -------------- Hugging Face ------------- +HUGGING_FACE_HUB_KEY="your-hf-token" # Required for gated HF models like Llama + +# -------------- Magemaker ---------------- +CONFIG_DIR=".magemaker_config" # Optional: custom path for YAML configs + +# ---------- OpenAI-Compatible Proxy ------ +SERVER_HOST="0.0.0.0" # Optional: FastAPI host (default 0.0.0.0) +SERVER_PORT="8000" # Optional: FastAPI port (default 8000) +OPENAI_API_KEY="sk-..." # Required if you plan to use the proxy ``` -Never commit your .env file to version control! +Never commit your `.env` file to version control! - For gated models like llama-3.1 from Meta, you might have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. + For gated models like Meta Llama 3, you must first accept the model’s terms of use on Hugging Face and set `HUGGING_FACE_HUB_KEY` in your environment for deployment to succeed. {/* ## Verification @@ -119,12 +126,11 @@ magemaker verify 3. **Security** - - Follow principle of least privilege + - Follow the principle of least privilege - Use service accounts where possible - Enable audit logging - ## Troubleshooting Common configuration issues: @@ -142,7 +148,7 @@ Common configuration issues: - Confirm project ID 3. 
**Azure Issues** - - Check resource provider registration status: + - Check resource-provider registration status: ```bash az provider show -n Microsoft.MachineLearningServices az provider show -n Microsoft.ContainerRegistry diff --git a/mint.json b/mint.json index ccb1843..407128b 100644 --- a/mint.json +++ b/mint.json @@ -38,16 +38,23 @@ "mode": "auto" }, "navigation": [ - { + { "group": "Getting Started", - "pages": ["about", "installation", "quick-start"] + "pages": [ + "about", + "installation", + "quick-start" + ] }, { "group": "Tutorials", "pages": [ "tutorials/deploying-llama-3-to-aws", + "tutorials/deploying-llama-3-to-aws-using-query-flag", "tutorials/deploying-llama-3-to-gcp", - "tutorials/deploying-llama-3-to-azure" + "tutorials/deploying-llama-3-to-azure", + "tutorials/searching-jumpstart-models", + "tutorials/openai-compatible-proxy" ] }, { @@ -66,6 +73,12 @@ "concepts/models", "concepts/contributing" ] + }, + { + "group": "Reference", + "pages": [ + "reference/cli" + ] } ], "footerSocials": { @@ -77,15 +90,27 @@ { "title": "Documentation", "links": [ - { "label": "Getting Started", "url": "/" }, - { "label": "Contributing", "url": "/contributing" } + { + "label": "Getting Started", + "url": "/" + }, + { + "label": "Contributing", + "url": "/contributing" + } ] }, { "title": "Resources", "links": [ - { "label": "GitHub", "url": "https://github.com/slashml/magemaker" }, - { "label": "Support", "url": "mailto:support@slashml.com" } + { + "label": "GitHub", + "url": "https://github.com/slashml/magemaker" + }, + { + "label": "Support", + "url": "mailto:support@slashml.com" + } ] } ] diff --git a/quick-start.mdx b/quick-start.mdx index 5853ef8..7ca5a77 100644 --- a/quick-start.mdx +++ b/quick-start.mdx @@ -24,20 +24,26 @@ Supported providers: ### List Models -From the dropdown, select `Show Acitve Models` to see the list of endpoints deployed. +From the dropdown, select `Show Active Models` to see the list of endpoints deployed. -![Acitve Endpoints](../Images/active-1.png) +![Active Endpoints](../Images/active-1.png) + +### Search JumpStart Models (AWS only) + +A new interactive option, `Search JumpStart Models`, lets you browse and deploy models available in AWS SageMaker JumpStart without leaving the CLI. Type a search query (for example, *llama* or *bert*) and follow the prompts to deploy the selected model. + +![JumpStart Search](../Images/jumpstart-search.png) ### Delete Models -From the dropdown, select `Delete a Model Endpoint` to see the list of models endpoints. Press space to select the endpoints you want to delete +From the dropdown, select `Delete a Model Endpoint` to see the list of model endpoints. Press Space to select the endpoints you want to delete. ![Delete Endpoints](../Images/delete-1.png) ### Querying Models -From the dropdown, select `Query a Model Endpoint` to see the list of models endpoints. Press space to select the endpoints you want to query. Enter the query in the text box and press enter to get the response. +From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press Space to select the endpoint you want to query. Enter the query in the text box and press Enter to get the response. ![Query Endpoints](../Images/query-1.png) @@ -110,25 +116,24 @@ models: version: null ``` - The model ids for Azure are different from AWS and GCP. Make sure to use the one provided by Azure in the Azure Model Catalog. + The model IDs for Azure are different from AWS and GCP. 
Make sure to use the one provided by Azure in the Azure Model Catalog. - To find the relevant model id, follow the following steps + To find the relevant model ID, follow the steps below: - Find the workpsace in the Azure portal and click on the studio url provided. Click on the `Model Catalog` on the left side bar - ![Azure ML Creation](../Images/workspace-studio.png) + Find the workspace in the Azure portal and click on the Studio URL provided. Click on the `Model Catalog` on the left sidebar. + ![Azure ML Creation](../Images/workspace-studio.png) - - Select Hugging-Face from the collections list. The id of the model card is the id you need to use in the yaml file - ![Azure ML Creation](../Images/hugging-face.png) + + Select **Hugging Face** from the collections list. The ID shown on the model card is the ID you need to use in the YAML file. + ![Azure ML Creation](../Images/hugging-face.png) - ### Model Fine-tuning Fine-tune models using the `train` command: @@ -150,26 +155,6 @@ training: !Training per_device_train_batch_size: 32 learning_rate: 2e-5 ``` -{/* -### Recommended Models - - - - Fill Mask: tries to complete your sentence like Madlibs. Query format: text - string with [MASK] somewhere in it. - - - - Feature extraction: turns text into a 384d vector embedding for semantic - search / clustering. Query format: "type out a sentence like this one." - - */} Remember to deactivate unused endpoints to avoid unnecessary charges! @@ -178,7 +163,7 @@ training: !Training ## Contact -You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com). +You can reach us, Faizan & Jneid, at [support@slashml.com](mailto:support@slashml.com). If anything doesn't make sense or you have suggestions, do point them out at [magemaker.featurebase.app](https://magemaker.featurebase.app/). diff --git a/reference/cli-flags.mdx b/reference/cli-flags.mdx new file mode 100644 index 0000000..b4c0894 --- /dev/null +++ b/reference/cli-flags.mdx @@ -0,0 +1,33 @@ +--- +title: Magemaker CLI Reference +--- + +## Overview +Run `magemaker --help` to see the built-in help. This page collects all public flags in one place and notes any deprecations. + +| Flag | Description | Example | +|------|-------------|---------| +| `--cloud` | Configure or open the interactive menu for a cloud. Accepts `aws`, `gcp`, `azure`, `all`. | `magemaker --cloud aws` | +| `--deploy ` | Deploy using a YAML config. | `magemaker --deploy .magemaker_config/bert.yaml` | +| `--train ` | Fine-tune using a YAML training config (AWS only). | `magemaker --train .magemaker_config/train-bert.yaml` | +| `--version` | Print Magemaker version. | `magemaker --version` | + +### Deprecated Flags +| Flag | Replacement | +|------|-------------| +| `--query` | Use the interactive *Query a Model Endpoint* menu or call the endpoint with SDK/REST. | + + +Deprecated flags may be removed in the next minor release. + + +## Exit Codes +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | User error (bad YAML, missing env vars, etc.) | +| `2` | Cloud provider error (quota, auth) | +| `>2` | Unexpected exception – please file a bug | + +## Environment Variables +See the [Environment](/configuration/Environment) page for a full list. 
diff --git a/reference/cli-reference.mdx b/reference/cli-reference.mdx new file mode 100644 index 0000000..dcaaa06 --- /dev/null +++ b/reference/cli-reference.mdx @@ -0,0 +1,45 @@ +--- +title: CLI Reference +--- + +## `magemaker` Command-Line Interface +Below is a concise reference of all public flags exposed by the Magemaker CLI (`python -m magemaker.runner`). Run `magemaker --help` at any time for the authoritative list. + +| Flag / Option | Type | Description | +|----------------------|-----------|-----------------------------------------------------------------------------------------------------| +| `--cloud` | string | Configure credentials for one or more cloud providers.
Values: `aws`, `gcp`, `azure`, `all` | +| `--deploy` | path | Deploy a model using a YAML configuration file (see examples in `.magemaker_config/`). | +| `--train` | path | Fine-tune a model using a Training YAML spec. | +| `--config-dir` | path | Override the default `.magemaker_config` directory used for generated YAML files. | +| `--log-level` | string | Set logging verbosity. Values: `debug`, `info` *(default)*, `warning`, `error`. | +| `--version` | flag | Print Magemaker version and exit. | + + +The historical `--query` flag has been **removed** in favour of: +1. The interactive **“Query a Model Endpoint”** option in the main menu, **or** +2. Using SDK helpers (`magemaker.sagemaker.query_endpoint`, `magemaker.gcp.query_endpoint`, `magemaker.azure.query_endpoint`), or the new OpenAI-compatible proxy. + + +## Examples + +### 1. Configure AWS & GCP credentials +```bash +magemaker --cloud all +``` + +### 2. Deploy from YAML +```bash +magemaker --deploy .magemaker_config/llama3-aws.yaml +``` + +### 3. Fine-tune a model +```bash +magemaker --train .magemaker_config/train-bert.yaml +``` + +### 4. Change config directory +```bash +magemaker --config-dir ./infra/configs --deploy ./infra/configs/opt.yaml +``` + +--- diff --git a/reference/cli.mdx b/reference/cli.mdx new file mode 100644 index 0000000..ceaf929 --- /dev/null +++ b/reference/cli.mdx @@ -0,0 +1,53 @@ +--- +title: Magemaker CLI Reference +--- + +This page lists all public flags exposed by the `magemaker` CLI (see `runner.py`). Use `magemaker --help` for the authoritative source. + +## Global syntax + +```bash +magemaker [GLOBAL_FLAGS] [COMMAND_FLAGS] +``` + +## Global flags + +| Flag | Description | Default | +|------|-------------|---------| +| `--cloud ` | Launch the interactive TUI for the specified provider(s) and handle first-time setup. | *none* | +| `--deploy ` | Deploy a model using a YAML file. | *n/a* | +| `--train ` | Fine-tune / train a model using a YAML file. | *n/a* | +| `--config-dir ` | Override the folder where Magemaker reads & writes YAML configs. | `.magemaker_config` | +| `--version` | Print Magemaker version and exit. | | +| `-h`, `--help` | Show built-in help. | | + + +The old `--query` flag was **removed** in v0.8. Query endpoints via the interactive menu or the SDK examples shown in each tutorial. + + +## Examples + +### 1. First-time AWS setup +```bash +magemaker --cloud aws +``` + +### 2. Deploy from YAML +```bash +magemaker --deploy .magemaker_config/bert.yaml +``` + +### 3. Fine-tune from YAML +```bash +magemaker --train .magemaker_config/train-bert.yaml +``` + +### 4. Custom config directory +```bash +export CONFIG_DIR="infra/configs" + +magemaker --deploy infra/configs/llama.yaml +``` + +--- +Keeping this reference up-to-date helps prevent documentation drift. If you add or rename CLI flags, please update this page in the same PR. diff --git a/tutorials/deploying-jumpstart-models-to-aws.mdx b/tutorials/deploying-jumpstart-models-to-aws.mdx new file mode 100644 index 0000000..9e0c915 --- /dev/null +++ b/tutorials/deploying-jumpstart-models-to-aws.mdx @@ -0,0 +1,86 @@ +--- +title: Deploying SageMaker JumpStart Models +--- + +## Overview +AWS SageMaker JumpStart offers hundreds of pre-built foundation models and solutions that can be deployed with only a few clicks. Magemaker now integrates directly with JumpStart, letting you **search, configure, and deploy** JumpStart models from the same interactive CLI you already use for Hugging Face deployments. + + +JumpStart models are available only on AWS. 
Make sure you have completed the [AWS configuration](/configuration/AWS) steps before continuing. + + +## 1. Launch the JumpStart Search Interface +Run the Magemaker CLI and pick **“Deploy a JumpStart Model”** from the interactive menu: + +```bash +magemaker --cloud aws +``` + +You will be prompted to choose a **framework** (PyTorch, TensorFlow, etc.) and then type a **search term**. Magemaker will display matching JumpStart models in a table with their model IDs, tasks, and recommended instance types. + +![JumpStart Search](../Images/jumpstart-search.png) + +## 2. Select and Configure the Model +After selecting a model, Magemaker asks you to fill in the deployment details: + +- Endpoint name (defaults to the model ID) +- Instance type (pre-filtered to what the model supports) +- Instance count +- Optional GPU count and quantization settings + +Once confirmed, Magemaker generates a YAML spec and starts the deployment. + +```yaml +deployment: !Deployment + destination: aws + endpoint_name: textgeneration-gpt2-jumpstart + instance_count: 1 + instance_type: ml.g5.2xlarge + +models: + - !Model + id: huggingface-textgeneration-gpt2 + source: sagemaker +``` + +The YAML file is saved automatically in `.magemaker_config/` for reproducibility. + +## 3. Monitor Deployment Progress +Magemaker streams the SageMaker deployment logs in real time. Large JumpStart models (for example GPT-2 XL) can take 10–15 minutes to finish. + + +Make sure the selected instance type is available in your account/region. If you hit **QuotasExceeded** errors, open a quota-increase request in the AWS console. + + +## 4. Query the Deployed Endpoint +You can query the endpoint either interactively (`Query a Model Endpoint` in the CLI) or via a YAML file: + +```yaml +query: !Query + input: "Write a limerick about deployment pipelines." +``` + +```bash +magemaker --query .magemaker_config/textgeneration-gpt2-jumpstart-query.yaml +``` + +Alternatively, use Python: + +```python +from sagemaker.huggingface.model import HuggingFacePredictor + +predictor = HuggingFacePredictor("textgeneration-gpt2-jumpstart") +print(predictor.predict({"inputs": "Hello from Magemaker!"})) +``` + +## 5. Clean Up +As with all SageMaker endpoints, **billing starts as soon as the instance is live**. Remember to delete unused endpoints: + +```bash +magemaker --cloud aws # choose "Delete a Model Endpoint" from the menu +``` + +## Next Steps +- Combine JumpStart models with your own fine-tuned checkpoints +- Automate deployments in CI using `magemaker --deploy` inside GitHub Actions +- Explore multi-model endpoints to host several JumpStart models behind one endpoint diff --git a/tutorials/deploying-llama-3-to-aws.mdx b/tutorials/deploying-llama-3-to-aws.mdx index 46f0659..ac4d865 100644 --- a/tutorials/deploying-llama-3-to-aws.mdx +++ b/tutorials/deploying-llama-3-to-aws.mdx @@ -107,4 +107,3 @@ if __name__ == "__main__": ``` ## Conclusion You have successfully deployed and queried Llama 3 on AWS SageMaker using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). 
- diff --git a/tutorials/deploying-llama-3-to-azure.mdx b/tutorials/deploying-llama-3-to-azure.mdx index 679ba23..526756c 100644 --- a/tutorials/deploying-llama-3-to-azure.mdx +++ b/tutorials/deploying-llama-3-to-azure.mdx @@ -83,20 +83,18 @@ From the dropdown, select `Query a Model Endpoint` to see the list of model endp Or you can use the following code ```python - from azure.identity import DefaultAzureCredential from azure.ai.ml import MLClient -from azure.mgmt.resource import ResourceManagementClient - from dotenv import dotenv_values +import json import os -def query_azure_endpoint(endpoint_name, query): +def query_azure_endpoint(endpoint_name: str, query: str): # Initialize the ML client - subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") - resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") - workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") + subscription_id = dotenv_values(".env").get("AZURE_SUBSCRIPTION_ID") + resource_group = dotenv_values(".env").get("AZURE_RESOURCE_GROUP") + workspace_name = dotenv_values(".env").get("AZURE_WORKSPACE_NAME") credential = DefaultAzureCredential() ml_client = MLClient( @@ -106,38 +104,31 @@ def query_azure_endpoint(endpoint_name, query): workspace_name=workspace_name ) - import json + # Prepare request payload + payload = {"inputs": query} - # Test data - test_data = { - "inputs": query - } + # Save payload to a temporary file + with open("request.json", "w") as f: + json.dump(payload, f) - # Save the test data to a temporary file - with open("test_request.json", "w") as f: - json.dump(test_data, f) - - # Get prediction + # Invoke the endpoint response = ml_client.online_endpoints.invoke( endpoint_name=endpoint_name, - request_file = 'test_request.json' + request_file="request.json" ) - print('Raw Response Content:', response) - # delete a file - os.remove("test_request.json") + print("Raw Response Content:", response) + os.remove("request.json") return response - -endpoint_id = 'your-endpoint-id-here' - -input_text = 'What are you?' -resp = query_azure_endpoint(endpoint_id=endpoint_id, input_text=input_text) -print(resp) +if __name__ == "__main__": + endpoint_name = "your-endpoint-id-here" + input_text = "What are you?" + resp = query_azure_endpoint(endpoint_name=endpoint_name, query=input_text) + print(resp) ``` ## Conclusion You have successfully deployed and queried Llama 3 on Azure using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). - diff --git a/tutorials/deploying-llama-3-to-gcp.mdx b/tutorials/deploying-llama-3-to-gcp.mdx index a94d616..418a3be 100644 --- a/tutorials/deploying-llama-3-to-gcp.mdx +++ b/tutorials/deploying-llama-3-to-gcp.mdx @@ -3,144 +3,128 @@ title: Deploying Llama 3 to GCP --- ## Introduction -This tutorial guides you through deploying Llama 3 to Google Cloud Platform (GCP) Vertex AI using Magemaker and querying it using the interactive dropdown menu. Ensure you have followed the [installation](installation) steps before proceeding. +This tutorial guides you through deploying **Llama 3** to Google Cloud Platform (GCP) **Vertex AI** using Magemaker and then querying the model. Make sure you have completed the [installation](installation) steps and configured your GCP credentials before continuing. - -You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding. 
+ +You may need to request a **quota increase** for specific machine types and GPUs in the region where you plan to deploy. Check your GCP quotas before proceeding. -## Step 1: Setting Up Magemaker for GCP - -Run the following command to configure Magemaker for GCP Vertex AI deployment: +## Step 1 – Configure Magemaker for GCP +Run the following command to initialise Magemaker with your GCP project and region: ```sh magemaker --cloud gcp ``` -This initializes Magemaker with the necessary configurations for deploying models to Vertex AI. +This command will: +1. Prompt you for your **PROJECT_ID** and **GCLOUD_REGION** (these will be stored in a `.env` file). +2. Verify that the Vertex AI API is enabled. +3. Create any missing service accounts or IAM roles if necessary. -## Step 2: YAML-based Deployment - -For reproducible deployments, use YAML configuration: +## Step 2 – Deploying with YAML (recommended) +YAML files make deployments reproducible and easy to track in version control. ```sh -magemaker --deploy .magemaker_config/your-model.yaml +magemaker --deploy .magemaker_config/llama3-deploy.yaml ``` -Example YAML for GCP deployment: +Example YAML (`llama3-deploy.yaml`): ```yaml deployment: !Deployment destination: gcp - endpoint_name: llama3-endpoint - accelerator_count: 1 - instance_type: n1-standard-8 - accelerator_type: NVIDIA_T4 - num_gpus: 1 - quantization: null + endpoint_name: llama3-endpoint # Change to something unique per project + instance_type: n1-standard-8 # CPU/RAM for the VM + accelerator_type: NVIDIA_T4 # GPU model + accelerator_count: 1 # Number of GPUs + num_gpus: 1 # For Magemaker-managed autoscaling (leave 1 for single-GPU) + quantization: null # e.g. bitsandbytes, gguf – leave null for FP16/FP32 models: - !Model - id: meta-llama/Meta-Llama-3-8B-Instruct - location: null - predict: null + id: meta-llama/Meta-Llama-3-8B-Instruct # Hugging Face model ID source: huggingface task: text-generation - version: null ``` + - For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through. +For gated models (e.g. Meta-Llama), you must **accept the model licence on Hugging Face** **and** set `HUGGING_FACE_HUB_KEY` in your `.env` file. - -### Selecting an Appropriate Instance -For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements. +### Choosing an instance type +A `n1-standard-8` VM with a single **T4 GPU** works for 8 B parameter models. If you need lower latency, consider an **L4** (`g2-standard-12`) or **A100** (`a2-highgpu-1g`) instance. -If you encounter quota issues, submit a quota increase request in the GCP console under "IAM & Admin > Quotas" for the specific GPU type in your deployment region. +If you hit a quota error during deployment, request a GPU quota increase in **IAM & Admin → Quotas** for your region. -## Step 3: Querying the Deployed Model - -Once the deployment is complete, note down the endpoint id. - -You can use the interactive dropdown menu to quickly query the model. +## Step 3 – Querying the endpoint +After the deployment completes, Vertex AI prints an **endpoint ID** (not the display name!). Save this value; you need it for queries. 
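If you did not note the ID down, you can recover it with the Vertex AI Python SDK. A minimal sketch, assuming `google-cloud-aiplatform` is installed and that `PROJECT_ID` and `GCLOUD_REGION` are present in the `.env` file created during configuration:

```python
# Sketch: list Vertex AI endpoints to recover the numeric endpoint ID.
# Assumes google-cloud-aiplatform is installed and that PROJECT_ID /
# GCLOUD_REGION were written to .env by `magemaker --cloud gcp`.
from dotenv import dotenv_values
from google.cloud import aiplatform

env = dotenv_values(".env")
aiplatform.init(project=env.get("PROJECT_ID"), location=env.get("GCLOUD_REGION"))

for endpoint in aiplatform.Endpoint.list():
    # endpoint.name is the numeric ID used in query URLs;
    # endpoint.display_name is the name chosen in the deployment YAML.
    print(endpoint.name, endpoint.display_name)
```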
-### Querying Models +### Option A – Interactive Magemaker menu +Run `magemaker --cloud gcp`, choose **“Query a Model Endpoint”**, select the endpoint, and enter your prompt. -From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response. +### Option B – Python (REST) example +The snippet below sends a raw REST request that is compatible with any Vertex AI text-generation model. Replace `ENDPOINT_ID` with the value shown in the deployment logs. -![Query Endpoints](../Images/query-1.png) - -Or you can use the following code: -```python -from google.cloud import aiplatform -from google.protobuf import json_format -from google.protobuf.struct_pb2 import Value -import json +```python +import google.auth +import google.auth.transport.requests +import requests from dotenv import dotenv_values - -def query_vertexai_endpoint_rest( - endpoint_id: str, - input_text: str, - token_path: str = None -): - import google.auth - import google.auth.transport.requests - import requests - - # TODO: this will have to come from config files - project_id = dotenv_values('.env').get('PROJECT_ID') - location = dotenv_values('.env').get('GCLOUD_REGION') - - - # Get credentials - if token_path: - credentials, project = google.auth.load_credentials_from_file(token_path) - else: - credentials, project = google.auth.default() - - # Refresh token - auth_req = google.auth.transport.requests.Request() - credentials.refresh(auth_req) - - # Prepare headers and URL - headers = { - "Authorization": f"Bearer {credentials.token}", - "Content-Type": "application/json" - } - - url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}:predict" - - # Prepare payload - payload = { - "instances": [ - { - "inputs": input_text, - # TODO: this also needs to come from configs - "parameters": { - "max_new_tokens": 100, - "temperature": 0.7, - "top_p": 0.95 - } +PROJECT_ID = dotenv_values('.env').get('PROJECT_ID') +LOCATION = dotenv_values('.env').get('GCLOUD_REGION') +ENDPOINT_ID = "your-endpoint-id-here" # <- change me + +INPUT_TEXT = "What are you?" 
+TOKEN_PATH = None # Optional: path to a service-account JSON key + +# 1 – obtain an access token +if TOKEN_PATH: + credentials, _ = google.auth.load_credentials_from_file(TOKEN_PATH) +else: + credentials, _ = google.auth.default() +credentials.refresh(google.auth.transport.requests.Request()) + +auth_header = { + "Authorization": f"Bearer {credentials.token}", + "Content-Type": "application/json" +} + +# 2 – build the prediction URL +url = ( + f"https://{LOCATION}-aiplatform.googleapis.com/v1/" + f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:predict" +) + +# 3 – craft the payload (Vertex AI text-generation schema) +payload = { + "instances": [ + { + "inputs": INPUT_TEXT, + "parameters": { + "max_new_tokens": 100, + "temperature": 0.7, + "top_p": 0.95 } - ] - } - - # Make request - response = requests.post(url, headers=headers, json=payload) - print('Raw Response Content:', response.content.decode()) - - return response.json() - -endpoint_id="your-endpoint-id-here" - -input_text='What are you?"' -resp = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=input_text) -print(resp) + } + ] +} + +# 4 – send the request +response = requests.post(url, headers=auth_header, json=payload, timeout=90) +response.raise_for_status() +print(response.json()) ``` +### Option C – Vertex AI Python SDK +If you prefer the official SDK, Magemaker’s underlying model artefact is compatible with the standard `aiplatform.Endpoint` interface. Refer to the [Vertex AI SDK docs](https://cloud.google.com/vertex-ai/docs) for sample code. + ## Conclusion -You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com). +You have now: +1. Configured Magemaker for GCP +2. Deployed **Llama 3** to Vertex AI via YAML +3. Queried the endpoint using both the interactive CLI and a Python REST call +For questions or feedback, reach us at [support@slashml.com](mailto:support@slashml.com). diff --git a/tutorials/deploying-sagemaker-jumpstart-models.mdx b/tutorials/deploying-sagemaker-jumpstart-models.mdx new file mode 100644 index 0000000..13657b7 --- /dev/null +++ b/tutorials/deploying-sagemaker-jumpstart-models.mdx @@ -0,0 +1,45 @@ +--- +title: Deploying AWS JumpStart Models with Magemaker +--- + +## Overview +Magemaker’s interactive menu lets you browse and deploy any model available in **Amazon SageMaker JumpStart**—no YAML required. + +## 1 – Open the interactive menu +```bash +magemaker --cloud aws +``` + +Choose **“Deploy a model endpoint”** ➜ **“Deploy a SageMaker model”**. +A searchable list of JumpStart models appears. Start typing to filter by **task** (e.g. *eqa*) or **model name** (e.g. *llama*). + +![JumpStart Search](../Images/jumpstart-search.png) + +## 2 – Select an instance type +Magemaker automatically fetches your service-quota limits in the background. When prompted, pick an available instance (e.g. `ml.g5.2xlarge`). + + +If you do not see GPU instances, open an AWS quota-increase request for SageMaker. + + +## 3 – Deployment +Magemaker creates the model, endpoint configuration and endpoint for you. Progress is streamed directly in the terminal. + +Once finished, you’ll see: +```text +✅ Endpoint jumpstart-bert-20240615 is InService +``` +A YAML file describing the deployment is saved to `.magemaker_config/` so you can reproduce or modify it later. 
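If you would rather confirm the endpoint from code than from the terminal output, a small boto3 sketch can report its status. This assumes your AWS credentials and region are already configured, and `jumpstart-bert-20240615` stands in for your own endpoint name:

```python
# Sketch: verify a SageMaker endpoint is InService with boto3.
# Assumes AWS credentials/region are configured (e.g. by `magemaker --cloud aws`);
# replace the endpoint name with the one shown in your deployment logs.
import boto3

sagemaker = boto3.client("sagemaker")
description = sagemaker.describe_endpoint(EndpointName="jumpstart-bert-20240615")
print(description["EndpointStatus"])  # "Creating", "InService", "Failed", ...
```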
+ +## 4 – Querying and managing the endpoint +Use the same menu to: +* **List** active endpoints +* **Query** a text-generation model +* **Delete** endpoints you no longer need + +## 5 – Best practices +- Always shut down endpoints when idle. +- Version-control the generated YAML for CI/CD workflows. +- Use the **`--verbose`** flag for full debug logs (`magemaker --cloud aws --verbose`). + +Enjoy one-click access to the entire JumpStart catalogue! 🚀 diff --git a/tutorials/openai-compatible-proxy.mdx b/tutorials/openai-compatible-proxy.mdx new file mode 100644 index 0000000..da58ce5 --- /dev/null +++ b/tutorials/openai-compatible-proxy.mdx @@ -0,0 +1,111 @@ +--- +title: Self-Hosted OpenAI-Compatible Proxy +--- + +## Introduction + +Magemaker ships with a lightweight FastAPI server (`server.py`) that exposes any SageMaker–deployed model **through the OpenAI Chat Completions API**. +This means you can point the official `openai` Python / JS SDK—or any OpenAI-compatible client—at your own endpoint and avoid vendor lock-in while keeping the familiar developer experience. + +Typical use-cases: + +1. Use commercial tools that only speak the OpenAI API (LangChain, Llama-Index, Supabase Vector search, etc.) +2. Centrally manage multiple in-house models behind a single `/chat/completions` endpoint. +3. Experiment locally before moving traffic to production. + + +Running the proxy **does not** deploy any models for you. Make sure you have already deployed at least one endpoint via Magemaker before starting the server. + + +--- + +## 1 Starting the server + +```bash +# In the root of your project (where server.py lives) +python server.py +# or, if you prefer uvicorn CLI directly +uvicorn server:app --host 0.0.0.0 --port 8000 +``` + +The server automatically picks up your AWS region from the `.env` file created by the `magemaker --cloud aws` command. + +Once running you will have three REST endpoints: + +| Method | Path | Description | +| ------ | ---- | ----------- | +| GET | `/endpoint/{endpoint_name}` | Fetch basic metadata about a SageMaker endpoint | +| POST | `/endpoint/{endpoint_name}/query` | Free-form query against a specific endpoint (uses the Magemaker YAML config for that endpoint) | +| POST | `/chat/completions` | **OpenAI Chat Completions-compatible** endpoint backed by SageMaker | + +--- + +## 2 Using the OpenAI SDK + +Point the SDK to **your** base URL and keep the rest of the code unchanged: + +```python +import openai + +openai.api_key = "sk-totally-ignored" # A dummy key is still required by the client +openai.base_url = "http://localhost:8000/v1" # Add the /v1 prefix so the paths match the OpenAI spec + +response = openai.chat.completions.create( + model="meta-llama/Meta-Llama-3-8B-Instruct", # any model id you previously deployed + messages=[ + {"role": "user", "content": "Explain the greenhouse effect in one paragraph"} + ], +) +print(response.choices[0].message.content) +``` + +Behind the scenes the proxy will: + +1. Look up SageMaker endpoints that serve the requested `model`. +2. Take the **first available** endpoint (future versions will add load-balancing and routing rules). +3. Forward the prompt to `make_query_request()` using Magemaker’s built-in helpers. +4. Return a response that is 1-to-1 compatible with the official OpenAI schema. 
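Before pointing a client at the proxy, you can confirm it can see your endpoint through the metadata route listed above. A sketch using `requests`; the exact fields returned depend on `server.py`, and `llama3-endpoint` is a placeholder for one of your own endpoints:

```python
# Sketch: fetch endpoint metadata from the proxy's GET route.
# Assumes the proxy is running locally on port 8000 and that
# "llama3-endpoint" is replaced with a deployed endpoint name.
import requests

resp = requests.get("http://localhost:8000/endpoint/llama3-endpoint", timeout=30)
resp.raise_for_status()
print(resp.json())  # basic metadata about the SageMaker endpoint
```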
+ +--- + +## 3 Direct endpoint queries + +If you need full control over inference parameters you can bypass the OpenAI layer and hit the lower-level route: + +```bash +curl -X POST http://localhost:8000/endpoint/llama3-endpoint/query \ + -H "Content-Type: application/json" \ + -d '{ + "input": "Translate \"good morning\" to French", + "parameters": {"max_new_tokens": 64, "temperature": 0.3} + }' +``` + +The payload is the same `Query` schema you use in Magemaker YAML files. + +--- + +## 4 Error handling + +• If the requested model is **not deployed**, the server raises `NotDeployedException` and returns HTTP 404. +• Any SageMaker runtime errors are proxied back with HTTP 500. + + +The current implementation always chooses the *first* endpoint that matches a model. If you run a multi-model endpoint or multiple copies of the same model, you may want to extend `get_endpoints_for_model()` to apply custom routing logic. + + +--- + +## 5 Deploying alongside Magemaker in production + +1. Build a Docker image that installs `magemaker[server]` (FastAPI + Uvicorn) and your favourite web server. +2. Mount the folder that holds your `mint.json` & Magemaker YAML configs so the proxy can resolve models. +3. Add a CI step that redeploys / restarts the container whenever you merge a new model configuration. + +--- + +## Conclusion + +You now have a local OpenAI-compatible proxy running on top of your own SageMaker endpoints—zero vendor lock-in, same developer experience. + +For questions or feature requests open an issue on GitHub or ping us at [support@slashml.com](mailto:support@slashml.com). diff --git a/tutorials/searching-and-deploying-jumpstart-models.mdx b/tutorials/searching-and-deploying-jumpstart-models.mdx new file mode 100644 index 0000000..1a94c30 --- /dev/null +++ b/tutorials/searching-and-deploying-jumpstart-models.mdx @@ -0,0 +1,84 @@ +--- +title: Deploying AWS JumpStart Models with Magemaker +--- + +## Introduction +AWS JumpStart provides a catalogue of pre-configured foundation models that you can deploy directly to SageMaker. Magemaker now ships with an **interactive search** that lets you find and deploy these models without leaving the terminal. + + +JumpStart models are only deployable to **AWS SageMaker** at the moment. + + +## 1 – Configure AWS +Make sure you have already run `magemaker --cloud aws` and that your AWS credentials & SageMaker role are in place. See the [installation](/installation) guide if you haven't. + +## 2 – Open the Interactive Menu +```bash +magemaker --cloud aws +``` +Choose **“Search JumpStart Models”** from the main menu. + +![JumpStart search](../Images/jumpstart-search.png) + +## 3 – Search & Select a Model +1. Type a keyword (e.g. `llama`, `mistral`, `falcon`). +2. Use the arrow keys to highlight a model. +3. Press **Enter** to view details and confirm deployment. + +The tool will automatically: +- Suggest compatible instance types. +- Validate that you have quota for the chosen instance. +- Create a unique endpoint name based on the model and timestamp. + +## 4 – Monitor the Deployment +A progress bar shows each SageMaker step (model upload, endpoint config, endpoint creation). Large models can take 10-20 minutes. + +Once complete, Magemaker writes a YAML file to `.magemaker_config/.yaml` that captures the deployment so you can re-deploy via CI/CD later. + +## 5 – Query the Endpoint +Use the interactive menu **“Query a Model Endpoint”** or follow the Python example in the [Llama 3 AWS tutorial](/tutorials/deploying-llama-3-to-aws). 
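For a quick programmatic check you can also call the endpoint with boto3's SageMaker runtime client. A sketch, assuming the endpoint accepts the standard Hugging Face text-generation payload and that you substitute your own endpoint name:

```python
# Sketch: invoke a deployed SageMaker endpoint directly with boto3.
# Assumes AWS credentials/region are configured and the endpoint accepts
# a {"inputs": ...} JSON payload; the endpoint name is a placeholder.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="your-endpoint-name-here",
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello from Magemaker!"}),
)
print(json.loads(response["Body"].read()))
```

Or launch the interactive menu: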
+ +```bash +magemaker --cloud aws # choose "Query a Model Endpoint" +``` + +--- + +## YAML-Only Workflow +Prefer Infrastructure-as-Code? Once you know the JumpStart model ID you can skip the menu entirely: + +```yaml +# .magemaker_config/falcon-7b.yaml +models: +- !Model + id: huggingface-textgeneration-falcon-7b-instruct + source: sagemaker + +deployment: !Deployment + destination: aws + endpoint_name: falcon7b-endpoint + instance_type: ml.g5.12xlarge + instance_count: 1 +``` + +```bash +magemaker --deploy .magemaker_config/falcon-7b.yaml +``` + +--- + +## Cleaning Up +Remember to delete endpoints you no longer need: + +```bash +magemaker --cloud aws # choose "Delete a Model Endpoint" +``` + +--- + +## FAQ +**Q:** *Why do I see `InsufficientCapacity` errors?* +**A:** The requested GPU instance isn’t available in your region. Try a different region or request more quota. + +**Q:** *Can I fine-tune JumpStart models?* +**A:** Not yet through Magemaker. Fine-tuning currently supports Hugging Face models on SageMaker only. diff --git a/tutorials/searching-jumpstart-models.mdx b/tutorials/searching-jumpstart-models.mdx new file mode 100644 index 0000000..ec11e6f --- /dev/null +++ b/tutorials/searching-jumpstart-models.mdx @@ -0,0 +1,43 @@ +--- +title: Deploying AWS JumpStart Models via Interactive Search +--- + +## Overview +Magemaker includes an **interactive JumpStart search utility** (`search_jumpstart_models.py`) that lets you browse, filter, and deploy AWS SageMaker JumpStart models without leaving the terminal. + + +JumpStart is AWS’s curated catalogue of pre-built model packages (open-source and proprietary). Magemaker automates the heavy lifting of selecting the right container, instance type, and deployment parameters for you. + + +## Launching the Search Tool + +```bash +magemaker --cloud aws # ensure AWS is configured +``` +In the interactive main menu choose **“Search JumpStart Models”**. + +![JumpStart Search](../Images/jumpstart-search.png) + +## Workflow +1. Type to filter by model name, task (NLP, CV, etc.), or framework. +2. Press **Enter** on a model to view details (description, recommended instance types, license). +3. Confirm deployment settings (instance size, endpoint name, etc.). +4. Magemaker will start the deployment and stream progress. + +## YAML Output for Re-deployments +After a successful deployment Magemaker writes a YAML file into `.magemaker_config/` so you can: +```bash +magemaker --deploy .magemaker_config/your-endpoint.yaml +``` +This enables CI/CD or Infrastructure-as-Code style repeatability. + +## Tips & Tricks +• Use the search box to filter by **task** (e.g. `text-generation`) or **framework** (`huggingface`, `tensorflow`). +• If you need GPUs, pick an instance suffix like `ml.g5.xlarge`; Magemaker warns if quotas are insufficient. +• Endpoints incur cost while running – remember to delete unused endpoints from the main menu. + +## Troubleshooting +• `QuotaExceeded` – submit a SageMaker service-quota increase in AWS Console. +• `Model asset not found` – the JumpStart package may not be available in your region. Try `us-east-1`. + +--- diff --git a/tutorials/searching-sagemaker-jumpstart-models.mdx b/tutorials/searching-sagemaker-jumpstart-models.mdx new file mode 100644 index 0000000..f213c84 --- /dev/null +++ b/tutorials/searching-sagemaker-jumpstart-models.mdx @@ -0,0 +1,68 @@ +--- +title: Searching & Deploying SageMaker JumpStart Models +--- + +## Overview +AWS SageMaker JumpStart offers hundreds of pre-trained foundation models. 
Magemaker’s **interactive search** feature (powered by `search_jumpstart_models.py`) lets you: + +1. Discover available JumpStart models from the terminal +2. Pick a model with arrow-key navigation & fuzzy search +3. Deploy it to SageMaker with a single press of Enter + +No YAML required – great for fast experimentation. + +## Step 1 – Launch the interactive menu + +```bash +magemaker --cloud aws # or just `magemaker` if AWS is your default +``` +Choose **“Search JumpStart Models”** from the main menu. + +![JumpStart search GIF](../Images/jumpstart-search.gif) + + +## Step 2 – Select a model + +* Start typing to fuzzy-search (`llama`, `t5`, `stable-diffusion`, …) +* Use ↑ / ↓ to highlight a result +* Press Space to view model metadata (parameters, licenses, etc.) +* Press Enter to continue + +## Step 3 – Configure the deployment + +Magemaker will ask for: + +1. **Endpoint name** – defaults to the model id +2. **Instance type** – suggestions are filtered by the model’s minimum requirements +3. **Initial instance count** + +After confirmation the usual SageMaker deployment progress bar appears. + +## Step 4 – Query the endpoint + +Once deployment succeeds you can query the model via: + +```bash +# Interactive query menu +magemaker --cloud aws +``` +Select **“Query a Model Endpoint”** and follow the prompt. + +Or fetch it programmatically: + +```python +from sagemaker.huggingface.model import HuggingFacePredictor + +predictor = HuggingFacePredictor("my-endpoint-name") +print(predictor.predict({"inputs": "Hello JumpStart!"})) +``` + +## Converting the deployment to YAML + +The tool automatically writes a reproducible YAML file to `.magemaker_config/.yaml` so you can re-deploy the same model later or commit it to source control. + +## Notes & Quotas + +* JumpStart models follow the same cost model as normal SageMaker endpoints. +* Some models (e.g. Meta Llama) are gated – you must provide a valid `HUGGING_FACE_HUB_KEY` in your `.env`. +* If you hit `LimitExceededException`, request a quota increase for the required instance family & region. diff --git a/updated_readme.md b/updated_readme.md index bcfc60b..63359a7 100644 --- a/updated_readme.md +++ b/updated_readme.md @@ -1,26 +1,21 @@ -
-

Magemaker v0.1, by SlashML

+

Magemaker by SlashML

- Deploy open source AI models to AWS in minutes. + Deploy open-source AI models to AWS, GCP, and Azure in minutes.

- -
Table of Contents
    -
  1. - About Magemaker -
  2. +
  3. About Magemaker
  4. Getting Started
      @@ -39,218 +34,111 @@ ## About Magemaker -Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line. +Magemaker is a Python CLI that lets you deploy Hugging Face models (and AWS SageMaker JumpStart models) directly to the three major cloud providers—AWS SageMaker, Google Cloud Vertex AI, and Azure Machine Learning—without writing any cloud-specific boilerplate. Choose a model, pick an instance, and Magemaker spins up an endpoint that’s ready to query or fine-tune in just a few minutes. -Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes. - - -
      +--- ## Getting Started - -Magemaker works with AWS. Azure and GCP support are coming soon! - -To get a local copy up and running follow these simple steps. +Magemaker supports **AWS, GCP, and Azure**. You can configure one, two, or all three providers in the same project. ### Prerequisites - -* Python -* An AWS account -* Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free) -* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user)) - -### Configuration - -**Step 1: Set up AWS and SageMaker** - -To get started, you’ll need an AWS account which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker. - -We wrote up the steps in [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) as well. - - - -### Installing the package - -**Step 1** - -```sh +* Python 3.11 (3.12 + 3.13 not yet supported) +* At least one cloud-provider account (AWS, GCP, Azure) +* Corresponding cloud CLI tools installed locally + • aws cli • gcloud SDK • azure cli +* Quota for the instance/GPU types you plan to use +* (Optional) Hugging Face access token for gated models such as Llama 3 + +### Installation +```bash pip install magemaker ``` -**Step 2: Running magemaker** - -Run it by simply doing the following: - -```sh -magemaker -``` - -If this is your first time running this command. It will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region. - -Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one. - -```sh -HUGGING_FACE_HUB_KEY="KeyValueHere" +### Initial Cloud Configuration +Run the CLI once with the `--cloud` flag to generate a _.env_ file and store your credentials (you can pass `all` to configure every provider in one go): +```bash +magemaker --cloud [aws|gcp|azure|all] ``` +If you ever need to move the config directory, set `CONFIG_DIR=/path/to/dir` in your environment. -
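After configuration you can sanity-check the generated `.env` from Python. A sketch using `python-dotenv`; which keys are present depends on the providers you configured, and the Hugging Face token is optional:

```python
# Sketch: confirm the .env written by `magemaker --cloud ...` contains the keys
# referenced elsewhere in these docs. Which ones exist depends on your providers.
from dotenv import dotenv_values

env = dotenv_values(".env")
for key in [
    "HUGGING_FACE_HUB_KEY",                     # optional, needed for gated models
    "PROJECT_ID", "GCLOUD_REGION",              # GCP
    "AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_WORKSPACE_NAME",  # Azure
]:
    print(f"{key}: {'set' if env.get(key) else 'missing'}")
```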

      (back to top)

      - +> **Never commit your .env file to version control!** - - -
      +--- ## Using Magemaker -### Deploying models from dropdown - -When you run `magemaker` comamnd it will give you an interactive menu to deploy models. You can choose from a dropdown of models to deploy. - -#### Deploying Hugging Face models -If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face. For example, `google-bert/bert-base-uncased`. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model. - -#### Deploying Sagemaker models -If you are deploying a Sagemaker model, select a framework and search from a model. If you a deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the `CONFIG_DIR=.magemaker_config` folder. You can modify the path to this folder by setting the `CONFIG_DIR` environment variable. - -#### Deploy using a yaml file -We recommend deploying through a yaml file for reproducability and IAC. From the cli, you can deploy a model without going through all the menus. You can even integrate us with your Github Actions to deploy on PR merge. Deploy via YAML files simply by passing the `--deploy` option with local path like so: - +### 1. Interactive menu (recommended for exploring) +```bash +magemaker --cloud aws # or gcp / azure / all ``` -magemaker --deploy .magemaker_config/bert-base-uncased.yaml +The menu lets you: +* Search & deploy **Hugging Face** or **AWS JumpStart** models +* List active endpoints +* Query or delete endpoints + +### 2. Infrastructure-as-Code via YAML +For reproducible deployments & CI/CD: +```bash +magemaker --deploy .magemaker_config/your-model.yaml ``` - -Following is a sample yaml file for deploying a model the same google bert model mentioned above: - +Example (AWS SageMaker): ```yaml deployment: !Deployment destination: aws - # Endpoint name matches model_id for querying atm. - endpoint_name: test-bert-uncased + endpoint_name: bert-demo instance_count: 1 instance_type: ml.m5.xlarge - -models: -- !Model - id: google-bert/bert-base-uncased - source: huggingface -``` - -Following is a yaml file for deploying a llama model from HF: -```yaml -deployment: !Deployment - destination: aws - endpoint_name: test-llama2-7b - instance_count: 1 - instance_type: ml.g5.12xlarge - num_gpus: 4 - # quantization: bitsandbytes - models: -- !Model - id: meta-llama/Meta-Llama-3-8B-Instruct - source: huggingface - predict: - temperature: 0.9 - top_p: 0.9 - top_k: 20 - max_new_tokens: 250 + - !Model + id: google-bert/bert-base-uncased + source: huggingface ``` +Vertex AI and Azure use the same schema with their specific `instance_type` / `accelerator_type` fields. -#### Fine-tuning a model using a yaml file - -You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file - -` +### 3. 
Fine-tuning +```bash magemaker --train .magemaker_config/train-bert.yaml -` - -Here is an example yaml file for fine-tuning a hugging-face model: - -```yaml -training: !Training - destination: aws - instance_type: ml.p3.2xlarge - instance_count: 1 - training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv - hyperparameters: !Hyperparameters - epochs: 1 - per_device_train_batch_size: 32 - learning_rate: 0.01 - -models: -- !Model - id: meta-textgeneration-llama-3-8b-instruct - source: huggingface ``` +If you omit common hyper-parameters, Magemaker will auto-fill sensible defaults based on the model/task. +### 4. Programmatic or REST queries +* Use the interactive **Query a Model Endpoint** menu, **or** +* Call the cloud SDK directly (see docs for each provider), **or** +* Spin up the bundled **OpenAI-compatible FastAPI proxy** (`python server.py`) and hit `/v1/chat/completions` from any OpenAI client. -
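As an example of the cloud-SDK route mentioned in the list above, querying a Vertex AI endpoint can be as short as the sketch below (it assumes `google-cloud-aiplatform` is installed and that you substitute the endpoint ID printed at deployment time):

```python
# Sketch: query a Vertex AI endpoint with the official SDK.
# Assumes PROJECT_ID / GCLOUD_REGION are in .env and the endpoint ID placeholder
# is replaced with the numeric ID printed when the endpoint was deployed.
from dotenv import dotenv_values
from google.cloud import aiplatform

env = dotenv_values(".env")
aiplatform.init(project=env.get("PROJECT_ID"), location=env.get("GCLOUD_REGION"))

endpoint = aiplatform.Endpoint("your-endpoint-id-here")
prediction = endpoint.predict(instances=[{"inputs": "What are you?"}])
print(prediction.predictions)
```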
      -
      - -If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great: -
      -
      - -**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** - -- **Type:** Fill Mask: tries to complete your sentence like Madlibs -- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill -- -
      -
      - -**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)** - -- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering -- **Query format:** "*type out a sentence like this one.*" - -
      -
      - +--- -### Deactivating models +## Deactivating Endpoints +Endpoints accrue charges while running. Delete them from the interactive menu or via the appropriate `delete_*` command to avoid unexpected costs. -Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance. - - -

      (back to top)

      - - - -
      +--- ## What we're working on next -- [ ] More robust error handling for various edge cases -- [ ] Verbose logging -- [ ] Enabling / disabling autoscaling -- [ ] Deployment to Azure and GCP +- [ ] Improved error handling & verbose logging +- [ ] Auto-/manual scaling controls +- [ ] Deeper multi-cloud orchestration +- [ ] Additional task types (vision, audio, multimodal) -

      (back to top)

      +--- +## Known Issues +- Querying currently supports **text-based** models only +- Endpoint deletion is asynchronous and can take a few minutes to disappear +- Deploying the **same model** twice within the **same minute** can fail due to name collisions - -
      +--- -## Known issues -- [ ] Querying within Magemaker currently only works with text-based model - doesn’t work with multimodal, image generation, etc. -- [ ] Deleting a model is not instant, it may show up briefly after it was queued for deletion -- [ ] Deploying the same model within the same minute will break +## Contributing +We welcome issues & PRs! See the [Contributing Guide](concepts/contributing) for setup instructions, testing commands, and doc guidelines. -

      (back to top)

      - - - -
      +--- ## License +Apache 2.0 — see `LICENSE` for details. -Distributed under the Apache 2.0 License. See `LICENSE` for more information. - - -
      +--- ## Contact +Questions or feedback? Email [support@slashml.com](mailto:support@slashml.com) or join our Discord (link in repository README). -You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com). - -We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions. +

      (back to top)

      \ No newline at end of file