feat: Add ai-optimization skill for SageMaker AI Optimization APIs #147
Lokiiiiii wants to merge 2 commits into awslabs:main from
Conversation
New skill covering the 14 SageMaker AI Optimization API operations (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) with guided workflows for benchmarking LLM inference and getting deployment recommendations.

Skill structure:
- SKILL.md (110 lines): Main skill with intent-matching description
- references/benchmark-workflow.md (96 lines): Benchmark job guide
- references/benchmark-results.md (78 lines): Results download code
- references/recommendation-workflow.md (96 lines): Recommendation guide
- references/recommendation-options.md (74 lines): Config options + dataset
- references/recommendation-deploy.md (41 lines): ModelPackage deployment
- references/interpreting-results.md (78 lines): Metrics presentation

All files conform to DESIGN_GUIDELINES.md limits (SKILL.md <300 lines, references <100 lines each). Code samples verified against the public Smithy model.
Benchmark polling loop has no post-loop status check — failed jobs silently proceed to results download
File: benchmark-workflow.md (polling block)
The benchmark polling loop breaks on Failed but has no status gate afterward. The SKILL.md workflow then proceeds to "download results," which will fail with a confusing error when the tar.gz doesn't exist.
Fix: Add post-loop status handling matching the recommendation workflow.
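A minimal sketch of such a post-loop gate (the `Status` and `FailureReason` field names are assumed from the describe-response pattern quoted in this review; check them against the skill's actual code):

```python
def require_completed(resp: dict) -> dict:
    """Gate the results download on a successful terminal status.

    `resp` is the final describe_* response captured by the polling loop.
    Raises instead of letting a failed job fall through to the S3 download,
    where the missing tar.gz would produce a confusing error.
    """
    status = resp.get("Status")
    if status != "Completed":
        reason = resp.get("FailureReason", "no failure reason reported")
        raise RuntimeError(f"Benchmark job ended in status {status!r}: {reason}")
    return resp

# Immediately after the polling loop:
#   resp = require_completed(resp)  # stops here on Failed/Stopped
```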
Polling loops have no timeout — potential infinite hang
Files: benchmark-workflow.md, recommendation-workflow.md
Both `while True` loops have no maximum duration. If a job enters an unexpected non-terminal state, the notebook cell hangs indefinitely.
Fix: Add a MAX_WAIT_SECONDS guard (e.g., 3600s for benchmark, 7200s for recommendation).
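A sketch of such a guard (names are illustrative; in the skill, `describe_fn` would wrap the actual describe_ai_benchmark_job or recommendation-job call):

```python
import time

def wait_for_job(describe_fn, max_wait_seconds=3600, poll_interval=30):
    """Poll until a terminal status or the deadline, whichever comes first.

    `describe_fn` is a zero-argument callable returning the describe response.
    Raises TimeoutError instead of hanging forever on an unexpected state.
    """
    deadline = time.monotonic() + max_wait_seconds
    while True:
        resp = describe_fn()
        status = resp["Status"]
        if status in ("Completed", "Failed", "Stopped"):
            return resp
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job still in status {status!r} after {max_wait_seconds}s"
            )
        time.sleep(poll_interval)
```

The benchmark workflow would call this with `max_wait_seconds=3600`, the recommendation workflow with `7200`.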
No error handling on S3 download and tar extraction
File: benchmark-results.md
`s3.get_object()` and `tarfile.open()` have no try/except. Users with wrong IAM permissions see raw AccessDenied tracebacks, and a corrupted archive gives an opaque ReadError.
Fix: Add try/except with actionable messages.
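A hedged sketch of that handling (bucket, key, and path names are placeholders, not the skill's actual code; boto3 is imported inside the download helper so the extraction helper works without it):

```python
import tarfile

def download_results(bucket: str, key: str, local_path: str = "results.tar.gz"):
    """Download the results archive, translating common failures into actionable errors."""
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        s3.download_file(bucket, key, local_path)
    except ClientError as e:
        code = e.response["Error"]["Code"]
        if code in ("AccessDenied", "403"):
            raise PermissionError(
                f"Access denied for s3://{bucket}/{key}. "
                "Check that your role has s3:GetObject on the results bucket."
            ) from e
        if code in ("NoSuchKey", "404"):
            raise FileNotFoundError(
                f"s3://{bucket}/{key} does not exist. "
                "The job may have failed before writing results."
            ) from e
        raise

def extract_results(local_path: str, dest: str = "results") -> None:
    """Extract the archive, reporting corruption clearly instead of a raw ReadError."""
    try:
        with tarfile.open(local_path, "r:gz") as tar:
            tar.extractall(dest)
    except tarfile.ReadError as e:
        raise RuntimeError(
            f"{local_path} is not a valid .tar.gz archive; "
            "it may be truncated or corrupted."
        ) from e
```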
Pandas marked "optional" but unconditionally used
File: benchmark-results.md, line 6
The comment says `# optional, for tabular display`, but `pd.DataFrame(...)` is called unconditionally. Either remove the "optional" comment or add a fallback.
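One way to honor the "optional" comment (sketch; `rows` stands in for the benchmark metric records, whose actual shape is defined in the skill):

```python
try:
    import pandas as pd  # optional, for tabular display
except ImportError:
    pd = None

def display_metrics(rows):
    """Return a DataFrame when pandas is installed; otherwise print plain rows."""
    if pd is not None:
        return pd.DataFrame(rows)
    for row in rows:
        print("  ".join(f"{k}={v}" for k, v in row.items()))
    return rows
```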
`sm` client used without being defined in benchmark-results.md
File: benchmark-results.md, line 12
The code uses `sm.describe_ai_benchmark_job(...)`, but `sm = boto3.client("sagemaker")` is never created in this code block. Either add the client creation or a comment noting it comes from a prior cell.
Related
Changes
Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.