feat: Add ai-optimization skill for SageMaker AI Optimization APIs#147

Open
Lokiiiiii wants to merge 2 commits into awslabs:main from Lokiiiiii:ai-optimization

Conversation

@Lokiiiiii

New skill covering the 14 SageMaker AI Optimization API operations (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) with guided workflows for benchmarking LLM inference and getting deployment recommendations.

Skill structure:

  • SKILL.md (110 lines): Main skill with intent-matching description
  • references/benchmark-workflow.md (96 lines): Benchmark job guide
  • references/benchmark-results.md (78 lines): Results download code
  • references/recommendation-workflow.md (96 lines): Recommendation guide
  • references/recommendation-options.md (74 lines): Config options + dataset
  • references/recommendation-deploy.md (41 lines): ModelPackage deployment
  • references/interpreting-results.md (78 lines): Metrics presentation

All files conform to DESIGN_GUIDELINES.md limits (SKILL.md <300, references <100 lines each). Code samples verified against the public Smithy model.

Related

Changes

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@Lokiiiiii Lokiiiiii requested review from a team as code owners April 24, 2026 17:22
Comment thread plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-options.md Dismissed
@Lokiiiiii Lokiiiiii changed the title Add ai-optimization skill for SageMaker AI Optimization APIs feat: Add ai-optimization skill for SageMaker AI Optimization APIs Apr 24, 2026
@krokoko
Contributor

krokoko commented Apr 26, 2026

Benchmark polling loop has no post-loop status check — failed jobs silently proceed to results download

File: benchmark-workflow.md (polling block)

The benchmark polling loop breaks on `Failed` but has no status gate afterward. The SKILL.md workflow then proceeds to "download results," which fails with a confusing error when the tar.gz doesn't exist. The recommendation workflow handles this correctly, so the two workflows are asymmetric.

Fix: Add post-loop status handling matching the recommendation workflow:

```python
if status == "Failed":
    print(f"Benchmark failed: {resp.get('FailureReason', 'Unknown')}")
    print("Fix the issue above and re-run the job.")
elif status == "Stopped":
    print("Benchmark was stopped before completion.")
else:
    print("Benchmark completed. Proceed to download results.")
```

@krokoko
Contributor

krokoko commented Apr 26, 2026

Polling loops have no timeout — potential infinite hang

Files: benchmark-workflow.md, recommendation-workflow.md

Both `while True` loops have no max duration. If a job enters an unexpected non-terminal state, the notebook cell hangs indefinitely. Add a `MAX_WAIT_SECONDS` guard (e.g., 3600s for benchmark, 7200s for recommendation).
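A bounded polling helper along these lines would avoid the hang. The `describe_ai_benchmark_job` call and `job_name` in the trailing comment are assumptions taken from the skill's workflow, not verified API names:

```python
import time

MAX_WAIT_SECONDS = 3600   # suggested: 3600 for benchmark, 7200 for recommendation
POLL_INTERVAL = 30        # seconds between describe calls

def poll_until_terminal(describe, max_wait=MAX_WAIT_SECONDS, interval=POLL_INTERVAL):
    """Poll describe() until the job reaches a terminal status or the deadline passes."""
    deadline = time.time() + max_wait
    while True:
        status = describe()["Status"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        if time.time() >= deadline:
            print(f"Gave up after {max_wait}s; last status: {status}")
            return status
        time.sleep(interval)

# In the notebook this would look something like (names assumed):
# status = poll_until_terminal(
#     lambda: sm.describe_ai_benchmark_job(JobName=job_name))
```

Returning the last status either way lets the existing post-loop status handling run unchanged after a timeout.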

@krokoko
Contributor

krokoko commented Apr 26, 2026

No error handling on S3 download and tar extraction

File: benchmark-results.md

`s3.get_object()` and `tarfile.open()` have no try/except. Users with the wrong IAM permissions see raw `AccessDenied` tracebacks, and a corrupted archive gives an opaque `ReadError`. Add try/except with actionable messages.

@krokoko
Contributor

krokoko commented Apr 26, 2026

Pandas marked "optional" but unconditionally used

File: benchmark-results.md, line 6

The comment says `# optional, for tabular display` but `pd.DataFrame(...)` is called unconditionally. Either remove the "optional" comment or add a fallback.

@krokoko
Contributor

krokoko commented Apr 26, 2026

sm client used without being defined in benchmark-results.md

File: benchmark-results.md, line 12

The code uses `sm.describe_ai_benchmark_job(...)` but `sm = boto3.client("sagemaker")` is never created in this code block. Either add the client creation or a comment noting it comes from a prior cell.

