From 2c947e0d142dded6b2a35462450f417167740854 Mon Sep 17 00:00:00 2001
From: Paul Yuk
Date: Fri, 29 May 2026 08:21:51 -0700
Subject: [PATCH] fix(skill/sandboxes): routing, disambiguation, portal, ask-first-on-ambiguous

Addresses 4 routing/selection bug clusters surfaced by the ACA Sandboxes
Vally eval suite (Run 3, see coreai-microsoft/adc-devx#214 baselines):

1. Dynamic Sessions disambiguation (CRITICAL). Skill currently activates
   and recommends ACA Sandboxes when the user asks for code-interpreter /
   untrusted-code-from-LLM-agent / ephemeral-seconds workloads. Adds an
   explicit 'Do NOT activate for' block listing Container Apps Dynamic
   Sessions with the cues (code interpreter, execute LLM-generated code,
   untrusted code from my agent, session pool, ephemeral seconds), and a
   side-by-side comparison table making the lifetime / isolation /
   audience differences explicit.

2. Over-triggering on bare 'sandbox' / 'microVM' / 'VM' keywords.
   Currently SKILL.md trigger list includes broad terms like 'create
   sandbox', 'microVM', 'code interpreter' which fire on AKS sandbox
   namespaces, Cosmos sandbox containers, Windows Sandbox, etc. Tightens
   trigger requirements to include ACA / dev-loop / microVM-in-Azure
   context, removes 'code interpreter' as a hard trigger (relocated to
   Dynamic Sessions guidance), and adds explicit 'Do NOT activate for'
   entries for the common false positives.

3. Ask-first-on-ambiguous-prompts. Currently the skill jumps to
   provisioning on prompts like 'sandbox' / 'I need an ephemeral VM' /
   'set up a sandbox for my coding agent' / 'what should I use to run my
   AI agent'. Adds a 'When the prompt is ambiguous, ASK ONE clarifying
   question before provisioning' branch in the description with 3
   concrete patterns (single-word / lifetime-ambiguous /
   agent-runtime-ambiguous) and a target-product enumeration for each.

4. Portal capability correction. Skill currently says nothing about the
   portal at https://containerapps.azure.com/sandbox-groups, leading the
   model to claim 'the portal has no sandbox management surface' which is
   the opposite of the truth. Adds a Portal row to the Get-started table
   and references it in the comparison table.

Also:
- Auth canonical entry point updated from 'az login' to 'aca auth login'
  (delegates to az login). Matches the install.md PR.
- 'Code interpreter' relabeled in Scenarios as 'Developer
  code-interpreter loop' to distinguish from the managed Dynamic Sessions
  product.

No references/ behaviour changes in this PR;
canonical-command-shown-verbatim work (the 5th bug cluster) deferred to a
later references/scenarios.md PR so this one stays reviewable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 plugin/skills/sandboxes/SKILL.md | 73 ++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/plugin/skills/sandboxes/SKILL.md b/plugin/skills/sandboxes/SKILL.md
index 8e1f9d3..2823ce6 100644
--- a/plugin/skills/sandboxes/SKILL.md
+++ b/plugin/skills/sandboxes/SKILL.md
@@ -1,26 +1,56 @@
 ---
 name: sandboxes
 description: |
- Azure Container Apps sandboxes let you run untrusted code, agents,
- MCP servers, and web apps in hardware-isolated microVMs.
- Supports snapshot/resume, scale-to-zero, deny-default egress, and is
- managed with `aca` CLI using `az login`.
+ Azure Container Apps Sandboxes (ACA Sandboxes) let a developer run
+ untrusted code, agents, MCP servers, and web apps in their OWN
+ hardware-isolated microVM. Long-lived (hours-to-days), developer-owned,
+ programmatically controlled via the `aca` CLI (which uses
+ `aca auth login`, delegating to `az login` under the hood).

  Use when the user wants to: create/manage sandbox groups and
  sandboxes; exec or open a shell; read/write files; expose ports;
  snapshot, stop, resume, commit to disk; mount volumes; tighten egress;
  manage secrets, identity, labels; apply YAML; or run
- scenarios like web apps, coding agents, code interpreter, swarms,
+ scenarios like web apps, coding agents, agent swarms,
  computer-use, or MCP hosting.

+ **Do NOT activate for:**
+ - **Container Apps Dynamic Sessions** (the LLM "code interpreter"
+ product — ephemeral seconds-long sessions, Hyper-V isolation, used
+ for executing untrusted code GENERATED BY an LLM tool/agent at
+ runtime). Different product in the same family. If the user mentions
+ "code interpreter", "execute LLM-generated code", "untrusted code
+ from my agent", "session pool", "ephemeral seconds", point them at
+ Container Apps Dynamic Sessions instead.
+ - Kubernetes / AKS "sandbox" namespaces or pods.
+ - Cosmos DB "sandbox" containers / databases.
+ - Windows Sandbox, Linux namespace sandboxes, Salesforce Sandbox,
+ Playwright sandbox, browser sandboxes.
+ - Generic "VM in Azure" without ACA / microVM / dev-loop context
+ (could be Azure VM, Dev Box, Codespaces).
+
+ **When the prompt is ambiguous, ASK ONE clarifying question before
+ provisioning anything.** Specifically:
+ - Single-word prompts ("sandbox", "microVM", "VM") — ask which product
+ family (ACA Sandboxes, Dynamic Sessions, Azure VM, Dev Box, …).
+ - "Ephemeral VM" / "I need a VM for testing" — ask expected lifetime
+ (seconds → Dynamic Sessions; hours-to-days → ACA Sandboxes; long-lived
+ workstation → Dev Box / Azure VM).
+ - "Set up a sandbox for my coding agent" / "What should I use to run
+ my AI agent?" — ask whether (a) the agent needs its OWN dev
+ environment (ACA Sandboxes is the strong fit) or (b) the agent needs
+ to execute end-user / LLM-generated code (Dynamic Sessions).
+
  If `aca` is missing, read `references/install.md` first.
  `aca` ships ONLY via GitHub Releases (microsoft/azure-container-apps);
  not npm/pip/winget/brew. Don't guess.

- Triggers: install aca, install aca cli, setup aca, aca doctor, aca
- login, command not found: aca, create sandbox, sandbox group, aca
- cli, aca sandbox, exec in sandbox, microVM, code interpreter, agent
- swarm, host mcp.
+ Triggers (must have ACA / microVM / dev-loop context — bare keywords
+ alone are NOT enough; ask first per above): install aca, install aca
+ cli, setup aca, aca doctor, aca auth login, aca login, command not
+ found: aca, create ACA sandbox, sandbox group, aca cli, aca sandbox,
+ exec in ACA sandbox, ACA microVM, host mcp in sandbox, personal agent
+ sandbox.
 ---

 # Sandboxes
@@ -38,7 +68,8 @@ folder.
 - **Startup:** sub-second from a prewarmed pool; suspend/resume preserves
  full memory + disk.
 - **Scale:** zero to thousands; pay nothing when idle.
-- **Auth:** `aca` delegates to `az login` — same identity, same MFA.
+- **Auth:** `aca auth login` — delegates to `az login` under the hood,
+ same identity, same MFA.

 > ⚠️ **The `az` CLI has no sandbox commands.** Sandbox groups and
 > sandboxes are managed by `aca` — **not** by `az containerapp …`. The
@@ -46,6 +77,21 @@ folder.
 > do not touch sandboxes. If you see `az containerapp sandbox …` in a
 > snippet, it's wrong.

+## ACA Sandboxes vs. adjacent products (don't confuse them)
+
+| | **ACA Sandboxes** | **Container Apps Dynamic Sessions** | **Regular Container App** |
+|---|---|---|---|
+| Audience | Developer / agent owner | LLM tool runtime | Production HTTP service |
+| Lifetime | Hours → days, snapshot/resume | Seconds → minutes, ephemeral | Indefinite, auto-scale 0→N |
+| Isolation | microVM (hardware) | Hyper-V (hardware, per-session) | Shared container runtime |
+| Managed via | `aca` CLI, YAML manifests | Dynamic Sessions API / SDK | `az containerapp`, Bicep |
+| Use it for | Personal dev env, MCP host, agent dev loop, swarms | LLM-generated untrusted code, code interpreter as a service | Web apps, APIs, jobs |
+| Portal | `https://containerapps.azure.com/sandbox-groups` | Azure portal `Microsoft.App/sessionPools` | Azure portal `Microsoft.App/containerApps` |
+
+If the user's intent matches the middle column, **stop and point them
+at Container Apps Dynamic Sessions** — don't try to make ACA Sandboxes
+fit.
+
 ## Get started

 | | Where |
@@ -55,6 +101,7 @@ folder.
 | **Quick start** | [references/quickstart.md](references/quickstart.md) |
 | **Full CLI reference** | [references/reference.md](references/reference.md) |
 | **Scenario recipes** | [references/scenarios.md](references/scenarios.md) |
+| **Portal** | [`https://containerapps.azure.com/sandbox-groups`](https://containerapps.azure.com/sandbox-groups) — exploration, visual inspection of groups/sandboxes/ports/logs. The CLI is primary; the portal complements it for one-off ops. |

 After install, always confirm setup with `aca doctor` — it resolves
 subscription / RG / group / region / role and tells you which check
@@ -98,8 +145,10 @@ in [references/scenarios.md](references/scenarios.md).
 - **Web apps** — start a server, expose a port anonymously, hit the URL.
 - **Coding agents in a sandbox** — run Copilot CLI / Claude Code / Codex
  with deny-default egress and (optionally) token-swap rules.
-- **Code interpreter** — LLM generates → exec → observe → iterate;
- snapshot between turns for rewind.
+- **Developer code-interpreter loop** — your OWN agent generates code,
+ execs it inside YOUR sandbox, snapshots between turns for rewind.
+ For LLM tools / managed code-interpreter-as-a-service used by end-user
+ agents at runtime, use **Container Apps Dynamic Sessions** instead.
 - **Swarms** — orchestrator fans work across N worker sandboxes by label
  selector.
 - **Sandbox inception** — orchestrator runs *inside* a sandbox and uses