fix(skill/sandboxes): routing, disambiguation, portal, ask-first-on-ambiguous #1733

Open
paulyuk wants to merge 1 commit into main from paulyuk/skill-routing-and-disambiguation

paulyuk (Member) commented May 29, 2026

Addresses 4 routing / selection bug clusters in plugin/skills/sandboxes/SKILL.md surfaced by the ACA Sandboxes Vally eval suite (Run 3, see coreai-microsoft/adc-devx#214 baselines).

What changes in SKILL.md

1. Dynamic Sessions disambiguation (CRITICAL)

Adds an explicit "Do NOT activate for" section listing Container Apps Dynamic Sessions with its cues (code interpreter, execute LLM-generated code, untrusted code from my agent, session pool, ephemeral seconds). Adds a side-by-side comparison table (ACA Sandboxes vs Dynamic Sessions vs Container App) making the lifetime / isolation / audience differences explicit. Today the skill activates and recommends ACA Sandboxes for these workloads, which is wrong.

2. Over-triggering on bare keywords

Tightens the Triggers list: bare sandbox / microVM / VM no longer match; trigger phrases now require ACA / dev-loop / microVM-in-Azure context (create ACA sandbox, ACA microVM, etc.). Adds "Do NOT activate for" entries covering the common false positives: AKS / Kubernetes sandbox namespaces, Cosmos DB sandbox containers, Windows Sandbox, Linux namespace sandboxes, Salesforce/Playwright sandboxes, and generic Azure VM requests.
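
As a rough illustration of the tightened rule (not the actual mechanism: SKILL.md is prose that the model interprets), a bare keyword would match only when an ACA context cue is also present, and any false-positive cue vetoes activation. The function name and the cue lists below are hypothetical, paraphrased from the phrases in this description.

```python
import re

# Hypothetical cue lists paraphrased from this PR description;
# the real SKILL.md expresses these as prose, not as code.
BARE_KEYWORDS = ("sandbox", "microvm", "vm")
CONTEXT_CUES = ("aca", "dev-loop", "dev loop")
NEGATIVE_CUES = (
    "aks", "kubernetes", "cosmos", "windows sandbox", "linux namespace",
    "salesforce", "playwright", "dynamic sessions", "session pool",
    "code interpreter",
)


def _has(cue: str, text: str) -> bool:
    """True if `cue` appears in `text` as a whole word or phrase."""
    return re.search(r"\b" + re.escape(cue) + r"\b", text) is not None


def should_activate(prompt: str) -> bool:
    """Sketch: a keyword alone never activates; it also needs ACA context."""
    text = prompt.lower()
    # Any known false-positive cue vetoes activation outright.
    if any(_has(cue, text) for cue in NEGATIVE_CUES):
        return False
    has_keyword = any(_has(k, text) for k in BARE_KEYWORDS)
    has_context = any(_has(c, text) for c in CONTEXT_CUES)
    return has_keyword and has_context
```

Under these assumed lists, `should_activate("create ACA sandbox")` is true, while a bare `"sandbox"` or a prompt mentioning an AKS sandbox namespace is not.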

3. Ask-first-on-ambiguous-prompts branch

Adds a "When the prompt is ambiguous, ASK ONE clarifying question before provisioning" section to the description, with 3 concrete patterns:

  • Single-word prompts (sandbox, microVM, VM) → ask which product family.
  • Lifetime-ambiguous (ephemeral VM, VM for testing) → ask expected lifetime; seconds → Dynamic Sessions, hours–days → ACA Sandboxes, long-lived workstation → Dev Box / Azure VM.
  • Agent-runtime-ambiguous (set up a sandbox for my coding agent, what should I use to run my AI agent?) → ask whether the agent needs its OWN dev env (ACA Sandboxes) or to execute end-user/LLM-generated code (Dynamic Sessions).

Today the skill jumps straight to provisioning on these prompts; the eval suite caught this on all 6 syn-* stimuli.
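
The three patterns above can be sketched as a small routing table. This is only an illustration of the intended decision flow; the skill implements it as instructions in SKILL.md, and every name below (`Product`, `LIFETIME_ROUTES`, `AGENT_ROUTES`, `clarifying_question`) and the substring matching are hypothetical.

```python
from enum import Enum
from typing import Optional


class Product(Enum):
    DYNAMIC_SESSIONS = "Container Apps Dynamic Sessions"
    ACA_SANDBOXES = "ACA Sandboxes"
    DEV_BOX_OR_VM = "Dev Box / Azure VM"


# Answer to the lifetime question -> product, per the lifetime-ambiguous pattern.
LIFETIME_ROUTES = {
    "seconds": Product.DYNAMIC_SESSIONS,
    "hours-days": Product.ACA_SANDBOXES,
    "long-lived": Product.DEV_BOX_OR_VM,
}

# Answer to the agent question -> product, per the agent-runtime-ambiguous pattern.
AGENT_ROUTES = {
    "own-dev-env": Product.ACA_SANDBOXES,
    "executes-untrusted-code": Product.DYNAMIC_SESSIONS,
}


def clarifying_question(prompt: str) -> Optional[str]:
    """Return the single question to ask, or None if no pattern applies."""
    text = prompt.strip().lower()
    if text in {"sandbox", "microvm", "vm"}:
        return "Which product family do you mean?"
    if "agent" in text:
        return ("Does the agent need its own dev environment, or does it "
                "execute end-user/LLM-generated code?")
    if "ephemeral vm" in text or "vm for testing" in text:
        return "How long should it live: seconds, hours to days, or long-lived?"
    return None
```

The point of the sketch is the ordering: a question is asked first, and a product is chosen only from the answer, never directly from the ambiguous prompt.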

4. Portal capability correction

Adds a Portal row to the Get-started table pointing at https://containerapps.azure.com/sandbox-groups, plus a Portal column in the comparison table. Today the skill claims "the Azure portal has no sandbox management surface", which is the opposite of the truth and contradicts the docs.

Bonus

Eval coverage

When this lands, expected improvements in the Vally suite:

  • neg-dynamic-session-query (CRITICAL): should pass (skill no longer activates)
  • compare-sandbox-vs-dynamic-session (CRITICAL): should pass (skill activates with table)
  • compare-aca-vs-portal: should pass (skill correctly cites portal)
  • neg-list-k8s-pods, neg-cosmos-query: should pass (skill no longer over-triggers)
  • syn-ai-agent-runtime, syn-coding-agent-sandbox, syn-ephemeral-vm, syn-microvm-vague, syn-sandbox-single-word, syn-vm-for-dev: should pass (skill asks clarifying question first)

Out of scope (separate PR)

  • The 5th eval-surfaced bug cluster (pos-* stimuli where the skill executes commands without showing the canonical aca … form first) is deferred to a later references/scenarios.md PR so this one stays reviewable.

Related

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…mbiguous

Addresses 4 routing/selection bug clusters surfaced by the ACA Sandboxes Vally eval suite (Run 3, see coreai-microsoft/adc-devx#214 baselines):

1. Dynamic Sessions disambiguation (CRITICAL). Skill currently activates and recommends ACA Sandboxes when the user asks for code-interpreter / untrusted-code-from-LLM-agent / ephemeral-seconds workloads. Adds an explicit 'Do NOT activate for' block listing Container Apps Dynamic Sessions with the cues (code interpreter, execute LLM-generated code, untrusted code from my agent, session pool, ephemeral seconds), and a side-by-side comparison table making the lifetime / isolation / audience differences explicit.

2. Over-triggering on bare 'sandbox' / 'microVM' / 'VM' keywords. Currently the SKILL.md trigger list includes broad terms like 'create sandbox', 'microVM', and 'code interpreter', which fire on AKS sandbox namespaces, Cosmos sandbox containers, Windows Sandbox, etc. Tightens trigger requirements to include ACA / dev-loop / microVM-in-Azure context, removes 'code interpreter' as a hard trigger (relocated to Dynamic Sessions guidance), and adds explicit 'Do NOT activate for' entries for the common false positives.

3. Ask-first-on-ambiguous-prompts. Currently the skill jumps to provisioning on prompts like 'sandbox' / 'I need an ephemeral VM' / 'set up a sandbox for my coding agent' / 'what should I use to run my AI agent'. Adds a 'When the prompt is ambiguous, ASK ONE clarifying question before provisioning' branch in the description with 3 concrete patterns (single-word / lifetime-ambiguous / agent-runtime-ambiguous) and a target-product enumeration for each.

4. Portal capability correction. Skill currently says nothing about the portal at https://containerapps.azure.com/sandbox-groups, leading the model to claim 'the portal has no sandbox management surface', which is the opposite of the truth. Adds a Portal row to the Get-started table and references it in the comparison table.

Also:
- Auth canonical entry point updated from 'az login' to 'aca auth login' (delegates to az login). Matches the install.md PR.
- 'Code interpreter' relabeled in Scenarios as 'Developer code-interpreter loop' to distinguish from the managed Dynamic Sessions product.

No references/ behaviour changes in this PR; canonical-command-shown-verbatim work (the 5th bug cluster) deferred to a later references/scenarios.md PR so this one stays reviewable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>