JS/TS update prompts to not have connection string, add logger requirements, RestError updates #643
KarishmaGhiya wants to merge 4 commits into
…stead of connection string Section 10 contradicted Sections 3 and 4 by showing connectionString pattern. Updated to use fully qualified namespace + DefaultAzureCredential, consistent with the skill's own guidance to never use connection strings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rompts @azure/logger was the #1 failing criterion (87% fail rate across all configs, including skills). Root cause: the skill instructs 'always set up SDK logging' but the model prioritizes the specific task in the prompt over general skill instructions. Only blob-storage-manager passed because it explicitly asked for 'SDK logging at a configurable level'. Added 'Enable SDK diagnostic logging using @azure/logger' to the Prompt section of all 13 other JS/TS prompts to make the requirement explicit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit instruction to handle errors using RestError from @azure/core-rest-pipeline with statusCode checks to all 12 JS/TS prompts that were missing it. Same fix pattern as the @azure/logger fix - additive SDK patterns need prompt-level instruction to be reliably generated. Analysis from Run 5: RestError failed at 68% overall (54/79 criteria). Even with skills (which teach RestError in Section 2), failure rate was 55-63%. Prompts that already mentioned RestError in their Prompt section (app-configuration, storage-crud) had 0% failure rate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Show required npm packages (@azure/event-hubs and
@azure/eventhubs-checkpointstore-blob) and proper async/await patterns.
Enable SDK diagnostic logging using `@azure/logger` with a configurable log level.
Handle errors using `RestError` from `@azure/core-rest-pipeline` with `statusCode` checks.
I don't think this should be explicitly mentioned in the prompt. Doesn't it introduce a bias?
Our evaluation criteria are based on the usage of these libraries. Unless the agent is explicitly told to add logging and handle errors, it almost NEVER does, even if that is part of the best practices skill. Please see the findings in points 3 and 4 of the description above.
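For context, the "configurable log level" that the added prompt line asks for could look roughly like the sketch below. It is dependency-free so it can be run on its own: `resolveLogLevel` is a hypothetical helper (not part of the PR or the SDK), and the actual `setLogLevel` call from `@azure/logger` is shown only in a comment. The four level names are the ones `@azure/logger` accepts.

```typescript
// The four levels accepted by setLogLevel in @azure/logger.
const AZURE_LOG_LEVELS = ["verbose", "info", "warning", "error"] as const;
type AzureLogLevel = (typeof AZURE_LOG_LEVELS)[number];

// Hypothetical helper: turn a raw config value (for example an environment
// variable) into a valid level, using `fallback` when it is unset or invalid.
function resolveLogLevel(
  raw: string | undefined,
  fallback: AzureLogLevel = "warning"
): AzureLogLevel {
  const candidate = (raw ?? "").trim().toLowerCase();
  const known = (AZURE_LOG_LEVELS as readonly string[]).indexOf(candidate) !== -1;
  return known ? (candidate as AzureLogLevel) : fallback;
}

// With the SDK installed, the result would be handed to the logger, e.g.:
//   import { setLogLevel } from "@azure/logger";
//   setLogLevel(resolveLogLevel(process.env.AZURE_LOG_LEVEL));
```

The default of `"warning"` is an assumption made for this sketch; a prompt or skill could just as well require a different default.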
Show required npm packages (@azure/event-hubs and
@azure/eventhubs-checkpointstore-blob) and proper async/await patterns.
Enable SDK diagnostic logging using `@azure/logger` with a configurable log level.
This too. If the aim of the PR is to make the prompts generic, it should just say "Enable SDK diagnostic logging".
Use DefaultAzureCredential for authentication. Show required npm packages
and include proper error handling with try/catch.
Use a credential from `@azure/identity` for authentication. Enable SDK diagnostic
Why are we introducing bias toward specific packages if we don't want to mention specific credential types?
The project needs:
- A **secret provider class** that retrieves secrets from Key Vault by name, with graceful handling when a secret doesn't exist (return a default value instead of crashing). It should also be able to retrieve a specific version of a secret (not just the latest), and inspect a secret's expiry date so the caller can tell if a secret is about to expire.
- A **secret provider class** that retrieves secrets from Key Vault by name, with graceful handling when a secret doesn't exist (return a default value instead of crashing) — use `RestError` from `@azure/core-rest-pipeline` with `statusCode` checks (e.g., 404) to detect not-found vs other failures. It should also be able to retrieve a specific version of a secret (not just the latest), and inspect a secret's expiry date so the caller can tell if a secret is about to expire.
This changes the prompt from a general scope to one biased toward SDK terminology.
@samvaity Unless the prompt explicitly tells the LLM to add logging or parse RestError, it doesn't do so based on the best practices skill alone. It only does what is needed to make the code work. At least, this is the gap I found in JS. The best practices skill is already being read by the LLM, but it takes no action on that knowledge unless specifically told to add logging and error handling.
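To make the Key Vault requirement under discussion concrete, here is a minimal sketch of the "return a default on 404, rethrow everything else" behaviour. It is deliberately dependency-free: `SecretReader`, `hasStatusCode`, and `getSecretOrDefault` are hypothetical names, and `hasStatusCode` is a structural stand-in for `isRestError` from `@azure/core-rest-pipeline`, which is what real code would use to recognise a `RestError`.

```typescript
// Minimal shape of what SecretClient.getSecret resolves to; only `value` is used.
interface SecretLike {
  value?: string;
}

// Hypothetical abstraction over the Key Vault client so the sketch runs without the SDK.
interface SecretReader {
  getSecret(name: string): Promise<SecretLike>;
}

// Stand-in for isRestError: RestError carries a numeric `statusCode`
// for HTTP failures, which is all this check relies on.
function hasStatusCode(err: unknown): err is { statusCode: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { statusCode?: unknown }).statusCode === "number"
  );
}

// Returns the secret value, or `fallback` when the secret does not exist.
// Any other failure (403, 429, network errors, ...) is rethrown unchanged.
async function getSecretOrDefault(
  reader: SecretReader,
  name: string,
  fallback: string
): Promise<string> {
  try {
    const secret = await reader.getSecret(name);
    return secret.value ?? fallback;
  } catch (err) {
    if (hasStatusCode(err) && err.statusCode === 404) {
      return fallback; // not found: degrade gracefully
    }
    throw err; // anything else is a real problem for the caller
  }
}
```

A `SecretClient` from `@azure/keyvault-secrets` should satisfy `SecretReader` structurally, so it could be passed in directly. Whether the model produces this pattern unprompted is exactly what the evaluation measures.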
Evaluation Results: SDK-Specific vs Generic Prompt Wording
We ran two experiments to determine how prompt wording affects whether the model uses the correct Azure SDK patterns for error handling (…)
Experiment Setup
Results
RestError Breakdown (Run 7 — generic wording)
Config-Level RestError (Run 7)
Key Takeaway
Generic wording ("handle errors by parsing the HTTP status code") is too vague: the model doesn't reliably translate that into the correct SDK pattern (…).
This confirms the instruction authority hierarchy: for additive SDK patterns (logging, error handling), the model needs explicit SDK construct names in the prompt. Skills help (baseline-skills scores higher than baseline) but aren't sufficient alone; the gap between skills-only (89.5%) and prompt-specific (100%) is significant.
Recommendation (OPEN TO DISCUSSION)
Keep SDK-specific terminology in the prompt sections. The prompts are language-specific (JS/TS) already, so mentioning …
Summary
Improves JS/TS evaluation prompts to enforce three critical Azure SDK patterns that were causing high failure rates in hyoka evaluations. These changes were validated across 3 evaluation runs (Run 4 → Run 5 → Run 6) with measurable improvements at each stage.
Changes
1. Remove connection strings from prompts: use `DefaultAzureCredential` (all prompts)
- Switched to `@azure/identity` credential-based auth in prompts that were still using connection strings (app-configuration, cosmos-db, event-hubs, service-bus, key-vault)
2. Fix Event Hubs skill example: use `DefaultAzureCredential` (SKILL.md)
- Section 10 showed `connectionString` usage while Sections 3-4 said "never use connection strings", a direct contradiction
- Updated to use `DefaultAzureCredential` with fully qualified namespace
3. Add `@azure/logger` diagnostic logging requirement (all 14 prompts)
- Added to each prompt: Enable SDK diagnostic logging using `@azure/logger` with a configurable log level
4. Add `RestError` exception handling requirement (12 prompts)
- `RestError` from `@azure/core-rest-pipeline` had a 68% failure rate. Same pattern as logger: the skill teaches it (Section 2) but the model ignores additive requirements
- Added to each prompt: Handle errors using `RestError` from `@azure/core-rest-pipeline` with `statusCode` checks
Evaluation Results
Key Insight
Instruction authority hierarchy for LLMs:
For "additive" SDK patterns (code works fine without them), prompt-level instruction is required — skills alone are insufficient.
Files Changed
prompts/: credential, logger, and RestError updates
skills/generator/js-ts-azure-patterns/SKILL.md: Event Hubs example fixed to use DefaultAzureCredential (DAC)
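As an illustration of the "fully qualified namespace instead of connection string" rule behind the SKILL.md fix, the sketch below shows one way generated code could guard its configuration. `requireFullyQualifiedNamespace` is a hypothetical helper, not something in this PR, and the host-name check is a simplification. The commented-out lines show where its result would go when constructing an `EventHubConsumerClient` with `DefaultAzureCredential`.

```typescript
// Hypothetical guard: accept only a host name such as
// "<namespace>.servicebus.windows.net" and reject anything that looks
// like a connection string.
function requireFullyQualifiedNamespace(value: string): string {
  const trimmed = value.trim();
  if (/Endpoint=|SharedAccessKey(Name)?=/i.test(trimmed)) {
    throw new Error(
      "Connection strings are not allowed; pass the fully qualified namespace instead."
    );
  }
  // Simplified host-name check: dot-separated labels of letters, digits, hyphens.
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(trimmed)) {
    throw new Error(
      "Expected a host name such as <namespace>.servicebus.windows.net."
    );
  }
  return trimmed.toLowerCase();
}

// With the SDK installed, the validated value would be used like this
// (consumerGroup, rawNamespace, and eventHubName are placeholders):
//   import { DefaultAzureCredential } from "@azure/identity";
//   import { EventHubConsumerClient } from "@azure/event-hubs";
//   const client = new EventHubConsumerClient(
//     consumerGroup,
//     requireFullyQualifiedNamespace(rawNamespace),
//     eventHubName,
//     new DefaultAzureCredential()
//   );
```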