JS/TS update prompts to not have connection string, add logger requirements, RestError updates #643

Open
KarishmaGhiya wants to merge 4 commits into ronniegeraghty:main from KarishmaGhiya:prompt-updates

Conversation

Contributor

@KarishmaGhiya KarishmaGhiya commented May 6, 2026

Summary

Improves JS/TS evaluation prompts to enforce three critical Azure SDK patterns that were causing high failure rates in hyoka evaluations. These changes were validated across 3 evaluation runs (Run 4 → Run 5 → Run 6) with measurable improvements at each stage.

Changes

1. Remove connection strings from prompts — use DefaultAzureCredential (all prompts)

  • Replaced connection string authentication with @azure/identity credential-based auth in prompts that were still using connection strings (app-configuration, cosmos-db, event-hubs, service-bus, key-vault)
  • Updated evaluation criteria to expect credential-based client construction

2. Fix Event Hubs skill example — use DefaultAzureCredential (SKILL.md)

  • Root cause: Section 10 of the generator skill showed connectionString usage while Sections 3-4 said "never use connection strings" — a direct contradiction
  • Fixed to use DefaultAzureCredential with fully qualified namespace
  • Updated both producer and consumer examples
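
As a rough illustration of the fix in change 2, the sketch below covers only the configuration side of credential-based construction: it derives a fully qualified namespace and rejects anything that looks like a connection string. The helper name `resolveEventHubsTarget`, the environment variable keys, and the public-cloud suffix `servicebus.windows.net` are assumptions for illustration and are not taken from SKILL.md. The trailing comments indicate where the values would be handed to `EventHubProducerClient` together with `DefaultAzureCredential`.

```typescript
// Hypothetical helper: names and env variable keys are illustrative only.
interface EventHubsTarget {
  fullyQualifiedNamespace: string; // e.g. "my-namespace.servicebus.windows.net"
  eventHubName: string;
}

function resolveEventHubsTarget(
  env: Record<string, string | undefined>,
): EventHubsTarget {
  const namespace = env["EVENTHUB_NAMESPACE"];
  const eventHubName = env["EVENTHUB_NAME"];
  if (!namespace || !eventHubName) {
    throw new Error("EVENTHUB_NAMESPACE and EVENTHUB_NAME must be set");
  }
  // Connection strings carry "Endpoint=sb://...;SharedAccessKey=..." pairs.
  if (/Endpoint=|SharedAccessKey/i.test(namespace)) {
    throw new Error("Connection strings are not accepted; pass a namespace");
  }
  // Accept either a short namespace name or a full host name.
  // Assumption: public Azure cloud suffix.
  const fullyQualifiedNamespace = namespace.includes(".")
    ? namespace
    : `${namespace}.servicebus.windows.net`;
  return { fullyQualifiedNamespace, eventHubName };
}

// With the real SDK installed, the values would be used roughly like this:
//   import { DefaultAzureCredential } from "@azure/identity";
//   import { EventHubProducerClient } from "@azure/event-hubs";
//   const { fullyQualifiedNamespace, eventHubName } =
//     resolveEventHubsTarget(process.env);
//   const producer = new EventHubProducerClient(
//     fullyQualifiedNamespace, eventHubName, new DefaultAzureCredential());
```

Failing fast on a connection string keeps the secret-bearing form out of the code path entirely, which is the behavior the evaluation criteria now expect.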

3. Add @azure/logger diagnostic logging requirement (all 14 prompts)

  • Root cause: @azure/logger had an 87% failure rate across ALL configs (including skills). The skill teaches it (Section 1), but it's an "additive" pattern — code works without it, so the model ignores skill instructions
  • Fix: Added explicit prompt-level instruction: Enable SDK diagnostic logging using @azure/logger with a configurable log level
  • Result (Run 5): 87% fail → 0% fail (complete fix)
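
A minimal sketch of what "a configurable log level" could look like in generated code. The helper `resolveLogLevel` and its default are hypothetical. The local `AzureLogLevel` type mirrors the levels accepted by `setLogLevel` in `@azure/logger`, so the block runs without the package installed; the commented lines show the intended call.

```typescript
// Mirrors the levels accepted by setLogLevel in @azure/logger.
type AzureLogLevel = "verbose" | "info" | "warning" | "error";

const KNOWN_LEVELS: readonly AzureLogLevel[] = [
  "verbose",
  "info",
  "warning",
  "error",
];

// Hypothetical helper: picks a level from configuration and falls back
// to a default when the value is missing or not a known level.
function resolveLogLevel(
  raw: string | undefined,
  fallback: AzureLogLevel = "warning",
): AzureLogLevel {
  const candidate = raw?.trim().toLowerCase();
  const match = KNOWN_LEVELS.find((level) => level === candidate);
  return match ?? fallback;
}

// With the real SDK installed:
//   import { setLogLevel } from "@azure/logger";
//   setLogLevel(resolveLogLevel(process.env.AZURE_LOG_LEVEL));
```

Validating the value before calling `setLogLevel` means a typo in configuration degrades to the default level instead of surfacing as a runtime error.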

4. Add RestError exception handling requirement (12 prompts)

  • Root cause: RestError from @azure/core-rest-pipeline had a 68% failure rate. Same pattern as logger — the skill teaches it (Section 2) but the model ignores additive requirements
  • Fix: Added explicit prompt-level instruction: Handle errors using RestError from @azure/core-rest-pipeline with statusCode checks
  • Result (Run 6): 68% fail → 0% fail (complete fix)
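
The `statusCode` check described above can be sketched as follows. To keep the block runnable without dependencies, it uses a structural stand-in named `RestErrorLike` (an assumption for illustration); real code would import `RestError` from `@azure/core-rest-pipeline` and test against that instead. The wrapper name `getOrDefault` is also hypothetical.

```typescript
// Structural stand-in for RestError so the sketch runs without the SDK.
// Real code would use: import { RestError } from "@azure/core-rest-pipeline";
class RestErrorLike extends Error {
  readonly statusCode?: number;
  constructor(message: string, statusCode?: number) {
    super(message);
    this.name = "RestError";
    this.statusCode = statusCode;
  }
}

// Type guard based on the error name, which RestError sets to "RestError".
function isRestErrorLike(err: unknown): err is RestErrorLike {
  return err instanceof Error && err.name === "RestError";
}

// Hypothetical wrapper: returns a default on 404 and rethrows everything else.
async function getOrDefault<T>(
  fetchValue: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await fetchValue();
  } catch (err: unknown) {
    if (isRestErrorLike(err) && err.statusCode === 404) {
      return fallback; // not found: the caller gets the default value
    }
    throw err; // auth failures, throttling, and network errors still surface
  }
}
```

Only the 404 case is swallowed here; treating every failure as "not found" would hide authentication and throttling problems.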

Evaluation Results

| Metric | Run 4 (before) | Run 5 (+logger fix) | Run 6 (+RestError fix) |
| --- | --- | --- | --- |
| Overall pass rate | 84.0% | 86.5% | 94.4% |
| @azure/logger | 13% pass | 100% pass | 100% pass |
| RestError | 32% pass | 32% pass | 100% pass |
| Best config | 87.3% | 89.9% | 96.3% |

Key Insight

Instruction authority hierarchy for LLMs:

  1. Prompt task instructions (highest priority) — model reliably follows these
  2. Skill directives — followed for structural patterns (auth), ignored for additive patterns (logging, error handling)
  3. Retrieved docs (MCP) — informational, not directive
  4. Training data prior (lowest) — default behavior

For "additive" SDK patterns (code works fine without them), prompt-level instruction is required — skills alone are insufficient.

Files Changed

  • 14 JS/TS prompt files under prompts/ — credential, logger, and RestError updates
  • skills/generator/js-ts-azure-patterns/SKILL.md — Event Hubs example fixed to use DAC

KarishmaGhiya and others added 4 commits May 6, 2026 13:47
…stead of connection string

Section 10 contradicted Sections 3 and 4 by showing connectionString pattern.
Updated to use fully qualified namespace + DefaultAzureCredential, consistent
with the skill's own guidance to never use connection strings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rompts

@azure/logger was the #1 failing criterion (87% fail rate across all configs,
including skills). Root cause: the skill instructs 'always set up SDK logging'
but the model prioritizes the specific task in the prompt over general skill
instructions. Only blob-storage-manager passed because it explicitly asked
for 'SDK logging at a configurable level'.

Added 'Enable SDK diagnostic logging using @azure/logger' to the Prompt
section of all 13 other JS/TS prompts to make the requirement explicit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add explicit instruction to handle errors using RestError from
@azure/core-rest-pipeline with statusCode checks to all 12 JS/TS
prompts that were missing it. Same fix pattern as the @azure/logger
fix - additive SDK patterns need prompt-level instruction to be
reliably generated.

Analysis from Run 5: RestError failed at 68% overall (54/79 criteria).
Even with skills (which teach RestError in Section 2), failure rate
was 55-63%. Prompts that already mentioned RestError in their Prompt
section (app-configuration, storage-crud) had 0% failure rate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@KarishmaGhiya KarishmaGhiya marked this pull request as ready for review May 11, 2026 22:10
@KarishmaGhiya KarishmaGhiya changed the title from "J/TS update prompts to not have connection string" to "JS/TS update prompts to not have connection string, add logger requirements, RestError updates" May 11, 2026
Show required npm packages (@azure/event-hubs and
@azure/eventhubs-checkpointstore-blob) and proper async/await patterns.
Enable SDK diagnostic logging using `@azure/logger` with a configurable log level.
Handle errors using `RestError` from `@azure/core-rest-pipeline` with `statusCode` checks.
Collaborator

I don't think this should be explicitly mentioned in the prompt. Doesn't it introduce a bias?

Contributor Author

Our evaluation criteria are based on the usage of these libraries. Unless the agent is explicitly told to add logging and handle errors, it almost NEVER does, even if that is part of the best practices skill. Please see the findings in points 3 and 4 of the description above.


Show required npm packages (@azure/event-hubs and
@azure/eventhubs-checkpointstore-blob) and proper async/await patterns.
Enable SDK diagnostic logging using `@azure/logger` with a configurable log level.
Collaborator

This too. If the aim of the PR is to make the prompts generic, it should just say "Enable SDK diagnostic logging".


Use DefaultAzureCredential for authentication. Show required npm packages
and include proper error handling with try/catch.
Use a credential from `@azure/identity` for authentication. Enable SDK diagnostic
Collaborator

Why are we introducing bias by naming specific packages if we don't want to name specific credential types?

The project needs:

- A **secret provider class** that retrieves secrets from Key Vault by name, with graceful handling when a secret doesn't exist (return a default value instead of crashing). It should also be able to retrieve a specific version of a secret (not just the latest), and inspect a secret's expiry date so the caller can tell if a secret is about to expire.
- A **secret provider class** that retrieves secrets from Key Vault by name, with graceful handling when a secret doesn't exist (return a default value instead of crashing) — use `RestError` from `@azure/core-rest-pipeline` with `statusCode` checks (e.g., 404) to detect not-found vs other failures. It should also be able to retrieve a specific version of a secret (not just the latest), and inspect a secret's expiry date so the caller can tell if a secret is about to expire.
Collaborator

@samvaity samvaity May 11, 2026

This changes the prompt from a general scope to one biased toward SDK terminology.

@KarishmaGhiya
Contributor Author

@samvaity Unless the prompt specifically tells the LLM to add logging or parse RestError, it doesn't do so based on the best practices skill alone. It only does what is needed to make the code work. At least, this is the gap I found in JS. The LLM already reads the best practices skill, but it takes no action on that knowledge unless specifically told to add logging and error handling.

Contributor Author

KarishmaGhiya commented May 12, 2026

Evaluation Results: SDK-Specific vs Generic Prompt Wording

We ran two experiments to determine how prompt wording affects whether the model uses the correct Azure SDK patterns for error handling (RestError) and logging (@azure/logger).

Experiment Setup

  • 56 evaluations per run (14 JS/TS prompts × 4 sonnet configs: baseline, baseline-skills, azure-mcp, azure-mcp-skills)
  • Run 6 — SDK-specific wording in prompts (e.g., "Handle errors using RestError from @azure/core-rest-pipeline with statusCode checks", "Enable SDK diagnostic logging using @azure/logger with a configurable log level")
  • Run 7 — Generic wording (e.g., "Handle errors by parsing the HTTP status code", "Enable SDK diagnostic logging with a configurable log level")

Results

| Metric | Run 6 (SDK-specific) | Run 7 (generic) | Delta |
| --- | --- | --- | --- |
| Overall pass rate | 94.4% | 89.4% | -5.0% |
| RestError criteria | 100% (80/80) | 62.5% (50/80) | -37.5% |
| Logger criteria | ~100% | 92.1% (58/63) | -7.9% |

RestError Breakdown (Run 7 — generic wording)

| Prompt | Pass Rate |
| --- | --- |
| event-hubs | 0% (0/4) |
| app-configuration | 25% (2/8) |
| cosmos-db | 25% (1/4) |
| service-bus | 25% (1/4) |
| identity-service-principal | 50% (2/4) |
| storage-blob-manager | 50% (2/4) |
| encrypted-uploader | 64% (9/14) |
| key-vault-crud | 67% (4/6) |
| resource-manager | 75% (3/4) |
| key-vault-secret-config | 75% (6/8) |
| storage-account-mgmt | 100% (4/4) |
| identity-default-credential | 100% (4/4) |
| identity-managed-identity | 100% (4/4) |
| storage-crud | 100% (8/8) |

Config-Level RestError (Run 7)

| Config | Pass Rate |
| --- | --- |
| baseline | 36.4% (8/22) |
| azure-mcp | 55.0% (11/20) |
| azure-mcp-skills | 73.7% (14/19) |
| baseline-skills | 89.5% (17/19) |
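
As a quick consistency check, the config-level counts above can be summed to reproduce the overall Run 7 RestError figure (50/80). The sketch below is illustrative only; the type and function names are hypothetical, and the numbers are copied from the config-level results.

```typescript
interface Counts {
  passed: number;
  total: number;
}

interface ConfigCounts extends Counts {
  config: string;
}

// Config-level RestError counts for Run 7, copied from the results above.
const run7RestErrorByConfig: ConfigCounts[] = [
  { config: "baseline", passed: 8, total: 22 },
  { config: "azure-mcp", passed: 11, total: 20 },
  { config: "azure-mcp-skills", passed: 14, total: 19 },
  { config: "baseline-skills", passed: 17, total: 19 },
];

// Pass rate as a percentage rounded to one decimal place.
function passRate({ passed, total }: Counts): number {
  return Math.round((passed / total) * 1000) / 10;
}

// Sum per-config counts into a single overall tally.
function combine(rows: Counts[]): Counts {
  return rows.reduce<Counts>(
    (acc, row) => ({
      passed: acc.passed + row.passed,
      total: acc.total + row.total,
    }),
    { passed: 0, total: 0 },
  );
}

const overall = combine(run7RestErrorByConfig);
console.log(overall.passed, overall.total, passRate(overall)); // 50 80 62.5
```

The per-prompt breakdown sums to the same 50 of 80, so the two views of Run 7 agree with each other.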

Key Takeaway

Generic wording ("handle errors by parsing the HTTP status code") is too vague: the model doesn't reliably translate it into the correct SDK pattern (RestError import + statusCode check). The SDK-specific wording produced a 100% pass rate for RestError criteria and ~100% for logger criteria.

This confirms the instruction authority hierarchy: for additive SDK patterns (logging, error handling), the model needs explicit SDK construct names in the prompt. Skills help (baseline-skills scores higher than baseline) but aren't sufficient alone — the gap between skills-only (89.5%) and prompt-specific (100%) is significant.

Recommendation (OPEN TO DISCUSSION)

Keep SDK-specific terminology in the prompt sections. The prompts are language-specific (JS/TS) already, so mentioning RestError and @azure/logger is consistent with the level of specificity expected. The evaluation criteria sections already use SDK-specific terms — the prompt section should match.

3 participants