fix: resolve all mypy type errors across 6 source files by Dongbumlee · Pull Request #71 · Azure/agentops

Dongbumlee · 2026-04-14T00:26:38Z

Resolves all 37 mypy type errors across the codebase.

Files changed:

- `foundry_backend.py`: assert narrowing for `Optional[str]`, `Dict` type widening
- `config_loader.py`: added `BaseModel` import and `TypeVar` bound
- `reporter.py`: removed conflicting annotations, renamed shadowed loop vars
- `browse.py`: split `Path | None` annotation into separate assignment
- `comparison.py`: fixed `_compute_metric_direction` return type, renamed loop vars
- `runner.py`: added imports, Pydantic model constructors
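The fixes listed above are standard mypy patterns. The sketch below shows each kind of change in isolation. It is illustrative only: every function and class name in it (`build_endpoint_url`, `build_payload`, `ConfigModel`, `load_config`, `resolve_output_dir`, `summarize`) is hypothetical and not taken from the agentops source, and `ConfigModel` is a stand-in for Pydantic's `BaseModel` so the snippet runs without Pydantic installed.

```python
from pathlib import Path
from typing import Dict, List, Optional, Type, TypeVar, Union


def build_endpoint_url(endpoint: Optional[str], route: str) -> str:
    # Assert narrowing: after the assert, mypy treats `endpoint` as `str`,
    # so calling str methods on it type-checks.
    assert endpoint is not None, "endpoint must be configured"
    return endpoint.rstrip("/") + "/" + route.lstrip("/")


def build_payload(name: str, score: float) -> Dict[str, Union[str, float]]:
    # Dict type widening: without the annotation, mypy infers Dict[str, str]
    # from the literal and rejects the float assigned on the next line.
    payload: Dict[str, Union[str, float]] = {"name": name}
    payload["score"] = score
    return payload


class ConfigModel:
    # Hypothetical stand-in for pydantic.BaseModel.
    def __init__(self, **fields: object) -> None:
        self.__dict__.update(fields)


# TypeVar bound: T may be any ConfigModel subclass, so mypy can check the
# constructor call below while the return type stays the concrete subclass.
T = TypeVar("T", bound=ConfigModel)


def load_config(model_cls: Type[T], data: Dict[str, object]) -> T:
    return model_cls(**data)


def resolve_output_dir(base: Path, run_id: Optional[str]) -> Optional[Path]:
    # Declare the Optional variable once, then assign it. Annotating the same
    # name again in another branch makes mypy report it as already defined.
    output_dir: Optional[Path] = None
    if run_id is not None:
        output_dir = base / run_id
    return output_dir


def summarize(rows: List[Dict[str, float]], labels: List[str]) -> List[str]:
    lines: List[str] = []
    # Distinct loop variable names: reusing one name for a dict in the first
    # loop and a str in the second would give it two incompatible types.
    for row in rows:
        lines.append(",".join(sorted(row)))
    for label in labels:
        lines.append(label.upper())
    return lines
```

The actual diffs in the six files may differ in detail; the point is that each fix changes annotations or names only, which is why the unit and E2E results are unaffected.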

Validation:

- mypy: 0 errors (32 source files)
- Unit tests: 112 passed, 1 skipped
- E2E integration: 3 passed


* feat: add OTLP tracing foundation for evaluation runs
  - Add utils/telemetry.py with lazy OTel imports and span context managers
  - Instrument runner.py with three-layer schema (CICD + GenAI + agentops.eval)
  - Root span per eval run, item spans per row, evaluator child spans
  - Activated via AGENTOPS_OTLP_ENDPOINT env var (opt-in, zero overhead)
  - Graceful no-op when opentelemetry-sdk is not installed
  - 16 unit tests covering disabled, degraded, and enabled states

  Refs: #14
* docs: add OTLP telemetry to AGENTS.md and copilot-instructions
* feat: extend Foundry cloud evaluator coverage to 22 built-in evaluators (#51)
  - Expand evaluator frozensets: add response_completeness, groundedness_pro, retrieval, tool_selection to existing sets
  - Add new frozensets: `_EVALUATORS_NEEDING_TOOL_DEFS_ONLY` (tool_input_accuracy, tool_output_utilization, tool_call_success), `_EVALUATORS_NEEDING_OUTPUT_ITEMS` (task_adherence)
  - Fix NLP evaluator names (bleu_score, rouge_score, etc.) to match `_to_builtin_evaluator_name` conversion
  - Add default initialization_parameters for RougeScoreEvaluator (rouge_type)
  - Build item_schema dynamically: include tool_definitions and context_field when evaluators need them
  - Refactor `_default_foundry_input_mapping` to frozenset-based routing
  - Improve error handling: log evaluator errors when score is null, improve runner error message with --verbose hint
  - Add CI/CD integration models documentation: PR gate, scheduled, post-deploy, multi-env promotion, Azure DevOps pipeline
  - Add gating best practices: threshold design, evaluator selection by scenario
  - Add supported evaluators reference table (22 evaluators by category)
  - Add ~20 unit tests for all new evaluator data_mapping patterns
  - All 22 evaluators verified end-to-end with live Foundry cloud evaluation

  Closes #51
* fix: skip telemetry tests when opentelemetry is not installed

  TestSpanAttributesWhenEnabled requires opentelemetry to be installed because the code paths import SpanKind/StatusCode when tracing is enabled. Use pytest.importorskip to skip the class in CI where opentelemetry is not a declared dependency.
* docs: align all documentation with current implementation
  - Fix skill paths: plugins/agentops/skills/ (not .github/plugins/) across README, tutorial-copilot-skills (6 instances)
  - Fix CLI contract: add eval compare and config cicd as implemented commands in AGENTS.md, copilot-instructions.md, how-it-works.md
  - Fix source tree listings: add cicd.py, comparison.py, telemetry.py, workflows/ across AGENTS.md, how-it-works.md
  - Fix test listings: add test_cicd, test_cli_commands, test_comparison, test_telemetry across AGENTS.md, copilot-instructions.md, how-it-works.md
  - Fix agent_tools_baseline: TaskCompletionEvaluator + ToolCallAccuracyEvaluator (not SimilarityEvaluator placeholder) in README, AGENTS.md, how-it-works.md
  - Fix JSONL path: `data/<name>.jsonl` (not datasets/) in ci-github-actions.md
  - Fix init flag: --dir (not --path) in README
  - Fix evaluator guidance: add frozenset names and NLP_DEFAULT_INIT_PARAMS to copilot-instructions.md
  - Add context_field to dataset format docs in AGENTS.md
  - Add rouge_type default note to evaluator reference doc
  - Update planned command message to list all 5 available commands
  - Add --format flag to CLI usage examples
* feat: implement bundle list/show and run list/show commands
  - Add services/browse.py with list_bundles, show_bundle, list_runs, show_run
  - Replace planned stubs with working implementations in cli/app.py
  - bundle list: shows all bundles with evaluators and threshold count
  - bundle show: displays full bundle detail (evaluators, thresholds, metadata)
  - run list: shows all past runs with status, bundle, dataset, duration
  - run show: displays full run detail (metrics, thresholds, items, Foundry URL)
  - Add 16 unit tests (service + CLI) in test_browse.py
  - All commands are read-only, no side effects, no Azure API calls
* refactor: split CLI into command modules

  Split app.py (487 lines) into focused command modules:
  - app.py (114 lines): root app, global callback, init, sub-app registration
  - eval_commands.py (108 lines): eval run, eval compare
  - report_commands.py (66 lines): report, report show/export stubs
  - browse_commands.py (152 lines): bundle list/show, run list/show/view
  - config_commands.py (56 lines): config cicd, config validate/show stubs
  - planned.py (57 lines): dataset, monitor, trace, model, agent stubs
  - `_planned.py` (12 lines): shared planned command helper

  No behavior changes. All 96 tests pass.
* refactor: remove planned.py, move stubs to their command files
  - Move dataset stubs to dataset_commands.py (ready for Tier 2 implementation)
  - Inline monitor/trace/model/agent stubs in app.py (1-2 commands each)
  - Delete planned.py; no more catch-all stub file
* fix: remove duplicate `_planned_command` definition (ruff F811)
* feat(skills): add 3 new skills for full CLI coverage

  Add agentops-workspace-setup, agentops-browse-inspect, and agentops-dataset-management skills covering all remaining CLI commands not handled by existing evaluation-focused skills.
  - agentops-workspace-setup: init, config cicd, config validate/show
  - agentops-browse-inspect: bundle list/show, run list/show/view
  - agentops-dataset-management: dataset creation, YAML/JSONL format, field mapping, planned validate/describe/import commands
* feat(skills): add active workspace guard clauses to all downstream skills

  Add '## Before You Start' section to 5 downstream skills enforcing workspace verification before proceeding:
  - agentops-run-evals
  - agentops-investigate-regression
  - agentops-observability-triage
  - agentops-browse-inspect
  - agentops-dataset-management

  Each skill now instructs the agent to check for .agentops/ directory and redirect to agentops-workspace-setup skill if missing. This provides soft enforcement at the skill layer, complementing the hard CLI enforcement (FileNotFoundError) already in place.
* feat(skills): add coverage for report show/export, model list, agent list planned commands
* fix: remove duplicate `_planned_command` definition (ruff F811)
* style: apply ruff-format to comparison.py and test_cli_commands.py
* ci: integrate VSIX packaging with pre-release into CI/CD pipeline
  - ci.yml: add build-vsix validation job (package only, no publish)
  - staging.yml: add publish-vsix-prerelease job (vsce publish --pre-release)
  - release.yml: add publish-vsix stable job + attach VSIX to GitHub Release
  - cut-release.yml: sync package.json version via jq, update PR body/checklist
  - `_build.yml`: update header comments (Python-only, no VSIX logic)
  - plugins/agentops: add README.md, CHANGELOG.md, .vscodeignore, package.json scripts

  Requires VSCE_PAT secret in staging and release GitHub environments.
* ci(vsix): add LICENSE to plugin package
* ci(vsix): set publisher to AgentOpsToolkit and fix package name
* ci(vsix): upload VSIX artifact from CI and staging pipelines (#69)
* ci(vsix): upload VSIX artifact from CI and staging pipelines
* ci: publish VSIX pre-release to Marketplace on develop pushes

  Add publish-vsix-dev job to ci.yml that publishes the VSIX as a pre-release to the VS Code Marketplace on every push to develop, mirroring the publish-dev job that pushes to TestPyPI.
  - Gated on push to develop only (not PRs)
  - Depends on lint, test, and build-vsix jobs
  - Uses staging environment (VSCE_PAT secret)
  - Packages with --pre-release flag
  - Includes step summary with Marketplace link
* ci(vsix): sync VSIX version from git tags in all pipelines (#70)
* ci(vsix): sync VSIX version from git tags in all pipelines

  Derive package.json version at CI time from the latest git tag using git describe + jq. Mimics setuptools-scm patch-increment behavior:
  - On exact tag (release): use tag version directly (e.g. v0.2.0 -> 0.2.0)
  - Off tag (develop/PR): increment patch (e.g. v0.1.0 + commits -> 0.1.1)

  Applied to all 4 VSIX jobs:
  - ci.yml: build-vsix, publish-vsix-dev
  - staging.yml: publish-vsix-prerelease
  - release.yml: publish-vsix

  Also adds fetch-depth: 0 to checkout steps so git describe has access to the full tag history.
* fix(vsix): update Marketplace link placeholder in README
* docs(vsix): improve README - remove misleading Prerequisites, expand Usage examples
* docs(vsix): remove CLI install note - skills handle setup automatically
* fix: resolve all mypy type errors across 6 source files (#71)
* ci(vsix): sync VSIX version from git tags in all pipelines
* fix(vsix): update Marketplace link placeholder in README
* docs(vsix): improve README - remove misleading Prerequisites, expand Usage examples
* docs(vsix): remove CLI install note - skills handle setup automatically
* fix: resolve all mypy type errors across 6 source files
* docs: add CHANGELOG entries for mypy fixes and VSIX pipeline
* chore: prepare release 0.1.4
* fix: use global tag sort for VSIX version derivation

  Replace `git describe --tags --abbrev=0` with `git tag -l --sort=-v:refname` to find the latest tag across ALL branches, not just reachable ones. Root cause: v0.1.3 tag on main was not reachable from develop, so git describe found v0.1.2 and derived version 0.1.3, which already existed on the Marketplace. Also adds continue-on-error on dev/staging VSIX publish steps as a safety net against 'already exists' errors.

---------

Co-authored-by: Paulo Lacerda <pclacerda@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
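The VSIX version rule from these commits (patch increment off-tag, tag version on an exact tag, and the later switch to ranking all tags) can be modeled in plain Python to reason about edge cases. This is a hypothetical sketch: the pipelines themselves do this with `git` and `jq` in workflow steps, the function names `parse_tag`, `latest_tag`, and `derive_vsix_version` are not from the repository, and it assumes tags follow the `vMAJOR.MINOR.PATCH` form.

```python
import re
from typing import List, Optional, Tuple

# Assumption: release tags look like v0.1.3; any other tag is ignored.
_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def latest_tag(all_tags: List[str]) -> str:
    # Like `git tag -l --sort=-v:refname`: rank every tag by numeric version
    # (so v0.1.10 beats v0.1.9), whether or not it is reachable from HEAD.
    versioned = [(parse_tag(tag), tag) for tag in all_tags]
    candidates = [(version, tag) for version, tag in versioned if version is not None]
    if not candidates:
        raise ValueError("no vMAJOR.MINOR.PATCH tags found")
    return max(candidates)[1]


def derive_vsix_version(all_tags: List[str], head_tag: Optional[str]) -> str:
    # Exact release tag on HEAD: use it directly (v0.2.0 -> 0.2.0).
    if head_tag is not None and parse_tag(head_tag) is not None:
        return head_tag[1:]
    # Off tag: bump the patch of the latest tag (v0.1.0 + commits -> 0.1.1).
    version = parse_tag(latest_tag(all_tags))
    assert version is not None  # latest_tag only returns parseable tags
    major, minor, patch = version
    return f"{major}.{minor}.{patch + 1}"
```

With tags `v0.1.2` and `v0.1.3` both visible, `derive_vsix_version(["v0.1.2", "v0.1.3"], None)` returns `"0.1.4"`. Passing only the reachable `v0.1.2`, which is effectively what `git describe` saw in the root cause above, yields `"0.1.3"`, the version that already existed on the Marketplace.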
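The tracing behaviour described in the OTLP commit in the log above (lazy OpenTelemetry imports, opt-in via `AGENTOPS_OTLP_ENDPOINT`, and a no-op when the package is missing) reduces to a pattern like the one below. This is a hypothetical sketch, not the code in `utils/telemetry.py`: the helper names `_get_tracer` and `eval_span` and the tracer name `"agentops.eval"` are illustrative, and configuring an SDK `TracerProvider` with an OTLP exporter pointed at the endpoint is deliberately left out.

```python
import os
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional

# Env var name taken from the commit message; everything else is illustrative.
ENDPOINT_ENV = "AGENTOPS_OTLP_ENDPOINT"


def _get_tracer() -> Optional[Any]:
    # Opt-in: when the env var is unset, return early without importing OTel.
    if not os.environ.get(ENDPOINT_ENV):
        return None
    try:
        # Lazy import so OpenTelemetry stays an optional dependency.
        from opentelemetry import trace
    except ImportError:
        # Degraded state: tracing requested but OpenTelemetry is missing.
        return None
    return trace.get_tracer("agentops.eval")


@contextmanager
def eval_span(name: str, attributes: Optional[Dict[str, Any]] = None) -> Iterator[Any]:
    tracer = _get_tracer()
    if tracer is None:
        # No-op path: callers get None and nothing is recorded.
        yield None
        return
    with tracer.start_as_current_span(name) as span:
        for key, value in (attributes or {}).items():
            span.set_attribute(key, value)
        yield span
```

Because `start_as_current_span` makes each span current, an `eval_span` opened inside another one becomes its child, which is one way to get the run, item, and evaluator hierarchy the commit describes. Without an SDK provider configured, the OpenTelemetry API hands back non-recording spans, so this sketch alone exports nothing.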

Dongbumlee added 5 commits April 13, 2026 16:21

- fix(vsix): update Marketplace link placeholder in README (17f13c1)
- docs(vsix): improve README - remove misleading Prerequisites, expand Usage examples (68f646d)
- docs(vsix): remove CLI install note - skills handle setup automatically (01acc55)

Dongbumlee merged commit b48765a into develop Apr 14, 2026
8 checks passed

Dongbumlee merged 5 commits into `develop` from `feature/skill-vsix-cicd`.

Dongbumlee commented Apr 14, 2026
