
Merge Dev - Add LangSmith integration and bump cli version (0.3.17)#116

Merged: aliroberts merged 3 commits into main from dev on Mar 9, 2026


@aliroberts (Contributor)

Summary

  • Add LangSmith as a pluggable eval backend, enabling Weco to optimize code against LangSmith datasets and evaluators
  • Add an interactive browser-based setup wizard that launches when LangSmith args are missing in a TTY session
  • Add dataset split support (--langsmith-splits) following LangSmith best practices
  • Bump version to 0.3.17, bump minimum Python to 3.9, remove unused fastapi/slowapi deps
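The wizard trigger in the second bullet can be sketched as a simple check. This is a minimal sketch, assuming "TTY session" means both stdin and stdout are attached to a terminal; the argument names in REQUIRED_LANGSMITH_ARGS are hypothetical placeholders, not the actual dest names of the --langsmith-* flags.

```python
import argparse
import sys
from typing import List, Optional, TextIO

# Hypothetical dest names; the real --langsmith-* flags may differ.
REQUIRED_LANGSMITH_ARGS = ("langsmith_dataset", "langsmith_target")


def missing_langsmith_args(args: argparse.Namespace) -> List[str]:
    """Return the required LangSmith args that are unset or empty."""
    return [name for name in REQUIRED_LANGSMITH_ARGS if not getattr(args, name, None)]


def should_launch_wizard(
    args: argparse.Namespace,
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> bool:
    """Launch the wizard only if something is missing AND a human can interact."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    # Assumption: both streams must be terminals to count as a TTY session.
    interactive = stdin.isatty() and stdout.isatty()
    return interactive and bool(missing_langsmith_args(args))
```

In a non-interactive context such as CI, the same missing arguments would presumably surface as an ordinary validation error rather than blocking on a browser.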

LangSmith Integration

Eval Backend (weco/integrations/langsmith/backend.py)

Pluggable backend implementing register_args(), validate_args(), and build_eval_command(). Adds 12 --langsmith-* CLI flags covering dataset, target, evaluators, splits, adapters, dashboard evaluators, and custom metric functions.
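The three hooks named above suggest an interface like the following. This is a sketch only: the method names come from the PR, but the Protocol class, the ToyLangSmithBackend, the two flags it registers, and the hypothetical_bridge command it builds are all illustrative assumptions, not Weco's actual code.

```python
import argparse
import shlex
from typing import Protocol


class EvalBackend(Protocol):
    """Interface shape implied by the PR: three hooks the CLI calls on a backend."""

    def register_args(self, parser: argparse.ArgumentParser) -> None: ...

    def validate_args(self, args: argparse.Namespace) -> None: ...

    def build_eval_command(self, args: argparse.Namespace) -> str: ...


class ToyLangSmithBackend:
    """Illustrative backend. Flag subset and bridge invocation are hypothetical."""

    def register_args(self, parser: argparse.ArgumentParser) -> None:
        # Only two of the 12 --langsmith-* flags are sketched here.
        group = parser.add_argument_group("LangSmith")
        group.add_argument("--langsmith-dataset")
        group.add_argument("--langsmith-target")

    def validate_args(self, args: argparse.Namespace) -> None:
        missing = [
            name
            for name in ("langsmith_dataset", "langsmith_target")
            if not getattr(args, name, None)
        ]
        if missing:
            raise ValueError(f"missing LangSmith args: {', '.join(missing)}")

    def build_eval_command(self, args: argparse.Namespace) -> str:
        # Hypothetical bridge entry point; shlex.join quotes values with spaces.
        return shlex.join([
            "python", "-m", "hypothetical_bridge",
            "--dataset", args.langsmith_dataset,
            "--target", args.langsmith_target,
        ])


backend: EvalBackend = ToyLangSmithBackend()
```

Because EvalBackend is a typing.Protocol, the CLI side only depends on the three hooks; any object with matching methods satisfies it without inheriting from a base class.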

Evaluation Bridge (weco/integrations/langsmith/bridge.py)

Subprocess runner that dynamically imports user code, resolves evaluators (custom module:function specs, LangSmith built-ins, or a local evaluators.py), runs client.evaluate(), polls for async dashboard evaluator scores, and prints metrics in Weco's key: value format. Supports target adapters for LangChain runnables and single-input functions.
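Three of the bridge's plain-Python steps (resolving a module:function spec, adapting a single-input function, and printing metrics) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the adapter's "output" key and the one-metric-per-line layout are guesses, and none of these helper names come from the PR. The client.evaluate() call and dashboard polling are deliberately left out.

```python
import importlib
from typing import Any, Callable, Dict, Mapping


def resolve_spec(spec: str) -> Callable[..., Any]:
    """Resolve a 'module:function' string to a callable via importlib."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    for part in attr.split("."):  # allow nested attributes such as Class.method
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return obj


def single_input_adapter(fn: Callable[[Any], Any], key: str) -> Callable[[Mapping[str, Any]], Dict[str, Any]]:
    """Hypothetical adapter: feed one field of an inputs dict to a one-argument function."""
    def target(inputs: Mapping[str, Any]) -> Dict[str, Any]:
        return {"output": fn(inputs[key])}  # "output" key is an assumption
    return target


def format_metrics(scores: Mapping[str, float]) -> str:
    """Render metrics as 'key: value' lines, one per metric (assumed layout)."""
    return "\n".join(f"{name}: {value}" for name, value in scores.items())
```

Keeping these steps as pure functions means they can be unit-tested without network access; only the evaluation call itself needs LangSmith credentials.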

Setup Wizard (weco/integrations/langsmith/wizard/)

Browser-based single-page app served from a local HTTP server. Guides users through API key setup, dataset selection (with split picker), source file selection, target function discovery (AST-based), and evaluator configuration. Mutates the CLI args namespace on submit.

Dataset Splits

All layers support --langsmith-splits to filter evaluation to specific dataset splits (e.g. opt, holdout) instead of requiring separate datasets per split. The wizard fetches available splits and shows a chip picker.
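Split filtering amounts to keeping only the examples tagged with at least one requested split. The sketch below assumes a comma-separated value for --langsmith-splits and hypothetical in-memory example records with a "splits" list; it does not reproduce how LangSmith stores or queries splits.

```python
from typing import Any, Iterable, List, Mapping, Optional, Sequence


def parse_splits(raw: Optional[str]) -> Optional[List[str]]:
    """Parse 'opt,holdout' into ['opt', 'holdout']; None means 'use every example'."""
    if not raw:
        return None
    splits = [s.strip() for s in raw.split(",") if s.strip()]
    return splits or None


def filter_by_splits(
    examples: Iterable[Mapping[str, Any]], splits: Optional[Sequence[str]]
) -> List[Mapping[str, Any]]:
    """Keep examples that belong to at least one requested split."""
    if splits is None:
        return list(examples)
    wanted = set(splits)
    # Examples without a "splits" field are excluded when a filter is active.
    return [ex for ex in examples if wanted.intersection(ex.get("splits", ()))]
```

With one dataset tagged this way, an optimization run can target opt while a final check targets holdout, without maintaining two copies of the data.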

CLI Changes (weco/cli.py)

  • Added --eval-backend flag with backend registry (_EVAL_BACKENDS, _load_backend())
  • Made --source, --eval-command, --metric, --goal non-required when using a backend
  • Backend args registered dynamically via backend.register_args()
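The registry and conditional requirements can be sketched as follows, loosely mirroring _EVAL_BACKENDS and _load_backend() but not copied from them. Backends are stored as "module:attribute" strings and imported lazily, so an optional dependency such as langsmith is only needed when that backend is selected. The LangSmithBackend class name in the registry is a hypothetical placeholder.

```python
import argparse
import importlib
from typing import Any, Dict, List

# Hypothetical registry: name -> "module:attribute". The class name is a placeholder.
BACKENDS: Dict[str, str] = {
    "langsmith": "weco.integrations.langsmith.backend:LangSmithBackend",
}

CORE_ARGS = ("source", "eval_command", "metric", "goal")


def load_backend(name: str, registry: Dict[str, str] = BACKENDS) -> Any:
    """Import and instantiate a backend only when it is requested."""
    try:
        path = registry[name]
    except KeyError:
        raise ValueError(f"unknown eval backend {name!r}; choose from {sorted(registry)}") from None
    module_name, _, attr = path.partition(":")
    module = importlib.import_module(module_name)  # deferred import
    return getattr(module, attr)()


def missing_core_args(args: argparse.Namespace) -> List[str]:
    """Core flags are required only when no --eval-backend is selected."""
    if getattr(args, "eval_backend", None):
        return []
    return ["--" + a.replace("_", "-") for a in CORE_ARGS if getattr(args, a, None) is None]
```

Doing this check after parsing, instead of marking the flags required=True in argparse, is what lets the same flags be optional or mandatory depending on --eval-backend.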

Example

examples/langsmith-zeph-hr-qa/ — HR QA agent optimized against a LangSmith dataset with custom evaluators, dashboard LLM judges, and a gated metric function. Uses dataset splits for train/holdout separation.

@aliroberts force-pushed the dev branch 3 times, most recently from 7b1e198 to 8b4155b (March 9, 2026, 12:39)
@aliroberts merged commit 323ed2e into main on Mar 9, 2026
2 checks passed