Merge Dev - Add LangSmith integration and bump cli version (0.3.17) #116
Merged
aliroberts merged 3 commits into main on Mar 9, 2026
Summary
- `--langsmith-splits`) following LangSmith best practices
- `fastapi`/`slowapi` deps

LangSmith Integration
Eval Backend (`weco/integrations/langsmith/backend.py`)
Pluggable backend implementing `register_args()`, `validate_args()`, and `build_eval_command()`. Adds 12 `--langsmith-*` CLI flags covering dataset, target, evaluators, splits, adapters, dashboard evaluators, and custom metric functions.

Evaluation Bridge (`weco/integrations/langsmith/bridge.py`)
Subprocess runner that dynamically imports user code, resolves evaluators (custom `module:function` specs, LangSmith built-ins, or a local `evaluators.py`), runs `client.evaluate()`, polls for asynchronous dashboard evaluator scores, and prints metrics in Weco's `key: value` format. Supports target adapters for LangChain runnables and single-input functions.

Setup Wizard (`weco/integrations/langsmith/wizard/`)
Browser-based single-page app served from a local HTTP server. Guides users through API key setup, dataset selection (with a split picker), source file selection, target function discovery (AST-based), and evaluator configuration. On submit, it mutates the CLI args namespace.
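AST-based target function discovery lets the wizard list candidate functions without importing (and therefore executing) the user's source file. The sketch below uses Python's standard `ast` module and assumes the picker shows only public, module-level functions, both sync and async. `discover_target_functions` and `FunctionInfo` are hypothetical names for illustration, not the integration's actual code.

```python
import ast
from typing import NamedTuple


class FunctionInfo(NamedTuple):
    # Hypothetical record describing one candidate target function.
    name: str
    params: list[str]
    is_async: bool


def discover_target_functions(source: str) -> list[FunctionInfo]:
    """Return public top-level functions in `source` by parsing, not importing, it."""
    tree = ast.parse(source)
    found: list[FunctionInfo] = []
    for node in tree.body:  # module-level statements only: skips methods and nested defs
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("_"):
                continue  # assumption: private helpers are hidden from the picker
            params = [a.arg for a in node.args.posonlyargs + node.args.args]
            is_async = isinstance(node, ast.AsyncFunctionDef)
            found.append(FunctionInfo(node.name, params, is_async))
    return found


if __name__ == "__main__":
    sample = (
        "def answer(question):\n"
        "    return '...'\n"
        "async def answer_async(inputs, config=None):\n"
        "    return {'output': '...'}\n"
        "def _helper(x):\n"
        "    return x\n"
        "class Agent:\n"
        "    def run(self, q):\n"
        "        return q\n"
    )
    for fn in discover_target_functions(sample):
        print(fn)
```

Parsing is a conservative choice because user modules may have import-time side effects. The parameter list could also hint whether a function needs a single-input adapter, although how the wizard actually uses it is not specified here.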
Dataset Splits
All layers support `--langsmith-splits` to filter evaluation to specific dataset splits (e.g. `opt`, `holdout`) instead of requiring separate datasets per split. The wizard fetches the available splits and shows a chip picker.

CLI Changes (`weco/cli.py`)
- `--eval-backend` flag with backend registry (`_EVAL_BACKENDS`, `_load_backend()`)
- `--source`, `--eval-command`, `--metric`, `--goal` are no longer required when using a backend
- `backend.register_args()`

Example
`examples/langsmith-zeph-hr-qa/`: HR QA agent optimized against a LangSmith dataset with custom evaluators, dashboard LLM judges, and a gated metric function. Uses dataset splits for train/holdout separation.
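The `--eval-backend` registry and the conditionally required core flags listed under CLI Changes can be sketched with `argparse`. This is a minimal sketch under stated assumptions. The `EvalBackend` protocol mirrors the three method names from the PR, but their signatures here are guesses. `_EVAL_BACKENDS` maps names to zero-argument loaders. `DemoBackend`, its `--demo-dataset` flag, and the `demo_bridge` module are hypothetical stand-ins for the LangSmith backend.

```python
import argparse
import shlex
import sys
from typing import Callable, Optional, Protocol


class EvalBackend(Protocol):
    # Assumed interface; only the method names come from the PR description.
    def register_args(self, parser: argparse.ArgumentParser) -> None: ...
    def validate_args(self, args: argparse.Namespace) -> None: ...
    def build_eval_command(self, args: argparse.Namespace) -> str: ...


class DemoBackend:
    """Hypothetical backend used only for illustration."""

    def register_args(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--demo-dataset", help="dataset name (hypothetical flag)")

    def validate_args(self, args: argparse.Namespace) -> None:
        if not args.demo_dataset:
            raise SystemExit("--demo-dataset is required with --eval-backend demo")

    def build_eval_command(self, args: argparse.Namespace) -> str:
        # Shell command the optimizer would run to obtain `key: value` metrics.
        return f"{sys.executable} -m demo_bridge --dataset {shlex.quote(args.demo_dataset)}"


# Registry of backend name -> loader; a backend is only constructed when selected.
_EVAL_BACKENDS: dict[str, Callable[[], EvalBackend]] = {"demo": DemoBackend}


def _load_backend(name: str) -> EvalBackend:
    try:
        return _EVAL_BACKENDS[name]()
    except KeyError:
        raise SystemExit(f"unknown eval backend: {name!r}") from None


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Pass 1: find out which backend (if any) was requested; ignore everything else.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--eval-backend", choices=sorted(_EVAL_BACKENDS))
    known, _ = pre.parse_known_args(argv)
    backend: Optional[EvalBackend] = (
        _load_backend(known.eval_backend) if known.eval_backend else None
    )

    # Pass 2: core flags are required only when no backend is selected.
    parser = argparse.ArgumentParser(prog="weco-sketch")
    parser.add_argument("--eval-backend", choices=sorted(_EVAL_BACKENDS))
    for flag in ("--source", "--eval-command", "--metric", "--goal"):
        parser.add_argument(flag, required=backend is None)
    if backend is not None:
        backend.register_args(parser)  # backend adds its own flags

    args = parser.parse_args(argv)
    if backend is not None:
        backend.validate_args(args)
        if args.eval_command is None:
            args.eval_command = backend.build_eval_command(args)
    return args
```

Parsing twice means `--help` and required-flag errors reflect the selected backend. A real registry could store import paths instead of classes so that optional integration dependencies are imported only when that backend is chosen.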