Standard evals #55

Merged: kohankhaki merged 12 commits into main from standard_evals on Feb 25, 2026

@kohankhaki (Contributor) commented Jan 27, 2026

PR Type

Fix and Feature

Short Description

Add evaluation pipeline with standardized schemas for running Inspect AI evaluations on generated tasks. Also moves legacy LBO code to a separate directory.

1. Evaluation Schemas

  • src/schemas/eval_schemas.py - Schema definitions for eval pipeline
  • src/schemas/eval_io_utils.py - Save/load utilities for eval data
  • src/schemas/__init__.py - Updated exports
  • src/schemas/EVALUATION_PIPELINE_SCHEMAS.md - Documentation
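
As a rough illustration of how a schema and its save/load utilities might fit together, here is a minimal sketch. The names `EvalResult`, `save_eval_result`, and `load_eval_result`, and all field names, are hypothetical and are not taken from `eval_schemas.py` or `eval_io_utils.py`; JSON as the on-disk format is also an assumption.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict


@dataclass
class EvalResult:
    """Hypothetical record for one evaluated task (illustrative only)."""

    task_id: str
    capability_name: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def save_eval_result(result: EvalResult, path: Path) -> None:
    """Serialize the result to JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(result), indent=2), encoding="utf-8")


def load_eval_result(path: Path) -> EvalResult:
    """Read a JSON file written by save_eval_result back into the dataclass."""
    return EvalResult(**json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "results" / "task_000.json"
        original = EvalResult("task_000", "algebra", 0.75, {"model": "example"})
        save_eval_result(original, out)
        # A round trip through disk should reproduce the same object.
        assert load_eval_result(out) == original
```

Keeping the schema as a plain dataclass means the I/O helpers stay thin wrappers around `asdict` and the constructor, so a schema change only has to be made in one place.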

2. Evaluation Pipeline (New Feature)

Core implementation of the 3-stage evaluation pipeline:

  • src/eval_stages/__init__.py - Module exports
  • src/eval_stages/stage0_setup_and_dataset.py - Setup & dataset preparation (no LLM calls)
  • src/eval_stages/stage1_eval_execution.py - Run Inspect AI evaluations
  • src/eval_stages/stage2_score_aggregation.py - Aggregate scores from results
  • src/eval_stages/prompts.py - Default prompt template for evaluation pipeline
  • src/run_eval_pipeline.py - Entry point for running the pipeline
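
The three stages listed above can be pictured as a simple chain of functions. The sketch below is an assumption-laden toy, not the code in this PR: the function names, the `answer` field, and exact-match scoring are all hypothetical, and stage 1 uses a plain callable as a stand-in where the PR runs Inspect AI evaluations.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def stage0_prepare_dataset(tasks: List[Dict[str, str]]) -> List[Sample]:
    """Stage 0: turn generated tasks into evaluation samples (no LLM calls)."""
    return [
        {"id": f"task_{i:03d}", "input": t["task_statement"], "target": t["answer"]}
        for i, t in enumerate(tasks)
    ]


def stage1_run_eval(samples: List[Sample], solver: Callable[[str], str]) -> List[Sample]:
    """Stage 1: run each sample through a solver and score it.

    The solver callable is a hypothetical stand-in for the Inspect AI run,
    and exact-match scoring is an assumption made for this sketch.
    """
    return [
        {**s, "score": 1.0 if solver(s["input"]).strip() == s["target"] else 0.0}
        for s in samples
    ]


def stage2_aggregate(results: List[Sample]) -> Dict[str, Any]:
    """Stage 2: collapse per-sample scores into summary statistics."""
    scores = [r["score"] for r in results]
    return {"num_samples": len(scores), "mean_score": mean(scores) if scores else 0.0}


def toy_solver(prompt: str) -> str:
    """Always answers "4", so it is right on the first task and wrong on the second."""
    return "4"


if __name__ == "__main__":
    tasks = [
        {"task_statement": "2 + 2", "answer": "4"},
        {"task_statement": "3 * 3", "answer": "9"},
    ]
    samples = stage0_prepare_dataset(tasks)
    results = stage1_run_eval(samples, toy_solver)
    print(stage2_aggregate(results))  # {'num_samples': 2, 'mean_score': 0.5}
```

Separating dataset preparation from execution means stage 0 can be rerun and inspected without any model calls, and stage 2 can be rerun on saved results without repeating the evaluation.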

3. Legacy Code Migration

Moved LBO code to legacy/ directory:

  • src/lbo.py → legacy/src/lbo.py
  • src/run_lbo.py → legacy/src/run_lbo.py
  • src/utils/lbo_utils.py → legacy/src/utils/lbo_utils.py
  • tests/src/test_lbo*.py → legacy/tests/

4. Configuration Updates

  • src/cfg/run_cfg.yaml - Added eval config section, simplified generation config
  • pyproject.toml - Exclude legacy tests from pytest

Tests Added

None

…nd write objects. added eval pipeline schemas and implementation.
- Fix stage0: task_obj → task, capability.name → capability_name,
  domain.name → domain_name, ts.task → ts.task_statement
- Fix stage1/stage2: Add proper generic type hints (Dict[str, Any])
- Fix eval_schemas.py: Correct stage numbers in docstrings
  (Stage 1/3 → Stage 0/2)
@kohankhaki marked this pull request as ready for review on February 4, 2026, 08:57
@afkanpour (Collaborator) left a comment


Thanks Farnaz

@kohankhaki merged commit 3b993f9 into main on Feb 25, 2026
2 checks passed
@kohankhaki deleted the standard_evals branch on February 25, 2026, 16:12
