Skip to content

feat: intervention-sycophantic#338

Draft
n0w0f wants to merge 15 commits intodevfrom
corral-intervention-sycophantic
Draft

feat: intervention-sycophantic#338
n0w0f wants to merge 15 commits intodevfrom
corral-intervention-sycophantic

Conversation

@n0w0f
Copy link
Copy Markdown
Collaborator

@n0w0f n0w0f commented Apr 8, 2026

Add scripts to run intervention experiments that inject steps from
successful/failed traces into new agent runs to measure knowledge
vs reasoning gaps across scientific environments.

Pipeline: select tasks (from reports_v2) -> run baseline -> pick
traces from baseline -> run intervention conditions -> analyze.

n0w0f added 15 commits April 1, 2026 13:18
Add scripts to run intervention experiments that inject steps from
successful/failed traces into new agent runs to measure knowledge
vs reasoning gaps across scientific environments.

Pipeline: select tasks (from reports_v2) -> run baseline -> pick
traces from baseline -> run intervention conditions -> analyze.
- Each env now has two server ports (react/toolcalling) to allow safe
  parallel runs — the server is stateful and concurrent agents would clash
- Add scripts/setup_envs.sh for one-time venv creation (uv for spectra/
  resistor, micromamba for wetlab due to conda-only reaktoro)
- launch_sweep.sh gains --start-servers/--stop-servers/--server-status
- Resistor env.py uses argparse with --mode single/chained (no path needed)
- Wetlab pyproject.toml updated with corral dep and uv.sources
Replace declare -A (bash 4+) with case-based lookup functions.
Tested on macOS bash 3.2.57. Also add generated task_selection.json.
- setup_envs.sh: upgrade promptstore + install boto3 for Bedrock
- launch_sweep.sh: add --trials flag for smoke testing (e.g. --trials 1)
- run_intervention.py: cap k_values at trials count to avoid validation error
- Verified end-to-end: setup venvs → start servers → launch baselines → reports
Updated uv.lock files across all task environments after
upgrading promptstore. Added generated prompts/index.json.
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we are unable to review this pull request

The GitHub API does not allow us to fetch diffs exceeding 20000 lines

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: 1e5c2147-b36b-4f8d-a5f0-bc1c962f27f1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Note

🎁 Summarized by CodeRabbit Free

Your organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://app.coderabbit.ai/login.

Comment @coderabbitai help to get the list of available commands and usage tips.

@MrtinoRG
Copy link
Copy Markdown
Collaborator

@n0w0f can we close this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants