Merged
57 commits
37b2eb9
build(deps): bump actions/cache from 3 to 5 (#259)
dependabot[bot] Jan 1, 2026
0742f93
build(deps): bump actions/checkout from 5 to 6 (#239)
dependabot[bot] Jan 1, 2026
760b8ce
feat: refactor metrics and report (#236)
MrtinoRG Jan 2, 2026
79ac324
feat: attach logprobs in message, add to hook context (#263)
n0w0f Jan 12, 2026
c0ff1f4
feat: llm response object and messages have id (#266)
n0w0f Jan 17, 2026
1a2695e
added docstrings for scoring functions (#272)
chandan21gupta Jan 21, 2026
000ab2f
feat: load from trace and fix toolcalling intervention (#271)
n0w0f Jan 22, 2026
a70187a
feat: make the visualization tool general (#261)
MrtinoRG Jan 24, 2026
d380812
feat: spectra runs (02022026) (#200)
MrtinoRG Feb 3, 2026
8d458a2
feat: add first files of the retro env (#251)
MrtinoRG Feb 3, 2026
2dc2459
fix: solve scoring for the spectra and retro subtasks (#285)
MrtinoRG Feb 13, 2026
b577564
feat: add gpt_oss ml runs (#269)
n0w0f Feb 14, 2026
eda37ca
feat: gpt_oss reports for resistor (#268)
n0w0f Feb 14, 2026
b531979
feat: add gpt_oss catalyst runs (#270)
n0w0f Feb 14, 2026
a28e100
feat: reasoning qa , resistor, ml, catalyst (#280)
n0w0f Feb 15, 2026
dae9160
afm and md qa - final version, commends already addressed (#287)
chandan21gupta Feb 15, 2026
48354f8
feat: add Spectra Reasoning QA (#279)
MrtinoRG Feb 16, 2026
e62745c
feat: add retro reasoning qa (#276)
MrtinoRG Feb 16, 2026
90430f7
feat: add missing QA scores (including reasoning) (#267)
MrtinoRG Feb 24, 2026
1ad0ec6
AFM updates (#288)
imandal98 Feb 24, 2026
5faf2ea
Md corr vis dev (#289)
chandan21gupta Feb 24, 2026
0c6228e
feat: expose history (#298)
MrtinoRG Feb 26, 2026
1a22f57
feat: add code for `to_latex()` method (#262)
MrtinoRG Feb 26, 2026
6fd39f0
chore: rm older runs (#305)
n0w0f Mar 2, 2026
4cd0b2f
fix: correct that trace is not saved when agent fail (#308)
MrtinoRG Mar 4, 2026
7037a6c
chore: fixes in consistencies with trial ids and missing traces (#306)
n0w0f Mar 4, 2026
0d62fec
Md corrected runs (#290)
chandan21gupta Mar 5, 2026
439c70f
feat: add script to pull data + snakefile (#309)
MrtinoRG Mar 5, 2026
5c3a61b
chore: script to get QA score similar to reports (#310)
n0w0f Mar 5, 2026
c4e43c8
feat: add plot scripts (#312)
MrtinoRG Mar 17, 2026
8edb0a7
GPT OSS runs for AFM using a new naming convention (#320)
imandal98 Mar 19, 2026
7ae856b
Reports to hf (#302)
chandan21gupta Mar 19, 2026
34c111e
feat: react oss - catalyst subtask and resistor comprehensive (#322)
n0w0f Mar 20, 2026
26f1062
chore: utility shared across plots (#314)
n0w0f Mar 24, 2026
fca1dc2
Md re runs (#330)
chandan21gupta Mar 26, 2026
1650a67
Wetlab+Reports (#323)
aaaghajani Mar 26, 2026
b6f974b
feat: add appendix tables (#325)
MrtinoRG Apr 2, 2026
c85f567
feat: retro db can be pulled from ghcr
MrtinoRG Apr 3, 2026
3519b48
feat: add panel 5 plots (#317)
MrtinoRG Apr 8, 2026
52c340a
feat: add epistemic analysis (#319)
MrtinoRG Apr 8, 2026
58a77de
feat: panel 2 performance plots (#313)
n0w0f Apr 11, 2026
f38d849
feat: panel 4 irt plots (#315)
n0w0f Apr 12, 2026
b15d56b
chore: remove report files (#337)
MrtinoRG Apr 13, 2026
cb26b8b
fix: standardize @tool docstring tags across all environments (#331)
n0w0f Apr 17, 2026
db28e49
refactor: standardize task configs across all environments (#328)
n0w0f Apr 17, 2026
057a6b3
feat: landing page for Corral benchmark (#324)
n0w0f Apr 17, 2026
6c23806
ci: enable site deployment on push to dev branch (#344)
n0w0f Apr 17, 2026
89a0cc2
feat: intervention experiment pipeline (#334)
n0w0f Apr 18, 2026
efa0945
feat: add guidelines plus the app (#340)
MrtinoRG Apr 19, 2026
8691e23
feat: add scripts for solving last ToDos (#342)
MrtinoRG Apr 19, 2026
db95304
feat: plot enhancements for panel 2, panel 4, and tikz figure (#343)
n0w0f Apr 19, 2026
328cc3d
feat: add scripts to do ai scientists search (#339)
MrtinoRG Apr 19, 2026
78f08b7
chore: remove duplicate deps, pin litellm<1.82, clean gitignore (#346)
n0w0f Apr 19, 2026
96c7c66
feat: add epistemic trace explorer to landing page (#345)
n0w0f Apr 19, 2026
6ddd5a1
feat: add statistics + illustrative traces (#347)
MrtinoRG Apr 21, 2026
cb2a757
feat: possible solution for the tool types (#333)
MrtinoRG Apr 21, 2026
94de3f6
Merge branch 'tool' of github.com:lamalab-org/corral into dev
MrtinoRG Apr 21, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
2,107 changes: 28 additions & 2,079 deletions .gitattributes

Large diffs are not rendered by default.

93 changes: 55 additions & 38 deletions .github/workflows/docs.yaml
@@ -1,62 +1,79 @@
name: Docs
name: Deploy Site & Docs

on:
push:
branches: [main]
branches: [main, dev]
pull_request:
paths:
- docs/**
- site/**
- tasks/**/tools.py
- tasks/**/score.py
- tasks/**/env.py
- tasks/**/environments/**/tasks_json/**
- tasks/**/environments/**/subtasks_json/**
- pyproject.toml
- mkdocs.yaml
workflow_dispatch:

permissions:
contents: read
pages: write
id-token: write

concurrency:
group: pages
cancel-in-progress: false

jobs:
docs:
build:
runs-on: ubuntu-latest
permissions:
contents: write
pages: write
id-token: write

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v4
with:
lfs: true

- name: Set up Python 3.10
uses: actions/setup-python@v5
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: |
pyproject.toml
requirements*.txt

- name: Get pip cache dir
id: pip-cache
run: echo "dir=$(pip cache dir)" >> $GITHUB_OUTPUT
- name: Install uv
uses: astral-sh/setup-uv@v4

- name: Cache pip dependencies
uses: actions/cache@v3
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-docs-${{ hashFiles('pyproject.toml', 'requirements*.txt') }}
restore-keys: |
${{ runner.os }}-pip-docs-
- name: Install dependencies
run: |
uv sync --extra docs

- name: Cache mkdocs
uses: actions/cache@v3
with:
path: .cache/mkdocs
key: ${{ runner.os }}-mkdocs-${{ hashFiles('docs/**') }}
restore-keys: |
${{ runner.os }}-mkdocs-
- name: Generate landing-page data
run: uv run python site/extract_data.py

- name: Install dependencies
- name: Build MkDocs into docs/ subdirectory
run: uv run mkdocs build --site-dir _build/docs

- name: Assemble site
run: |
python -m pip install --upgrade pip wheel setuptools
pip install -e ".[docs]"
mkdir -p _build
cp site/index.html _build/
cp site/data.js _build/
cp site/traces.js _build/
cp site/annotator.js _build/
cp site/corral_logo.png _build/
cp site/corral_arch.png _build/

- name: Build docs
run: mkdocs build
- name: Upload Pages artifact
uses: actions/upload-pages-artifact@v3
with:
path: _build

- name: Deploy docs
run: mkdocs gh-deploy --force
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
deploy:
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev')
needs: build
runs-on: ubuntu-latest
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
steps:
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v4
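
The new `deploy` job is gated by an `if:` expression, while the `build` job runs on every trigger. A minimal Python sketch of that gate, assuming the expression is evaluated exactly as written in the diff (the function name `should_deploy` is hypothetical and not part of the PR):

```python
def should_deploy(event_name: str, ref: str) -> bool:
    """Mirror the deploy job's condition:
    github.event_name == 'push' &&
    (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev')
    """
    # Hypothetical helper for illustration only; the workflow itself
    # evaluates this as a GitHub Actions expression, not in Python.
    deploy_refs = {"refs/heads/main", "refs/heads/dev"}
    return event_name == "push" and ref in deploy_refs


if __name__ == "__main__":
    cases = [
        ("push", "refs/heads/dev"),
        ("pull_request", "refs/heads/dev"),
        ("workflow_dispatch", "refs/heads/main"),
    ]
    for event, ref in cases:
        print(f"{event:18s} {ref:18s} deploy={should_deploy(event, ref)}")
```

Under this reading, pull requests and manual `workflow_dispatch` runs build and upload the Pages artifact but never deploy it, even when dispatched from `main`.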
79 changes: 74 additions & 5 deletions .github/workflows/tests.yaml
@@ -34,7 +34,7 @@ jobs:
uv pip install pre-commit

- name: Cache pre-commit hooks
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ~/.cache/pre-commit
key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
@@ -128,8 +128,7 @@ jobs:
- name: Install task dependencies
run: |
# Install the task package in development mode
# For src layouts, ensure editable install works properly
uv pip install -e . --force-reinstall
uv pip install -e .
working-directory: ./tasks/${{ matrix.task }}

- name: Run task tests
@@ -140,19 +139,89 @@ jobs:
uv run python -m pytest tests -v --rootdir=.
working-directory: ./tasks/${{ matrix.task }}

# Discover tasks with env.py files for server setup checking
discover-env-tasks:
needs: code-quality
runs-on: ubuntu-latest
outputs:
tasks: ${{ steps.discover.outputs.tasks }}
steps:
- uses: actions/checkout@v6

- name: Discover tasks with env.py (excluding those requiring special setup)
id: discover
run: |
# Excluded tasks:
# retrosynthesis - requires a running database
# afm - requires nanosurf (Windows-only hardware library)
# wetlab - requires reaktoro (conda-only, not on PyPI)
# samplemath - uses bare imports incompatible with package-level import
EXCLUDED="retrosynthesis afm wetlab samplemath"
tasks=$(find tasks -maxdepth 1 -type d | while read task_dir; do
task_name=$(basename "$task_dir")
if [[ "$task_name" == "tasks" ]]; then continue; fi
if echo "$EXCLUDED" | grep -qw "$task_name"; then continue; fi
if [[ ! -f "$task_dir/pyproject.toml" ]]; then continue; fi
if find "$task_dir" -name "env.py" -maxdepth 4 -type f | grep -q .; then
echo "$task_name"
fi
done | sort | jq -R -s -c 'split("\n")[:-1]')
echo "tasks=$tasks" >> $GITHUB_OUTPUT
echo "Found env tasks: $tasks"

# Check that each task's env server can be set up correctly
check-env-servers:
needs: [discover-env-tasks, code-quality]
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
task: ${{ fromJson(needs.discover-env-tasks.outputs.tasks) }}

steps:
- uses: actions/checkout@v6
with:
lfs: true

- name: Ensure Git LFS is initialized
run: |
git lfs install
git lfs pull --exclude="" || echo "Some LFS objects missing, continuing..."

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true

- name: Install root project (corral)
run: uv sync --all-extras --dev

- name: Install task dependencies
run: uv pip install -e .
working-directory: ./tasks/${{ matrix.task }}

- name: Check env server setup
run: |
uv run python -c "from ${{ matrix.task }} import env; print('Server setup OK')"
working-directory: ./tasks/${{ matrix.task }}

# Summary job to check all results
test-summary:
needs: [test-main, test-tasks]
needs: [test-main, test-tasks, check-env-servers]
runs-on: ubuntu-latest
if: always()
steps:
- name: Check test results
run: |
if [[ "${{ needs.test-main.result }}" == "success" ]] && [[ "${{ needs.test-tasks.result }}" == "success" ]]; then
env_result="${{ needs.check-env-servers.result }}"
if [[ "${{ needs.test-main.result }}" == "success" ]] && \
[[ "${{ needs.test-tasks.result }}" == "success" ]] && \
[[ "$env_result" == "success" || "$env_result" == "skipped" ]]; then
echo "✅ All tests passed!"
else
echo "❌ Some tests failed:"
echo "Main tests: ${{ needs.test-main.result }}"
echo "Task tests: ${{ needs.test-tasks.result }}"
echo "Env server checks: $env_result"
exit 1
fi
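
The `discover-env-tasks` step above builds the job matrix with `find`, `grep`, and `jq`. For local debugging, the same selection rule can be approximated in Python. This is a sketch under assumptions, not code from the PR: the function names are hypothetical, exclusion is done by exact name rather than `grep -qw`, and ordering uses Python's default string sort rather than the runner's locale.

```python
import json
from pathlib import Path

# Same exclusion list as the workflow step (reasons are given in the diff).
EXCLUDED = {"retrosynthesis", "afm", "wetlab", "samplemath"}


def has_env_py(task_dir: Path, max_depth: int = 4) -> bool:
    """True if a file named env.py sits at most `max_depth` levels below
    task_dir, approximating `find "$task_dir" -name env.py -maxdepth 4 -type f`."""
    for path in task_dir.rglob("env.py"):
        depth = len(path.relative_to(task_dir).parts)
        if path.is_file() and depth <= max_depth:
            return True
    return False


def discover_env_tasks(tasks_root: Path) -> list[str]:
    """Return sorted names of task directories that are not excluded,
    contain a pyproject.toml, and contain an env.py within the depth limit."""
    names = []
    for task_dir in tasks_root.iterdir():
        if not task_dir.is_dir() or task_dir.name in EXCLUDED:
            continue
        if not (task_dir / "pyproject.toml").is_file():
            continue
        if has_env_py(task_dir):
            names.append(task_dir.name)
    return sorted(names)


def as_matrix_json(names: list[str]) -> str:
    """Compact JSON array, the shape the matrix reads through fromJson()."""
    return json.dumps(names, separators=(",", ":"))


if __name__ == "__main__":
    # Run from the repository root; "tasks" is the directory used in the workflow.
    print(as_matrix_json(discover_env_tasks(Path("tasks"))))
```

If no task qualifies, the result is an empty array. The summary job in the diff accepts a `skipped` result for `check-env-servers`, which suggests the authors anticipated the env check not running in some cases.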
42 changes: 36 additions & 6 deletions .gitignore
@@ -2,7 +2,6 @@
.cache/
docs/reference/*
*doctrees*
/site

# Python ignores
__pycache__/
@@ -62,6 +61,7 @@ dmypy.json
# Environments
.env
.venv
.micromamba/
venv/
env/
ENV/
@@ -71,27 +71,57 @@ venv.bak/

prompts.py

**/run_react_agent.py
**/run_tool_calling_agent.py

# dev

.metaflow
notebooks
results.json

.code2latex_cache/
.promptstore/
prompts.py
*.log
*.aux
*results.json
agent_logs-ConcreteAgent-test-model-brief
CORRAL_WORK_DIR
CORRAL_WORK_DIR/*
*/CORRAL_WORK_DIR/*
graph_output/
benchmark_checkpoints
agent_logs
results/
results
nmrshiftdb2withsignals.sd
plots/results
wandb
questions_resistor.json

reports/corral_md_visualisation/**/*.py

reports/corral_md_visualisation/**/logprobs_*/
reports/corral_md_visualisation/**/metrics_*/

reports_v2/**/logprobs*/
other_examples.json
logprobs_perplexity_data.json
analysis/results/



results/**/*.jsonl
results/**/*.csv
results/**/*.nc
results/**/*.png
results/**/*.tex
results/**/*.pdf
results/**/*.html
results/**/*.md
results/**/*.txt
results/**/*.log
results/**/*.err
results/**/*.out
results/**/*.log.gz

**/.snakemake/
analysis/*/
reasoning_reports_old
.claude/*
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
@@ -28,6 +28,22 @@ repos:
- id: blacken-docs
exclude: ^tasks/(?!.*/src/).*

- repo: local
hooks:
- id: check-env-convention
name: check environment convention
entry: python scripts/check_env_convention.py
language: python
pass_filenames: false
files: ^tasks/
- id: validate-tool-docstrings
name: Validate @tool docstrings
entry: python3 scripts/validate_tool_docstrings.py
language: system
files: (tools\.py$|src/corral/utils/.*_tools\.py$)
exclude: ^tests/
pass_filenames: true

- repo: https://github.com/commitizen-tools/commitizen
rev: v4.8.0
hooks:
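
The `validate-tool-docstrings` hook added above selects files with a `files` pattern and an `exclude` pattern. The sketch below shows which paths that pair would pick up, assuming pre-commit applies both patterns with `re.search` against the repository-relative path. The function name and all example paths are hypothetical.

```python
import re

# Patterns copied from the validate-tool-docstrings hook in the diff.
FILES = re.compile(r"(tools\.py$|src/corral/utils/.*_tools\.py$)")
EXCLUDE = re.compile(r"^tests/")


def hook_selects(path: str) -> bool:
    """Approximate pre-commit's include/exclude filter for one staged path."""
    return bool(FILES.search(path)) and not EXCLUDE.search(path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each branch of the patterns.
    examples = [
        "tasks/example/tools.py",             # ends in tools.py
        "src/corral/utils/search_tools.py",   # matches both alternatives
        "tests/test_tools.py",                # excluded: starts with tests/
        "tasks/example/tests/test_tools.py",  # not excluded: ^tests/ is root-anchored
        "docs/tools.md",                      # wrong extension
    ]
    for path in examples:
        print(f"{path:40s} selected={hook_selects(path)}")
```

Under this assumption, `exclude: ^tests/` only skips the top-level `tests/` directory, so a per-task file such as `tasks/example/tests/test_tools.py` would still be passed to the validator. The other local hook, `check-env-convention`, sets `pass_filenames: false`, so it runs once without receiving file names as arguments.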