
CAIS
Causal AI Scientist: Facilitating Causal Data Science with Large Language Models

[Code] [Paper (coming soon)]

Causal AI Scientist (CAIS) is an LLM-powered tool for generating data-driven answers to natural language causal queries. Given a natural language query (e.g., "Does participating in a job training program lead to higher income?"), an accompanying dataset, and its description, CAIS frames a suitable causal estimation problem, selects an appropriate inference method, executes it, runs diagnostic checks, and interprets the results in plain language.

Note: This repository is a work in progress and will be updated with additional instructions and files.


Table of Contents

  1. Introduction
  2. Pipeline
  3. Getting Started
  4. Dataset Information
  5. Running CAIS
  6. Reproducing Paper Results
  7. Citation
  8. License

1. Introduction

Causal effect estimation is central to evidence-based decision-making across domains such as the social sciences, healthcare, and economics, but it requires substantial methodological expertise to apply correctly.

CAIS automates this process end-to-end using Large Language Models (LLMs) to:

  • Parse a natural language causal query and analyze dataset characteristics.
  • Select an appropriate causal inference method via a decision tree and structured prompting.
  • Execute the method using predefined code templates and validate the results.
  • Interpret the numerical output in the context of the original query.

Supported Methods:

  • Econometric: Difference-in-Differences (DiD), Instrumental Variables (IV), Ordinary Least Squares (OLS), Regression Discontinuity Design (RDD).
  • Causal Graph-based: Backdoor adjustment, Frontdoor adjustment.

2. Pipeline

CAIS consists of four successive stages, powered by a decision-tree-driven reasoning pipeline:

Stage 1 — Data Preprocessing & Query Decomposition

  • Profiles the dataset (column types, missing values, statistical distributions) and uses an LLM to identify treatment, outcome, and covariate variables.
  • Scans for method-specific variables such as instruments and running variables based on the dataset description and causal query.
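The actual profiler is part of the CAIS codebase; as a minimal sketch of the kind of dataset profile Stage 1 could feed to the LLM (the function name `profile_dataset` and the exact fields are illustrative, not CAIS's API):

```python
import pandas as pd

def profile_dataset(df: pd.DataFrame) -> dict:
    """Summarize column types, missingness, and basic statistics per column,
    mirroring the kind of profile Stage 1 passes to the LLM."""
    profile = {}
    for col in df.columns:
        series = df[col]
        profile[col] = {
            "dtype": str(series.dtype),
            "missing_frac": float(series.isna().mean()),
            "n_unique": int(series.nunique(dropna=True)),
        }
        if pd.api.types.is_numeric_dtype(series):
            # Numeric columns also get distributional summaries.
            profile[col]["mean"] = float(series.mean())
            profile[col]["std"] = float(series.std())
    return profile

# Toy dataset in the spirit of the job-training example.
df = pd.DataFrame({
    "treated": [0, 1, 0, 1],
    "income": [30_000.0, 42_000.0, None, 39_000.0],
    "region": ["north", "south", "north", "south"],
})
print(profile_dataset(df)["income"]["missing_frac"])  # 0.25
```

The LLM then reads this structured profile, rather than raw data, when labeling treatment, outcome, and covariate columns.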

Stage 2 — Method Selection

  • Traverses a rule-based decision tree that evaluates dataset properties (e.g., randomization, presence of temporal structure, availability of instruments) to select a valid causal inference method.
  • Breaking selection into explicit, verifiable steps ensures interpretability and avoids the opacity of direct LLM-based method selection.
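The decision logic above can be sketched as a plain rule cascade. The property keys and branch ordering below are hypothetical, chosen only to illustrate the idea of explicit, verifiable selection steps; CAIS's actual tree lives in the repository:

```python
def select_method(props: dict) -> str:
    """Illustrative rule-based decision tree over dataset properties.
    Keys are hypothetical; the real tree and its ordering are part of CAIS."""
    if props.get("randomized"):
        return "OLS"                      # randomized treatment: regression suffices
    if props.get("has_running_variable"):
        return "RDD"                      # threshold-based assignment
    if props.get("has_temporal_structure") and props.get("has_control_group"):
        return "DiD"                      # pre/post periods plus a control group
    if props.get("has_instrument"):
        return "IV"
    if props.get("frontdoor_available"):
        return "Frontdoor adjustment"
    return "Backdoor adjustment"          # fall back: adjust for observed confounders

print(select_method({"has_temporal_structure": True, "has_control_group": True}))  # DiD
```

Because each branch tests a named, checkable dataset property, the selected method comes with an explicit justification trail instead of a single opaque LLM judgment.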

Stage 2b — IV-LLM Pipeline (activated when IV is selected)

If the Instrumental Variables (IV) method is selected and the --iv_llm flag is enabled, CAIS runs an instrument-discovery pipeline based on IV Co-Scientist:

  1. Hypothesis Generation: The LLM hypothesizes potential instruments based on dataset context and variable names.
  2. Confounder Mining: Identifies potential confounders that might violate the independence or exclusion restrictions.
  3. Critic Validation: Uses specialized LLM "critics" (Exclusion, Independence) to reason about the validity of each candidate instrument.
  4. Final Selection: Selects the most robust instrument for the estimation stage.

Stage 3 — Validation

  • Runs standard statistical assumption checks for the selected method (e.g., the F-statistic for IV, covariate balance for OLS).
  • If any check fails, initiates a feedback loop back to Stage 2, incorporating information from the failure to skip the invalid method and identify the next plausible candidate.
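As one concrete example of such a check, the first-stage F-statistic for instrument relevance can be computed with plain least squares; a common rule of thumb flags F < 10 as a weak instrument. This is a self-contained sketch, not CAIS's validation code:

```python
import numpy as np

def first_stage_f(d: np.ndarray, z: np.ndarray) -> float:
    """First-stage F-statistic: regress the treatment d on the single
    instrument z (plus intercept) and test the slope's joint significance."""
    n = len(d)
    X = np.column_stack([np.ones(n), z])
    beta, _, _, _ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ beta
    rss = resid @ resid                       # residual sum of squares
    tss = ((d - d.mean()) ** 2).sum()         # total sum of squares
    ess = tss - rss                           # explained sum of squares
    k = 1                                     # one instrument
    return (ess / k) / (rss / (n - k - 1))

rng = np.random.default_rng(0)
z = rng.normal(size=500)
d = 0.8 * z + rng.normal(size=500)            # strong instrument by construction
print(first_stage_f(d, z) > 10)               # True
```

A failing check here would send the pipeline back to Stage 2 with "instrument too weak" recorded, so the decision tree skips IV on the next pass.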

Stage 4 — Method Execution & Interpretation

  • Executes the chosen method using predefined Python code templates with placeholders substituted from Stage 1, maximizing reliability over LLM-generated code.
  • Prompts an LLM to interpret the estimated causal effect, standard error, and confidence interval in the context of the original query, alongside validation caveats and a clear statement of assumptions and limitations.
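The template-with-placeholders idea in the first bullet can be sketched as simple string substitution; the template text and slot names below are hypothetical, not CAIS's actual templates:

```python
# Illustrative OLS template; Stage 1's variable names fill the slots.
OLS_TEMPLATE = """\
import statsmodels.formula.api as smf
model = smf.ols("{outcome} ~ {treatment} + {covariates}", data=df).fit()
print(model.params["{treatment}"], model.bse["{treatment}"])
"""

def fill_template(template: str, slots: dict) -> str:
    """Substitute identified variable names into a predefined code template."""
    return template.format(**slots)

code = fill_template(OLS_TEMPLATE, {
    "outcome": "income",
    "treatment": "treated",
    "covariates": "age + education",
})
print("income ~ treated + age + education" in code)  # True
```

Because the executed code is a vetted template rather than free-form LLM output, the estimation step stays reproducible and auditable.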

3. Getting Started

Prerequisites:

  • Python 3.10
  • Conda (recommended)

Step 1: Clone the repository and copy the example configuration

git clone https://github.com/causalNLP/causal-agent.git
cd causal-agent
cp .env.example .env

Step 2: Load compute modules (only needed on HPC clusters that use Environment Modules; skip this step on a local machine)

module load rust
module load gcc
module load openblas

Step 3: Create and activate a Python 3.10 environment

conda create -n cais python=3.10
conda activate cais
pip install -r requirements.txt

Step 4: Install the CAIS library

pip install -e .

⚠️ Keep your .env file secure and never commit it to version control.


4. Dataset Information

All datasets used to evaluate CAIS and the baseline models are available in the data/ directory:

| Path | Description |
| --- | --- |
| data/all_data/ | CSV files from the QRData and real-world study collections |
| data/synthetic_data/ | CSV files for synthetic datasets |
| data/qr_info.csv | Metadata for QRData: filename, description, query, reference effect, intended method, remarks |
| data/real_info.csv | Metadata for real-world datasets |
| data/synthetic_info.csv | Metadata for synthetic datasets |

5. Running CAIS

python run_cais_new.py \
    --metadata_path <path_to_metadata_csv> \
    --data_dir <path_to_data_folder> \
    --output_dir <output_folder> \
    --output_name <output_filename> \
    --llm_name <llm_name> \
    --llm_provider <llm_provider> \
    [--iv_llm]

Arguments:

| Argument | Type | Description |
| --- | --- | --- |
| --metadata_path | str | Path to the CSV file containing queries, dataset descriptions, and filenames |
| --data_dir | str | Path to the folder containing the data in CSV format |
| --output_dir | str | Path to the folder where output JSON results will be saved |
| --output_name | str | Name of the output JSON file |
| --llm_name | str | Name of the LLM to use (e.g., gpt-4o, claude-3-5-sonnet) |
| --llm_provider | str | LLM service provider (e.g., openai, anthropic, together) |
| --iv_llm | flag | (Optional) If present, enables the experimental IV-LLM pipeline for instrument discovery |

Example:

python run_cais_new.py \
    --metadata_path "data/qr_info.csv" \
    --data_dir "data/all_data" \
    --output_dir "output" \
    --output_name "results_qr_4o" \
    --llm_name "gpt-4o-mini" \
    --llm_provider "openai" \
    --iv_llm

6. Reproducing Paper Results

Instructions for reproducing the paper's results will be added soon.


7. Citation

If you use CAIS or build on this work, we would appreciate it if you could cite:

@inproceedings{verma2025causal,
  title={Causal {AI} Scientist: Facilitating Causal Data Science with Large Language Models},
  author={Vishal Verma and Sawal Acharya and Devansh Bhardwaj and Samuel Simko and Yongjin Yang and Anahita Haghighat and Dominik Janzing and Mrinmaya Sachan and Bernhard Sch{\"o}lkopf and Zhijing Jin},
  booktitle={NeurIPS 2025 Workshop on CauScien: Uncovering Causality in Science},
  year={2025},
  url={https://openreview.net/forum?id=EDWTHMVOCj}
}

The IV-LLM pipeline builds on the methodology introduced in IV Co-Scientist. If you use that component, please also cite:

@misc{sheth2026ivcoscientistmultiagentllm,
  title={IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery},
  author={Ivaxi Sheth and Zhijing Jin and Bryan Wilder and Dominik Janzing and Mario Fritz},
  year={2026},
  eprint={2602.07943},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.07943}
}

8. License

Distributed under the MIT License. See LICENSE for more information.
