Optional IV-LLM + readme explanation #24
Pull request overview
Adds an optional “IV‑LLM” instrument-discovery pipeline (gated by --iv_llm) and documents it, alongside supporting tooling/models and some test/script updates.
Changes:
- Introduces an IV discovery tool + IV-LLM subpackage (prompts/agents/critics/utils) and wires it into the CAIS workflow when IV is selected.
- Adds CLI/script flags (`--iv_llm`) and README documentation for the optional IV discovery stage.
- Updates dataset analysis/cleaning plumbing and adds/updates tests.
Reviewed changes
Copilot reviewed 36 out of 37 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| tests/cais/test_e2e_iv_new.py | New bulk E2E test for IV-related queries (logs LLM outputs). |
| tests/cais/methods/test_diff_in_diff.py | New unit test scaffold for DiD output structure. |
| run_cais.py | Minor formatting/indentation fix. |
| run_cais_new.py | Adds --iv_llm flag and attempts to pass it to CausalAgent. |
| README.md | Documents optional IV discovery stage and adds --iv_llm usage to run instructions. |
| pyproject.toml | Adds extra Sphinx/doc tooling dependencies. |
| cais/utils/llm_helpers.py | Updates LangChain BaseChatModel import and adds a generic invoke_llm helper. |
| cais/utils/agent.py | Adds an alternative LangChain agent implementation (LCEL/ReAct) and custom output parser. |
| cais/tools/iv_discovery_tool.py | New LangChain tool wrapper around IV discovery pipeline. |
| cais/tools/dataset_analyzer_tool.py | Plumbs use_iv_pipeline flag into dataset analysis component call. |
| cais/tools/data_analyzer.py | Removes deprecated DataAnalyzer implementation. |
| cais/tools/controls_selector_tool.py | Enables LangChain @tool decorator usage for the controls selector tool. |
| cais/models.py | Adds Pydantic models for IV discovery tool input/output. |
| cais/iv_llm/src/variable_utils.py | Utilities for extracting/mapping candidate variable names to dataset columns. |
| cais/iv_llm/src/prompts/prompt_loader.py | Loader/formatter for IV‑LLM prompt templates. |
| cais/iv_llm/src/prompts/independence_critic.txt | Prompt template for independence critic. |
| cais/iv_llm/src/prompts/hypothesizer.txt | Prompt template for IV hypothesizer. |
| cais/iv_llm/src/prompts/exclusion_critic.txt | Prompt template for exclusion critic. |
| cais/iv_llm/src/prompts/confounder_miner.txt | Prompt template for confounder miner. |
| cais/iv_llm/src/prompts/__init__.py | Declares prompts package. |
| cais/iv_llm/src/llm/client.py | Thin adapter around CAIS get_llm_client for IV‑LLM code. |
| cais/iv_llm/src/llm/__init__.py | Declares llm package. |
| cais/iv_llm/src/experiments/iv_co_scientist.py | Experimental end-to-end IV “co-scientist” pipeline runner. |
| cais/iv_llm/src/critics/independence_critic.py | Independence critic implementation using prompts + LLM calls. |
| cais/iv_llm/src/critics/exclusion_critic.py | Exclusion critic implementation using prompts + LLM calls. |
| cais/iv_llm/src/critics/__init__.py | Declares critics package. |
| cais/iv_llm/src/agents/hypothesizer.py | Hypothesizer agent for proposing IV candidates. |
| cais/iv_llm/src/agents/confounder_miner.py | Confounder miner agent for proposing confounders. |
| cais/iv_llm/src/agents/__init__.py | Declares agents package. |
| cais/iv_llm/src/__init__.py | Declares IV-LLM src package. |
| cais/iv_llm/__init__.py | Adds package-level logging setup for IV-LLM to a jsonl file. |
| cais/components/iv_discovery.py | Component wrapper that orchestrates hypothesizer/critics to validate IVs. |
| cais/components/dataset_cleaner.py | Tightens cleaning script path handling and safer placeholder replacement. |
| cais/components/dataset_analyzer.py | Adds use_iv_pipeline option to IV detection (IV‑LLM first, fallback otherwise). |
| cais/cli.py | Adds --iv_llm flag for single/batch CLI and passes through to analysis. |
| cais/agent.py | Wires IV discovery into the agent workflow when IV is selected + flag enabled. |
| cais/__init__.py | Exposes iv_discovery_tool at package level (via cais.tools). |
Comments suppressed due to low confidence (2)
run_cais_new.py:172
`CausalAgent.run_analysis(...)` in `cais/agent.py` only accepts `(query, llm_method_selection=...)` and does not accept `dataset_path`, `dataset_description`, or `use_decision_tree`. This call will raise a `TypeError` unless the API is updated. Consider constructing the agent with `dataset_path`/`dataset_description` and using `llm_method_selection`, or call the module-level `run_causal_analysis(...)` function instead.
cais/__init__.py:32
`cais.tools` currently doesn't export `iv_discovery_tool` (it's not imported in `cais/tools/__init__.py`), so this import will fail when importing `cais`. Either import it directly from `cais.tools.iv_discovery_tool` here, or add it to `cais/tools/__init__.py` (and `__all__`) to keep this package-level import working.
```python
# Import tools
from cais.tools import (
    input_parser_tool,
    dataset_analyzer_tool,
    query_interpreter_tool,
    iv_discovery_tool,
    method_selector_tool,
    method_validator_tool,
    method_executor_tool,
    explanation_generator_tool,
    output_formatter_tool
)
```
```python
# Import the function to test
from cais.methods.diff_in_diff import estimate_effect
```

`cais.methods.diff_in_diff` (and the patched helper paths below) don't exist in this repo; the DiD implementation appears to live under `cais/methods/difference_in_differences/`. As written, this test will fail at import time and the patch targets will also be invalid. Please update the import and patch module paths to the actual DiD estimator module used by CAIS.
```diff
  print('Starting run!')

- cais = CausalAgent()
+ cais = CausalAgent(use_iv_pipeline=args.iv_llm)

  cais.run_analysis(
```

`CausalAgent` requires a `dataset_path` argument in its constructor, so instantiating it here without one will raise a `TypeError`. Either pass `dataset_path` when constructing the agent (and adjust the rest of the call sites accordingly) or switch this script to use `run_causal_analysis(...)`, which accepts `dataset_path` per call.
```python
class TestE2EIVNewPipeline(unittest.TestCase):
    def test_iv_llm_pipeline_bulk(self):
        """Run several queries from the CSV data and log LLM outputs."""
        csv_path = os.path.join(ROOT, "data", "checked_real_data - Final.csv")
        df = pd.read_csv(csv_path)
```

This E2E test makes real LLM calls and will be flaky/slow in CI unless it is skipped when credentials are missing. Other E2E tests (e.g. `test_e2e_did.py`) load dotenv and raise `SkipTest` when `OPENAI_API_KEY` is not set; please apply the same pattern here.
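A minimal sketch of that skip pattern (the helper name `require_openai_key` is hypothetical; a real test would typically load a `.env` file via python-dotenv before checking):

```python
import os
import unittest


def require_openai_key() -> None:
    """Skip the calling test when no OPENAI_API_KEY is configured.

    Illustrative helper only: unittest treats a raised SkipTest inside a
    test method exactly like self.skipTest(...), so the run is marked as
    skipped instead of failing with a live-API error.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        raise unittest.SkipTest("OPENAI_API_KEY not set; skipping live-LLM E2E test")
```

Calling `require_openai_key()` at the top of `test_iv_llm_pipeline_bulk` would make the test a clean skip in CI environments without credentials.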
```python
csv_path = os.path.join(ROOT, "data", "checked_real_data - Final.csv")
df = pd.read_csv(csv_path)
```

This test assumes `data/checked_real_data - Final.csv` exists but doesn't check for it (or skip/fail with a clear message) before calling `pd.read_csv`. Please add an existence check (and likely `SkipTest` in CI) so failures are actionable and not just a pandas `FileNotFoundError`.
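One way to make the failure actionable, sketched with an illustrative test class (names and path are assumptions, not the repo's real test):

```python
import os
import unittest


class TestFixtureGuard(unittest.TestCase):
    # Guard a data-dependent test so a missing CSV yields a clear skip
    # message rather than an opaque pandas FileNotFoundError.
    CSV_PATH = os.path.join("data", "checked_real_data - Final.csv")

    def test_fixture_present(self):
        if not os.path.exists(self.CSV_PATH):
            self.skipTest(f"missing test fixture: {self.CSV_PATH}")
        # ...pd.read_csv(self.CSV_PATH) and the real assertions would go here...
```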
```python
results_log = []
output_file = os.path.join(os.path.dirname(__file__), "llm_outputs.json")

print(f"--- Running Bulk E2E Test (5 queries) ---")
```

Writing `llm_outputs.json` into the tracked `tests/` directory makes the test non-hermetic and will dirty the working tree in CI/local runs. Prefer writing to a temporary directory (e.g., `tempfile.TemporaryDirectory()`/`NamedTemporaryFile`) or gating output behind an opt-in env var.
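A minimal sketch of the temporary-directory approach (variable names are illustrative):

```python
import json
import os
import tempfile

# Write per-run results into a temporary directory so repeated test runs
# never dirty the tracked tests/ directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    output_file = os.path.join(tmp_dir, "llm_outputs.json")
    results_log = [{"query": "example query", "output": "example output"}]
    with open(output_file, "w", encoding="utf-8") as fh:
        json.dump(results_log, fh, indent=2)
    # Anything that needs the file must read it inside the context...
    with open(output_file, encoding="utf-8") as fh:
        round_tripped = json.load(fh)
# ...because the directory and its contents are deleted on exit.
```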
cais/agent.py (outdated)

```python
# Set up basic logging
os.makedirs('./logs/', exist_ok=True)
logging.basicConfig(
    filename='./logs/agent_debug.log',
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
```

Configuring `logging.basicConfig(...)` at module import time changes global logging configuration for any consumer importing `cais.agent`, and forces file I/O to `./logs/agent_debug.log`. Consider moving this configuration behind a CLI entry point / `if __name__ == "__main__"` guard, or only adding a module-level logger handler without calling `basicConfig`.
Suggested change:

```diff
- # Set up basic logging
- os.makedirs('./logs/', exist_ok=True)
- logging.basicConfig(
-     filename='./logs/agent_debug.log',
-     level=logging.INFO,
-     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
- )
- logger = logging.getLogger(__name__)
+ logger = logging.getLogger(__name__)
+ if not logger.handlers:
+     logger.addHandler(logging.NullHandler())
```
```python
def parse(self, text: str) -> Union[List[AgentAction], AgentFinish]:
    includes_answer = FINAL_ANSWER_ACTION in text
    print('-------------------')
    print(text)
    print('-------------------')
    # Grab every Action / Action Input block
    pattern = (
        r"Action\s*\d*\s*:[\s]*(.*?)\s*"
        r"Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*?)(?=(?:Action\s*\d*\s*:|$))"
    )
    matches = list(re.finditer(pattern, text, re.DOTALL))

    # If we found tool calls…
    if matches:
        if includes_answer:
            # both a final answer *and* tool calls is ambiguous
            raise OutputParserException(
                f"{FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE}: {text}"
            )

        actions: List[AgentAction] = []
        for m in matches:
            tool_name = m.group(1).strip()
            tool_input = m.group(2).strip().strip('"')
            print('\n--------------------------')
            print(tool_input)
            print('--------------------------')
            actions.append(AgentAction(tool_name, json.loads(tool_input), text))
```

`ReActMultiInputOutputParser.parse` prints the full raw LLM output (and parsed tool inputs) to stdout. This is noisy in normal runs and can inadvertently leak prompt/response contents into logs. Prefer using the module logger at DEBUG level (or removing the prints entirely) so output is controllable.
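A hedged sketch of routing this output through a module logger instead (the logger name and helper function are illustrative, not the repo's actual code):

```python
import logging

# Module-level logger; the name mirrors where the parser lives and is
# illustrative only.
logger = logging.getLogger("cais.utils.agent")


def log_raw_llm_output(text: str) -> None:
    # Emitted only when DEBUG logging is enabled for this logger, rather
    # than unconditionally printing full prompt/response contents to stdout.
    logger.debug("raw LLM output:\n%s", text)
```

The parser's `print(text)` calls would then become `log_raw_llm_output(text)`, leaving verbosity under the consumer's logging configuration.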
```python
executor = AgentExecutor(
    agent=agent,
    tools=agent_tools,
    memory=memory,  # Pass the memory object
    verbose=True,
    callbacks=[ConsoleCallbackHandler()],  # Optional: for console debugging
    handle_parsing_errors=True,  # Let AE handle parsing errors
    max_retries=100
)
```

`AgentExecutor` is instantiated with `max_retries=100`, but `AgentExecutor` typically uses `max_iterations` / `max_execution_time` (and may not accept `max_retries`). If this kwarg isn't supported by the installed LangChain version, this will raise a `TypeError` at runtime. Please verify the correct parameter name for limiting agent loops and update accordingly.
```python
# Configure a file handler on the "cais.iv_llm" logger so that every child
# logger (agents, critics, etc.) automatically writes to the IV-LLM log file.
_logger = logging.getLogger(__name__)  # "cais.iv_llm"
if not _logger.handlers:
    _logger.setLevel(logging.DEBUG)
    _logger.propagate = True  # still propagate to root for console output
    try:
        _log_path = _get_log_path()
        _log_path.parent.mkdir(parents=True, exist_ok=True)
        _handler = logging.FileHandler(str(_log_path), encoding="utf-8")
        _handler.setLevel(logging.INFO)
        _handler.setFormatter(logging.Formatter("%(message)s"))
        _logger.addHandler(_handler)
    except Exception:
        pass
```

This module configures a `FileHandler` and creates directories at import time. Since `cais.iv_llm` will be imported whenever the IV pipeline modules are imported, this can cause unexpected file I/O (and can fail in read-only environments) even if the user isn't explicitly trying to write logs. Consider moving logger/file-handler setup behind an explicit initialization function or an opt-in env flag.
Suggested change:

```diff
- # Configure a file handler on the "cais.iv_llm" logger so that every child
- # logger (agents, critics, etc.) automatically writes to the IV-LLM log file.
- _logger = logging.getLogger(__name__)  # "cais.iv_llm"
- if not _logger.handlers:
-     _logger.setLevel(logging.DEBUG)
-     _logger.propagate = True  # still propagate to root for console output
-     try:
-         _log_path = _get_log_path()
-         _log_path.parent.mkdir(parents=True, exist_ok=True)
-         _handler = logging.FileHandler(str(_log_path), encoding="utf-8")
-         _handler.setLevel(logging.INFO)
-         _handler.setFormatter(logging.Formatter("%(message)s"))
-         _logger.addHandler(_handler)
-     except Exception:
-         pass
+ def _should_enable_file_logging() -> bool:
+     return os.getenv("IV_LLM_ENABLE_FILE_LOGGING", "").strip().lower() in {
+         "1",
+         "true",
+         "yes",
+         "on",
+     }
+
+ # Configure the "cais.iv_llm" logger itself at import time, but defer any
+ # file-system work until explicit initialization or an opt-in env flag.
+ _logger = logging.getLogger(__name__)  # "cais.iv_llm"
+ _logger.setLevel(logging.DEBUG)
+ _logger.propagate = True  # still propagate to root for console output
+
+ def initialize_file_logging(force: bool = False) -> bool:
+     """Attach the IV-LLM file handler if explicitly enabled.
+
+     Returns True if file logging is enabled after this call, otherwise False.
+     """
+     if not force and not _should_enable_file_logging():
+         return False
+     for existing_handler in _logger.handlers:
+         if getattr(existing_handler, "_iv_llm_file_handler", False):
+             return True
+     try:
+         log_path = _get_log_path()
+         log_path.parent.mkdir(parents=True, exist_ok=True)
+         handler = logging.FileHandler(str(log_path), encoding="utf-8")
+         handler.setLevel(logging.INFO)
+         handler.setFormatter(logging.Formatter("%(message)s"))
+         handler._iv_llm_file_handler = True
+         _logger.addHandler(handler)
+         return True
+     except Exception:
+         return False
+
+ if _should_enable_file_logging():
+     initialize_file_logging()
```