Fraunhofer-IIS/cedar

CEDAR: An agentic system for solving data science tasks

This repository contains code for the CEDAR project, an agentic data science solution from the NLP team at IIS. CEDAR is an application for automating data science (DS) tasks with an agentic setup. Solving DS problems with LLMs is an underexplored area with immense market value. The challenges are manifold: task complexity, data size, computational limitations, and context restrictions. We show that these can be alleviated via effective context engineering. We first impose structure on the initial prompt with DS-specific input fields that serve as instructions for the agentic system. The solution is then materialized as an enumerated sequence of interleaved plan and code blocks generated by separate LLM agents, which gives the context a readable structure at every step of the workflow. Function calls for generating these intermediate texts and the corresponding Python code ensure that data stays local: only aggregate statistics and associated instructions are injected into LLM prompts. Fault tolerance and context management are introduced via iterative code generation and smart history rendering. The viability of our agentic data scientist is demonstrated using canonical Kaggle challenges.
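As a rough illustration of the idea that data stays local while only aggregate statistics reach the prompt, here is a minimal sketch. It is not CEDAR's implementation: the `TaskSpec` fields, the `summarize_column` and `render_prompt` helpers, and the toy table are all hypothetical names chosen for this example, and the actual input fields and statistics used by CEDAR may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical structured prompt fields; CEDAR's actual fields may differ."""
    objective: str
    target_column: str
    metric: str


def summarize_column(name, values):
    """Reduce one column to aggregate statistics; the raw value list is not returned."""
    present = [v for v in values if v is not None]
    summary = {"column": name, "count": len(values), "missing": len(values) - len(present)}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: report mean and range.
        summary["mean"] = round(statistics.fmean(present), 3)
        summary["min"] = min(present)
        summary["max"] = max(present)
    else:
        # Non-numeric (or empty) column: report only the number of distinct values.
        summary["distinct"] = len(set(present))
    return summary


def render_prompt(spec, table):
    """Build prompt text from the task fields and per-column aggregates only."""
    lines = [
        f"Objective: {spec.objective}",
        f"Target column: {spec.target_column}",
        f"Metric: {spec.metric}",
        "Column statistics:",
    ]
    for name, values in table.items():
        lines.append(f"- {summarize_column(name, values)}")
    return "\n".join(lines)


# Toy data standing in for a dataset that stays on the local machine.
table = {
    "age": [22, 38, None, 35],
    "embarked": ["S", "C", "S", "Q"],
}
spec = TaskSpec(
    objective="Predict passenger survival",
    target_column="survived",
    metric="accuracy",
)
prompt = render_prompt(spec, table)
print(prompt)
```

In this sketch the text sent to the model contains the task fields plus counts, means, ranges, and distinct-value counts; the individual rows are only ever read by the local helper.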

Code setup

git clone https://github.com/Fraunhofer-IIS/cedar.git
cd cedar

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

export OPENAI_API_KEY="sk-..."

streamlit run cedar_public.py

Publication

Rishiraj Saha Roy, Chris Hinze, Luzian Hahn and Fabian Küch, CEDAR: Context Engineering for Agentic Data Science, in Proceedings of the 48th European Conference on Information Retrieval (ECIR 2026), Delft, The Netherlands, 29 March - 02 April 2026, pages 200-205.

Artifacts

Contact

Please contact Rishiraj Saha Roy (rishiraj [DOT] saha [DOT] roy [AT] iis [DOT] fraunhofer [DOT] de) for questions, comments and any other feedback.
