A production-grade multi-agent system that performs end-to-end data science workflows autonomously. Built with LangGraph and Llama 3.2 via Ollama.
Give it a dataset + objective, and the system:
- Plans the execution strategy
- Explores and preprocesses data autonomously
- Selects and trains appropriate models
- Evaluates performance and compares models
- Explains results with feature importance
- Critiques its own work and iterates if needed
- Generates a comprehensive markdown report
No hardcoded pipelines. True agent behavior.
┌─────────────┐
│   Planner   │ → Decomposes objective into tasks
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Data Agent  │ → Explores, cleans, preprocesses
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Model Agent │ → Trains baseline → advanced models
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Evaluator  │ → Compares models, selects best
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Explainer  │ → Feature importance, insights
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Critic    │ → Reviews pipeline, decides iterate/finish
└──────┬──────┘
       │
       ▼
   Iterate? ──No──→ Report Generator
       │
      Yes
       │
       └──────────┐
                  │
                  ▼
          Model Agent (again)
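The control flow in the diagram can be sketched in plain Python. This is an illustration only and does not use LangGraph: the stage names, `run_pipeline`, `make_stub`, and the `iteration` and `trace` keys are hypothetical, and the stub agents just record that they ran.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Agent = Callable[[State], State]


def run_pipeline(agents: Dict[str, Agent],
                 should_iterate: Callable[[State], bool],
                 state: State) -> State:
    """Run Planner and Data Agent once, then loop Model Agent -> Critic."""
    for name in ("planner", "data"):
        state = agents[name](state)
    while True:
        for name in ("model", "evaluator", "explainer", "critic"):
            state = agents[name](state)
        if not should_iterate(state):
            break
        # The "Yes" branch of the diagram: go back to the Model Agent.
        state["iteration"] = state.get("iteration", 0) + 1
    # The "No" branch of the diagram: hand over to the Report Generator.
    return agents["report"](state)


def make_stub(name: str) -> Agent:
    """Build a placeholder agent that only records its own name."""
    def agent(state: State) -> State:
        state.setdefault("trace", []).append(name)
        return state
    return agent


STAGES = ("planner", "data", "model", "evaluator", "explainer", "critic", "report")
agents = {name: make_stub(name) for name in STAGES}

# Hypothetical rule for the demo: iterate exactly once, then finish.
final = run_pipeline(agents, lambda s: s.get("iteration", 0) < 1, {})
print(" -> ".join(final["trace"]))
```

With this demo rule the Model Agent, Evaluator, Explainer, and Critic each run twice, and the report stage runs once at the end.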
- Python 3.10+
- Ollama installed locally
- Llama 3.2 model downloaded
# Clone repository
git clone <your-repo>
cd autonomous_data_science_agent
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install and run Ollama with Llama 3.2
ollama pull llama3.2
ollama serve

python main.py \
  --dataset data/raw/your_dataset.csv \
  --objective "Predict air quality and explain pollution drivers"

python main.py \
  --dataset ../Housing.csv \
  --objective "Predict house prices and identify key value drivers"

Output:
- Trained models saved in `data/outputs/`
- Processed data in `processed/`
- Final report in `reports/generated/report_TIMESTAMP.md`
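For orientation, the two command-line flags used above could be parsed as follows. This is a sketch that assumes `main.py` relies on `argparse`; only `--dataset` and `--objective` come from the usage shown here, and `build_parser` and the help texts are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Autonomous data science agent")
    parser.add_argument("--dataset", type=Path, required=True,
                        help="Path to the input CSV file")
    parser.add_argument("--objective", required=True,
                        help="Plain-language description of the goal")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--dataset", "../Housing.csv",
    "--objective", "Predict house prices and identify key value drivers",
])
print(args.dataset.name, "|", args.objective)
```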
autonomous_data_science_agent/
│
├── main.py                      # Entry point
├── config.yaml                  # Configuration
├── requirements.txt
│
├── agents/                      # Multi-agent system
│   ├── planner_agent.py         # Task decomposition
│   ├── data_agent.py            # Data exploration & preprocessing
│   ├── modeling_agent.py        # Model training & selection
│   ├── evaluation_agent.py      # Model comparison & evaluation
│   ├── explanation_agent.py     # Interpretability & insights
│   └── critic_agent.py          # Self-critique
│
├── graph/
│   ├── agent_graph.py           # LangGraph orchestration
│   └── states.py                # Shared state definition
│
├── tools/
│   └── data_tools.py            # Data utilities
│
├── reports/
│   ├── report_generator.py      # Report creation
│   └── generated/               # Output reports
│
└── data/
    ├── processed/               # Cleaned data
    └── outputs/                 # Models, artifacts
Edit `config.yaml` to customize:

llm:
  provider: "ollama"
  model: "llama3.2"
  temperature: 0.3

max_iterations: 3
performance_threshold: 0.75
improvement_threshold: 0.05

modeling:
  baseline_models:
    - "linear_regression"
    - "random_forest"
    - "gradient_boosting"
  advanced_models:
    - "xgboost"
    - "lightgbm"
    - "neural_network"
  cv_folds: 5
  hyperparameter_tuning: true

data:
  max_missing_ratio: 0.3
  outlier_std_threshold: 3.0

No fixed pipeline. The planner creates a task graph based on the objective using LLM reasoning.
- Data Agent uses LLM to decide preprocessing strategy (imputation, encoding, scaling)
- Modeling Agent selects algorithms dynamically based on task type and iteration
- Critic Agent determines when to iterate or finish based on performance thresholds
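One way to picture the Data Agent's role is as a dispatcher: the LLM returns a list of step names, and each name maps to a small function. The sketch below assumes that design and is not the project's actual code. The step names, `STEPS`, `apply_plan`, the plain-dict row format, and the toy rows are all hypothetical; the 0.3 default mirrors `max_missing_ratio` in `config.yaml`.

```python
from statistics import median
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_high_missing_cols(rows: List[Row], max_missing_ratio: float = 0.3) -> List[Row]:
    """Drop columns whose share of missing (None) values exceeds the ratio."""
    keep = [col for col in rows[0]
            if sum(r[col] is None for r in rows) / len(rows) <= max_missing_ratio]
    return [{col: r[col] for col in keep} for r in rows]


def impute_numeric_median(rows: List[Row]) -> List[Row]:
    """Fill missing values with the median of the observed values per column."""
    medians: Dict[str, float] = {}
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        if observed:  # a column with no observed values is left untouched
            medians[col] = median(observed)
    return [{col: medians.get(col) if val is None else val for col, val in r.items()}
            for r in rows]


# Hypothetical registry: step name chosen by the LLM -> implementation.
STEPS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "drop_high_missing_cols": drop_high_missing_cols,
    "impute_numeric_median": impute_numeric_median,
}


def apply_plan(rows: List[Row], plan: List[str]) -> List[Row]:
    """Apply the named steps in order, rejecting names that are not registered."""
    for step in plan:
        if step not in STEPS:
            raise ValueError(f"Unknown preprocessing step: {step}")
        rows = STEPS[step](rows)
    return rows


# Made-up rows for illustration; the plan stands in for an LLM response.
rows = [
    {"area": 7420.0, "bedrooms": 4.0, "garage": None},
    {"area": None, "bedrooms": 3.0, "garage": None},
    {"area": 9960.0, "bedrooms": 3.0, "garage": 1.0},
    {"area": 7500.0, "bedrooms": 4.0, "garage": None},
]
cleaned = apply_plan(rows, ["drop_high_missing_cols", "impute_numeric_median"])
print(cleaned[1])
```

Rejecting unknown step names keeps a malformed LLM response from silently skipping a step.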
The Critic Agent reviews results and triggers improvements:
if performance < threshold:
    → Iterate with advanced models
elif critic_has_suggestions and iteration < 2:
    → Try suggested improvements
else:
    → Finish and generate report

Each agent has a specific role and communicates via shared state (LangGraph TypedDict):
- State flows through the graph
- Agents can access previous agent outputs
- Conditional branching based on critique
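The shared state and the Critic's branching rule can be sketched together. Only the names `AgentState` and `processed_data_path` appear in this README; the other fields, the return labels, and `critic_decision` are hypothetical. The `max_iterations` guard is also an assumption, added so the loop always terminates.

```python
from typing import Dict, List, TypedDict


class AgentState(TypedDict, total=False):
    """Shared state passed between agents (field set is illustrative)."""
    objective: str
    processed_data_path: str
    model_scores: Dict[str, float]
    best_score: float
    critic_suggestions: List[str]
    iteration: int


def critic_decision(state: AgentState,
                    performance_threshold: float = 0.75,
                    max_iterations: int = 3) -> str:
    """Return the next branch: iterate with advanced models, try suggestions, or finish."""
    iteration = state.get("iteration", 0)
    # Assumed safety cap, mirroring max_iterations in config.yaml.
    if iteration >= max_iterations:
        return "finish"
    if state.get("best_score", 0.0) < performance_threshold:
        return "iterate_advanced"
    if state.get("critic_suggestions") and iteration < 2:
        return "iterate_suggestions"
    return "finish"


state: AgentState = {"best_score": 0.666, "iteration": 0}
print(critic_decision(state))
```

In a LangGraph setup, a function of this shape would typically serve as the router for a conditional edge leaving the Critic node.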
Evaluator Agent automatically:
- Compares all trained models
- Ranks by appropriate metric (R² for regression, accuracy for classification)
- Selects best performer for final report
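The ranking step above amounts to a sort on one metric per task type. In this minimal sketch, `PRIMARY_METRIC`, `rank_models`, and the result-dictionary layout are hypothetical; the R² values are the ones shown in the sample console output in this README.

```python
from typing import Dict, List, Tuple

# Metric used for ranking, per task type (higher is better for both).
PRIMARY_METRIC = {"regression": "r2", "classification": "accuracy"}


def rank_models(results: Dict[str, Dict[str, float]],
                task_type: str) -> List[Tuple[str, float]]:
    """Return (model name, score) pairs sorted from best to worst."""
    metric = PRIMARY_METRIC[task_type]
    scored = [(name, metrics[metric]) for name, metrics in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = {
    "linear_regression": {"r2": 0.6529},
    "random_forest": {"r2": 0.6119},
    "gradient_boosting": {"r2": 0.6660},
}
ranking = rank_models(results, "regression")
best_name, best_score = ranking[0]
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position}. {name}: R²={score:.4f}")
```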
Console Output:
INFO:agents.modeling_agent:🤖 Modeling Agent: Training models
INFO:agents.modeling_agent:Training linear_regression...
INFO:agents.modeling_agent: ✓ linear_regression - ('rmse', 1324506.96)
INFO:agents.modeling_agent:Training random_forest...
INFO:agents.modeling_agent: ✓ random_forest - ('rmse', 1400565.97)
INFO:agents.modeling_agent:Training gradient_boosting...
INFO:agents.modeling_agent: ✓ gradient_boosting - ('rmse', 1299385.98)
INFO:agents.evaluation_agent:Model Rankings:
INFO:agents.evaluation_agent: 1. gradient_boosting: R²=0.6660
INFO:agents.evaluation_agent: 2. linear_regression: R²=0.6529
INFO:agents.evaluation_agent: 3. random_forest: R²=0.6119
INFO:agents.critic_agent:Decision: ITERATE - r2 (0.666) below threshold (0.75)
Generated Report (reports/generated/report_TIMESTAMP.md):
# Autonomous Data Science Report
## 🎯 Objective
Predict house prices and explain value drivers
## Dataset Summary
- Source: `../Housing.csv`
- Rows: 545
- Columns: 13
- Target Variable: price
- Task Type: Regression
## 🔧 Preprocessing Pipeline
1. Drop High Missing Cols
2. Impute Numeric Median
3. Encode Categorical Onehot
## Best Model
**Selected Model:** Gradient Boosting
### Performance Metrics
- RMSE: 1299385.98
- MAE: 959748.96
- R²: 0.6660
## Feature Importance
Top 10 Most Important Features:
1. **area**: 0.4521
2. **bedrooms**: 0.1823
3. **bathrooms**: 0.1456
4. **stories**: 0.0892
5. **mainroad_yes**: 0.0543
## 💡 Key Insights
1. The gradient_boosting model achieved 0.666 R² score
2. Area is the strongest predictor of house prices
3. Model performance suggests room for improvement
4. Additional feature engineering may improve results
5. Results should be validated on new data
## 🎬 Conclusion
The autonomous agent completed 3 iteration(s) and selected **gradient_boosting** as the best performing model.

Add new preprocessing options in `data_agent.py`:
elif step == "remove_outliers":
    # Your custom outlier removal logic
    pass

Add models to `modeling_agent.py`:
elif s_lower == "xgboost":
    from xgboost import XGBRegressor
    model = XGBRegressor(n_estimators=200, random_state=42)

Modify thresholds in `config.yaml`:
max_iterations: 5            # Allow more iterations
performance_threshold: 0.80  # Higher bar for satisfaction

Perfect for:
- Master's thesis in AI/ML Engineering
- PFE (Projet de Fin d'Études, a final-year project) requiring production systems
- Research on autonomous agent systems
- Portfolio projects for Data Science/ML Engineer roles
Key Differentiators:
- Multi-agent architecture (not single LLM chain)
- Self-critique loop with iterative improvement
- Production-ready code structure with proper state management
- Comprehensive logging and reporting
- Uses local LLM (Ollama) - no API costs
Technical Highlights:
- LangGraph for agent orchestration
- TypedDict for type-safe state management
- scikit-learn for ML pipeline
- Autonomous decision-making via LLM reasoning
- Orchestration: LangGraph
- LLM: Ollama + Llama 3.2
- ML: scikit-learn, pandas, numpy
- Data Processing: pandas, numpy
- Logging: Python logging module
Issue: KeyError: 'processed_data_path'
- Solution: Ensure `states.py` includes all required fields in the `AgentState` TypedDict
Issue: Unicode encoding error in report
- Solution: Fixed - reports now use UTF-8 encoding
Issue: Ollama connection refused
- Solution: Run `ollama serve` in a separate terminal
Issue: LangChain deprecation warnings
- These are warnings only and don't affect functionality
- Upgrade to `langchain-ollama` if preferred
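For the connection-refused case, a pre-flight check before starting the pipeline can give a clearer error. The sketch below uses only the standard library and assumes Ollama listens on its default address, `http://localhost:11434`; the helper name `ollama_is_reachable` is hypothetical and not part of this project.

```python
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError,
        # which are all OSError subclasses.
        return False


if not ollama_is_reachable():
    print("Ollama is not reachable; run `ollama serve` in a separate terminal.")
```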
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
langchain>=0.1.0
langchain-community>=0.0.20
langgraph>=0.0.26
pyyaml>=6.0
joblib>=1.3.0

Contributions welcome! Areas for improvement:
- Additional agents (AutoML, Feature Engineering Agent)
- Support for more model types (deep learning, time series)
- Enhanced explainability (SHAP, LIME)
- Web interface for interaction
- MLflow integration for experiment tracking
MIT License
Built with autonomy in mind. No human intervention required.