Get up and running with TuneLab in 5 minutes!
pip install -r requirements.txt
python example_usage.py --generate-data
This creates sample datasets in sample_data/:
- iris.csv - Classic iris dataset (multiclass classification)
- classification_example.csv - Binary classification
- regression_example.csv - Regression
- housing.csv - Housing price prediction
# Option A: Run on Iris dataset (recommended for first try)
python ml_agent.py sample_data/iris.csv
# Option B: Run example script
python example_usage.py --example iris
Open the generated reports:
# View in your browser or text editor
outputs/reports/overview.md
outputs/reports/results.md
Check the visualizations:
outputs/plots/feature_importance.png
outputs/plots/metric_comparison.png
import joblib
import pandas as pd
# Load the trained model
model_pkg = joblib.load('outputs/models/final_model.joblib')
# Extract components
model = model_pkg['model']
feature_names = model_pkg['feature_names']
# Load new data and predict
new_data = pd.read_csv('new_data.csv')
predictions = model.predict(new_data)
print(predictions)
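Before calling `model.predict`, it can help to confirm that the new CSV contains the columns the model was trained on. The helper below is a minimal sketch: `check_columns` is a hypothetical name (not part of TuneLab), the column names are illustrative, and it assumes `feature_names` lists the input columns in the form the model expects.

```python
def check_columns(available, required):
    """Return (missing, extra) column names relative to the required list."""
    available = list(available)
    required_set = set(required)
    # Columns the model needs but the new data lacks.
    missing = [name for name in required if name not in available]
    # Columns present in the new data that the model never saw.
    extra = [name for name in available if name not in required_set]
    return missing, extra


# Example with plain lists; with pandas you would pass new_data.columns
# as `available` and feature_names as `required`.
missing, extra = check_columns(
    available=["sepal_length", "sepal_width", "row_id"],
    required=["sepal_length", "sepal_width", "petal_length"],
)
print("missing:", missing)
print("extra:", extra)
```

If `missing` is non-empty, prediction is likely to fail or give unreliable results, so it is usually better to stop and fix the input file first.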
- Dataset Fingerprinting (1 sec)
  - Generates unique ID for your dataset
  - Checks if strategy exists from previous runs
- Data Analysis (2 sec)
  - Auto-detects target and problem type
  - Identifies feature types
  - Checks for missing values
- Feature Engineering (3 sec)
  - Imputes missing values
  - Encodes categorical features
  - Splits train/test sets
- Model Training (10-30 sec)
  - Trains 4-5 baseline models
  - Compares performance
  - Selects best model
- Hyperparameter Tuning (30-60 sec)
  - Optimizes best model with Optuna
  - Uses Bayesian optimization
  - 30 trials with cross-validation
- Artifact Generation (5 sec)
  - Saves trained model
  - Generates plots
  - Creates Markdown reports
  - Stores strategy for reuse
Total Time: ~1-2 minutes on CPU
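To make the fingerprinting and strategy-reuse steps more concrete, here is one way they could work. This is a sketch under stated assumptions, not TuneLab's actual scheme: the hashed fields (column names and row count), the 16-character length, and the function names `dataset_fingerprint` and `load_strategy` are all hypothetical.

```python
import csv
import hashlib
import json
from pathlib import Path


def dataset_fingerprint(csv_path, length=16):
    """Hash the header and row count of a CSV file into a short hex ID."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # first row holds the column names
        n_rows = sum(1 for _ in reader)    # remaining rows are data
    payload = json.dumps({"columns": header, "rows": n_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def load_strategy(fingerprint, strategy_dir="outputs/strategy"):
    """Return a previously stored strategy dict, or None if there is none."""
    path = Path(strategy_dir) / f"{fingerprint}.json"
    if path.is_file():
        return json.loads(path.read_text())
    return None
```

Hashing only the schema and shape keeps the ID cheap to compute, at the cost of treating two datasets with identical columns and row counts as the same; hashing the full file contents would be stricter but slower.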
ML Agent Initialized
Output Directory: outputs
Loading dataset: sample_data/iris.csv
Shape: (150, 5)
Dataset Fingerprint: a3f5d8c9b2e1f4a7
Analyzing dataset...
Auto-detected target: species
Auto-detected problem type: classification
Engineering features...
Train: 120 samples
Test: 30 samples
Training baseline models...
Best model: Random Forest (accuracy=0.9667)
Optimizing hyperparameters...
Optimized model score: 0.9733
Model saved: outputs/models/final_model.joblib
Generating plots...
Generating reports...
PIPELINE COMPLETE
Final model score: 0.9733
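The log above reports an auto-detected problem type. A simple heuristic for that kind of detection is sketched below. It is an illustration only: the function name `detect_problem_type` and the `max_classes=20` cutoff are assumptions, and TuneLab's own rules may differ.

```python
def detect_problem_type(target_values, max_classes=20):
    """Guess 'classification' or 'regression' from a column of target values."""
    values = [v for v in target_values if v not in ("", None)]
    try:
        numeric = [float(v) for v in values]
    except (TypeError, ValueError):
        # Non-numeric labels such as species names imply classification.
        return "classification"
    n_unique = len(set(numeric))
    all_integers = all(x.is_integer() for x in numeric)
    # A small set of integer codes is treated as class labels.
    if all_integers and n_unique <= max_classes:
        return "classification"
    return "regression"


print(detect_problem_type(["setosa", "versicolor", "virginica"]))
print(detect_problem_type([0, 1, 1, 0]))
print(detect_problem_type([199.5, 310.25, 250.0]))
```

Heuristics like this can misclassify integer-valued regression targets with few distinct values, which is one reason to pass the problem type explicitly when you know it.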
outputs/
├── models/
│   └── final_model.joblib
├── plots/
│   ├── feature_importance.png
│   └── metric_comparison.png
├── reports/
│   ├── overview.md
│   ├── data_analysis.md
│   ├── modeling.md
│   └── results.md
└── strategy/
    └── a3f5d8c9b2e1f4a7.json
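After a run, a quick script can confirm that the files in the tree above were actually written. The sketch below uses only the standard library; `missing_artifacts` is a hypothetical helper, and the strategy file is left out of the list because its name depends on the dataset fingerprint.

```python
from pathlib import Path

# Relative paths taken from the output tree shown above.
EXPECTED_ARTIFACTS = [
    "models/final_model.joblib",
    "plots/feature_importance.png",
    "plots/metric_comparison.png",
    "reports/overview.md",
    "reports/data_analysis.md",
    "reports/modeling.md",
    "reports/results.md",
]


def missing_artifacts(output_dir="outputs"):
    """Return the expected artifact paths that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing artifacts:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected artifacts are present.")
```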
# If your target is not the last column
python ml_agent.py data.csv --target price --problem regression
Or in Python:
from ml_agent import MLAgent
agent = MLAgent(
    data_path="data.csv",
    target_col="price",
    problem_type="regression"
)
agent.run()

agent = MLAgent(
    data_path="data.csv",
    output_dir="my_project/results"
)
agent.run()

agent = MLAgent(
    data_path="data.csv",
    max_iterations=5,               # More tuning iterations
    target_metric_threshold=0.98,   # Stop if accuracy > 0.98
    improvement_threshold=0.005     # Stop if improvement < 0.5%
)
agent.run()
Solution:
pip install optuna
Or ignore it - the agent will work without hyperparameter optimization.
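Running without Optuna, as described above, is typically handled with an optional import. The sketch below shows the general pattern and is not TuneLab's actual code: `tune_or_default` is a hypothetical helper, and the default of 30 trials simply mirrors the trial count mentioned in this guide.

```python
try:
    import optuna
except ImportError:
    optuna = None  # Tuning is skipped when Optuna is not installed.


def tune_or_default(objective, default_params, n_trials=30):
    """Return tuned parameters if Optuna is available, else the defaults."""
    if optuna is None:
        return dict(default_params)
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params


def example_objective(trial):
    # Toy objective for illustration: the score peaks at n_estimators == 100.
    n_estimators = trial.suggest_int("n_estimators", 10, 200)
    return -abs(n_estimators - 100)


params = tune_or_default(example_objective, {"n_estimators": 100}, n_trials=5)
print(params)
```

In a real pipeline the objective would train the model with the suggested parameters and return a cross-validated score.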
Solution: Use absolute path or ensure file exists
python ml_agent.py /full/path/to/data.csv
Solution: Reduce model complexity
Edit ml_agent.py and change:
'Random Forest': RandomForestClassifier(n_estimators=50)  # Was 100
python ml_agent.py path/to/your/data.csv
Open outputs/reports/results.md for:
- Model performance metrics
- Usage instructions
- Improvement suggestions
Edit ml_agent.py to add models like XGBoost or LightGBM
Use the saved model:
model_pkg = joblib.load('outputs/models/final_model.joblib')
# Deploy with Flask, FastAPI, etc.
- Full Documentation: See README.md
- Code Comments: Read ml_agent.py
- Examples: Run python example_usage.py --example all
That's it! You're ready to build production ML models autonomously! 🎉