Quick Start Guide

Get up and running with TuneLab in 5 minutes!

Step 1: Install Dependencies

pip install -r requirements.txt

Step 2: Generate Sample Data

python example_usage.py --generate-data

This creates sample datasets in sample_data/:

- iris.csv - Classic iris dataset (multiclass classification)
- classification_example.csv - Binary classification
- regression_example.csv - Regression
- housing.csv - Housing price prediction

Step 3: Run Your First Model

# Option A: Run on Iris dataset (recommended for first try)
python ml_agent.py sample_data/iris.csv

# Option B: Run example script
python example_usage.py --example iris

Step 4: Check the Results

Open the generated reports:

# View in your browser or text editor
outputs/reports/overview.md
outputs/reports/results.md

Check the visualizations:

outputs/plots/feature_importance.png
outputs/plots/metric_comparison.png

Step 5: Use the Model

import joblib
import pandas as pd

# Load the trained model
model_pkg = joblib.load('outputs/models/final_model.joblib')

# Extract components
model = model_pkg['model']
feature_names = model_pkg['feature_names']

# Load new data and predict, passing the columns the model was trained on
new_data = pd.read_csv('new_data.csv')
predictions = model.predict(new_data[feature_names])

print(predictions)
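
Because the saved package carries feature_names, it can help to confirm that new data has those columns before predicting. The following is a minimal sketch, assuming feature_names lists the training columns in the order the model expects; check_columns is a hypothetical helper, not part of TuneLab.

```python
def check_columns(available, required):
    """Return `required` as a list if every name is in `available`, else raise."""
    missing = [name for name in required if name not in available]
    if missing:
        raise ValueError(f"new data is missing columns: {missing}")
    return list(required)


# Usage with the objects loaded in Step 5 (names taken from that snippet):
#   cols = check_columns(new_data.columns, feature_names)
#   predictions = model.predict(new_data[cols])
```

Extra columns in the new data are ignored; only missing ones raise an error. Whether the model also expects the same imputation and encoding that were applied during training is not covered here.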

What Happens During a Run?

1. Dataset Fingerprinting (1 sec)
   - Generates unique ID for your dataset
   - Checks if strategy exists from previous runs
2. Data Analysis (2 sec)
   - Auto-detects target and problem type
   - Identifies feature types
   - Checks for missing values
3. Feature Engineering (3 sec)
   - Imputes missing values
   - Encodes categorical features
   - Splits train/test sets
4. Model Training (10-30 sec)
   - Trains 4-5 baseline models
   - Compares performance
   - Selects best model
5. Hyperparameter Tuning (30-60 sec)
   - Optimizes best model with Optuna
   - Uses Bayesian optimization
   - 30 trials with cross-validation
6. Artifact Generation (5 sec)
   - Saves trained model
   - Generates plots
   - Creates Markdown reports
   - Stores strategy for reuse

Total Time: ~1-2 minutes on CPU
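
The fingerprinting and strategy-reuse stages can be pictured roughly as follows. This is only a sketch: this guide does not say how TuneLab computes the fingerprint, so hashing the raw file with SHA-256 and keeping 16 hex characters is an assumption chosen to match the 16-character ID shown in the sample output. The function names are hypothetical, and the strategy path follows the outputs/strategy/ layout shown under Directory Structure.

```python
import hashlib
import json
from pathlib import Path


def dataset_fingerprint(csv_path, length=16):
    """Hash the raw file bytes and keep the first `length` hex characters."""
    digest = hashlib.sha256(Path(csv_path).read_bytes()).hexdigest()
    return digest[:length]


def load_strategy(fingerprint, strategy_dir="outputs/strategy"):
    """Return the stored strategy dict for this fingerprint, or None if absent."""
    path = Path(strategy_dir) / f"{fingerprint}.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text())
```

With a content-based ID like this, rerunning on an unchanged file finds the earlier strategy, while any edit to the data produces a new ID and a fresh run.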

Expected Output

Console Output

 ML Agent Initialized
 Output Directory: outputs

 Loading dataset: sample_data/iris.csv
   Shape: (150, 5)
 Dataset Fingerprint: a3f5d8c9b2e1f4a7

 Analyzing dataset...
   Auto-detected target: species
   Auto-detected problem type: classification

 Engineering features...
    Train: 120 samples
    Test: 30 samples

 Training baseline models...
    Best model: Random Forest (accuracy=0.9667)

  Optimizing hyperparameters...
    Optimized model score: 0.9733

 Model saved: outputs/models/final_model.joblib

 Generating plots...
 Generating reports...

 PIPELINE COMPLETE
 Final model score: 0.9733

Directory Structure

outputs/
├── models/
│   └── final_model.joblib
├── plots/
│   ├── feature_importance.png
│   └── metric_comparison.png
├── reports/
│   ├── overview.md
│   ├── data_analysis.md
│   ├── modeling.md
│   └── results.md
└── strategy/
    └── a3f5d8c9b2e1f4a7.json
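
To confirm that a run produced everything in the tree above, a small check like the one below may be useful. It is a sketch, not part of TuneLab: missing_artifacts is a hypothetical helper, and the file list is copied from the tree.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "models/final_model.joblib",
    "plots/feature_importance.png",
    "plots/metric_comparison.png",
    "reports/overview.md",
    "reports/data_analysis.md",
    "reports/modeling.md",
    "reports/results.md",
]


def missing_artifacts(output_dir="outputs"):
    """Return the expected files that are absent under `output_dir`."""
    root = Path(output_dir)
    missing = [rel for rel in EXPECTED if not (root / rel).is_file()]
    # The strategy file is named after the dataset fingerprint, so accept any JSON.
    strategy_dir = root / "strategy"
    if not (strategy_dir.is_dir() and any(strategy_dir.glob("*.json"))):
        missing.append("strategy/*.json")
    return missing


if __name__ == "__main__":
    print(missing_artifacts() or "all expected artifacts found")
```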

Customization Examples

Example 1: Specify Target Column

# If your target is not the last column
python ml_agent.py data.csv --target price --problem regression

Or in Python:

from ml_agent import MLAgent

agent = MLAgent(
    data_path="data.csv",
    target_col="price",
    problem_type="regression"
)
agent.run()

Example 2: Change Output Directory

agent = MLAgent(
    data_path="data.csv",
    output_dir="my_project/results"
)
agent.run()

Example 3: Adjust Performance Thresholds

agent = MLAgent(
    data_path="data.csv",
    max_iterations=5,              # More tuning iterations
    target_metric_threshold=0.98,  # Stop if accuracy > 0.98
    improvement_threshold=0.005    # Stop if improvement < 0.5%
)
agent.run()
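
The three settings in Example 3 read as stopping rules. The sketch below shows one plausible way they could combine. It is an assumption based on the comments in that example, not TuneLab's actual logic: should_stop is a hypothetical function, and improvement is taken to mean the absolute gain over the previous iteration.

```python
def should_stop(scores, max_iterations=5,
                target_metric_threshold=0.98, improvement_threshold=0.005):
    """Decide whether to stop tuning, given the score after each iteration so far."""
    if not scores:
        return False
    if len(scores) >= max_iterations:
        return True   # iteration budget used up
    if scores[-1] > target_metric_threshold:
        return True   # target metric reached
    if len(scores) >= 2 and scores[-1] - scores[-2] < improvement_threshold:
        return True   # latest gain is below the improvement threshold
    return False
```

For example, should_stop([0.90, 0.902]) is True because the gain of 0.002 is below 0.005, while should_stop([0.90, 0.95]) is False.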

Common Issues

"ModuleNotFoundError: No module named 'optuna'"

Solution:

pip install optuna

Or ignore it - the agent will work without hyperparameter optimization.
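
To see in advance whether Optuna is available in the current environment, a quick check with the standard library is enough. This is a generic sketch: has_module is a hypothetical helper, and how the agent itself detects the package is not described in this guide.

```python
import importlib.util


def has_module(name):
    """True if the top-level module `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


if has_module("optuna"):
    print("optuna found: hyperparameter optimization is available")
else:
    print("optuna not installed: the run will go ahead without hyperparameter optimization")
```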

"FileNotFoundError: data.csv not found"

Solution: Use an absolute path, or make sure the file exists at the path you passed.

python ml_agent.py /full/path/to/data.csv
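
When calling the agent from Python, the path can be resolved and checked up front so a wrong path fails with a clear message. A minimal sketch; resolve_dataset is a hypothetical helper, not part of TuneLab.

```python
from pathlib import Path


def resolve_dataset(path):
    """Expand `~`, make the path absolute, and fail early if no file is there."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"dataset not found: {resolved}")
    return resolved


# Usage (MLAgent and data_path as in the Customization Examples):
#   agent = MLAgent(data_path=str(resolve_dataset("data.csv")))
```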

Out of Memory Error

Solution: Reduce model complexity. Edit ml_agent.py and change:

'Random Forest': RandomForestClassifier(n_estimators=50)  # Was 100

🎓 Next Steps

1. Try Your Own Data

python ml_agent.py path/to/your/data.csv

2. Read the Reports

Open outputs/reports/results.md for:

- Model performance metrics
- Usage instructions
- Improvement suggestions

3. Experiment with Models

Edit ml_agent.py to add models such as XGBoost or LightGBM.

4. Deploy to Production

Use the saved model:

import joblib

model_pkg = joblib.load('outputs/models/final_model.joblib')
# Deploy with Flask, FastAPI, etc.
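
Whichever web framework is used, the route handler usually boils down to turning request records into rows and calling predict. The sketch below is framework-agnostic and makes two assumptions: the model accepts rows ordered by feature_names, and incoming values are already in the form the model was trained on. predict_records is a hypothetical helper, not part of TuneLab.

```python
def predict_records(model, feature_names, records):
    """Turn a list of dicts into ordered rows and return predictions as a plain list."""
    rows = []
    for i, record in enumerate(records):
        missing = [name for name in feature_names if name not in record]
        if missing:
            raise ValueError(f"record {i} is missing features: {missing}")
        rows.append([record[name] for name in feature_names])
    preds = model.predict(rows)
    # NumPy arrays expose tolist(), which yields JSON-friendly Python values.
    return preds.tolist() if hasattr(preds, "tolist") else list(preds)


# Usage with the loaded package (keys as shown in Step 5):
#   predict_records(model_pkg['model'], model_pkg['feature_names'], request_json)
```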

Learn More

- Full Documentation: See README.md
- Code Comments: Read ml_agent.py
- Examples: Run python example_usage.py --example all

That's it! You're ready to build production ML models autonomously! 🎉
