Quick Start Guide

Get up and running with TuneLab in 5 minutes!

Step 1: Install Dependencies

pip install -r requirements.txt

Step 2: Generate Sample Data

python example_usage.py --generate-data

This creates sample datasets in sample_data/:

  • iris.csv - Classic iris dataset (multiclass classification)
  • classification_example.csv - Binary classification
  • regression_example.csv - Regression
  • housing.csv - Housing price prediction

Step 3: Run Your First Model

# Option A: Run on Iris dataset (recommended for first try)
python ml_agent.py sample_data/iris.csv

# Option B: Run example script
python example_usage.py --example iris

Step 4: Check the Results

Open the generated reports:

# View in your browser or text editor
outputs/reports/overview.md
outputs/reports/results.md

Check the visualizations:

outputs/plots/feature_importance.png
outputs/plots/metric_comparison.png

Step 5: Use the Model

import joblib
import pandas as pd

# Load the trained model
model_pkg = joblib.load('outputs/models/final_model.joblib')

# Extract components
model = model_pkg['model']
feature_names = model_pkg['feature_names']

# Load new data and predict, keeping only the columns the model was trained on
new_data = pd.read_csv('new_data.csv')
predictions = model.predict(new_data[feature_names])

print(predictions)
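
If new_data.csv is missing a column the model was trained on, the prediction step will likely fail with a less helpful error. A minimal pre-check can be sketched as follows, assuming feature_names is a plain list of column names (the key name suggests this, but it is not confirmed here). The helper missing_features and the column names are hypothetical and used only for illustration.

```python
def missing_features(feature_names, columns):
    """Return the training features absent from the new data, in training order."""
    available = set(columns)
    return [name for name in feature_names if name not in available]


# Made-up column names for illustration; use model_pkg['feature_names']
# and new_data.columns in a real session.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
new_columns = ["sepal_length", "petal_length", "petal_width", "id"]

missing = missing_features(feature_names, new_columns)
if missing:
    print(f"new data is missing columns: {missing}")
```

Extra columns such as an ID are harmless here because only the training features are checked.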

What Happens During a Run?

  1. Dataset Fingerprinting (1 sec)

    • Generates unique ID for your dataset
    • Checks if strategy exists from previous runs
  2. Data Analysis (2 sec)

    • Auto-detects target and problem type
    • Identifies feature types
    • Checks for missing values
  3. Feature Engineering (3 sec)

    • Imputes missing values
    • Encodes categorical features
    • Splits train/test sets
  4. Model Training (10-30 sec)

    • Trains 4-5 baseline models
    • Compares performance
    • Selects best model
  5. Hyperparameter Tuning (30-60 sec)

    • Optimizes best model with Optuna
    • Uses Bayesian optimization
    • 30 trials with cross-validation
  6. Artifact Generation (5 sec)

    • Saves trained model
    • Generates plots
    • Creates Markdown reports
    • Stores strategy for reuse
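
The fingerprinting step (step 1) can be pictured with a short sketch. The hashing scheme below is an assumption, not the scheme ml_agent.py necessarily uses: it hashes every CSV row with SHA-256 and keeps the first 16 hex characters, which matches the length of the fingerprint in the sample output. The helper names fingerprint_csv and strategy_path are hypothetical; the outputs/strategy/<fingerprint>.json layout comes from the directory structure shown in this guide.

```python
import csv
import hashlib
import io
from pathlib import Path


def fingerprint_csv(text, length=16):
    """Hash every row of CSV text into a short hex ID (hypothetical scheme)."""
    hasher = hashlib.sha256()
    for row in csv.reader(io.StringIO(text)):
        # Join cells with a separator that is unlikely to appear in the data.
        hasher.update("\x1f".join(row).encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()[:length]


def strategy_path(fingerprint, output_dir="outputs"):
    """Location where a stored strategy for this dataset would live."""
    return Path(output_dir) / "strategy" / f"{fingerprint}.json"


sample = "sepal_length,species\n5.1,setosa\n7.0,versicolor\n"
fp = fingerprint_csv(sample)
print(fp, "->", strategy_path(fp))
print("strategy from a previous run found:", strategy_path(fp).is_file())
```

Because the ID depends only on the file contents, running the agent twice on the same data would map to the same strategy file.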

Total Time: ~1-2 minutes on CPU
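
Step 2 auto-detects the problem type. One common heuristic is sketched below. It is an assumption for illustration, not the rule ml_agent.py is confirmed to apply: non-numeric targets, or integer-valued targets with few distinct values, are treated as classification and everything else as regression. The function name and the max_classes cutoff are hypothetical.

```python
def guess_problem_type(values, max_classes=20):
    """Guess 'classification' or 'regression' from target values (heuristic sketch)."""
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        # Labels such as "setosa" cannot be parsed as numbers.
        return "classification"
    distinct = set(numbers)
    # A small set of whole numbers usually encodes class labels.
    if all(n.is_integer() for n in distinct) and len(distinct) <= max_classes:
        return "classification"
    return "regression"


for target in (["setosa", "virginica"], [0, 1, 1, 0], [210.5, 180.25, 399.9]):
    print(target, "->", guess_problem_type(target))
```

A heuristic like this can misfire, for example on integer-valued regression targets, which is why the --problem flag shown in the customization examples is useful as an override.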


Expected Output

Console Output

 ML Agent Initialized
 Output Directory: outputs

 Loading dataset: sample_data/iris.csv
   Shape: (150, 5)
 Dataset Fingerprint: a3f5d8c9b2e1f4a7

 Analyzing dataset...
   Auto-detected target: species
   Auto-detected problem type: classification

 Engineering features...
    Train: 120 samples
    Test: 30 samples

 Training baseline models...
    Best model: Random Forest (accuracy=0.9667)

  Optimizing hyperparameters...
    Optimized model score: 0.9733

 Model saved: outputs/models/final_model.joblib

 Generating plots...
 Generating reports...

 PIPELINE COMPLETE
 Final model score: 0.9733

Directory Structure

outputs/
├── models/
│   └── final_model.joblib
├── plots/
│   ├── feature_importance.png
│   └── metric_comparison.png
├── reports/
│   ├── overview.md
│   ├── data_analysis.md
│   ├── modeling.md
│   └── results.md
└── strategy/
    └── a3f5d8c9b2e1f4a7.json
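
To confirm that a run produced everything in the tree above, a small check like the following can help. It is a sketch that only uses the file names listed in this guide; the helper missing_artifacts is hypothetical, and the strategy file is left out because its name depends on the dataset fingerprint.

```python
from pathlib import Path

# Relative paths taken from the directory structure above.
EXPECTED_ARTIFACTS = [
    "models/final_model.joblib",
    "plots/feature_importance.png",
    "plots/metric_comparison.png",
    "reports/overview.md",
    "reports/data_analysis.md",
    "reports/modeling.md",
    "reports/results.md",
]


def missing_artifacts(output_dir="outputs"):
    """Return the expected artifact paths that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


missing = missing_artifacts("outputs")
print("all artifacts present" if not missing else f"missing: {missing}")
```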

Customization Examples

Example 1: Specify Target Column

# If your target is not the last column
python ml_agent.py data.csv --target price --problem regression

Or in Python:

from ml_agent import MLAgent

agent = MLAgent(
    data_path="data.csv",
    target_col="price",
    problem_type="regression"
)
agent.run()

Example 2: Change Output Directory

agent = MLAgent(
    data_path="data.csv",
    output_dir="my_project/results"
)
agent.run()

Example 3: Adjust Performance Thresholds

agent = MLAgent(
    data_path="data.csv",
    max_iterations=5,              # More tuning iterations
    target_metric_threshold=0.98,  # Stop if accuracy > 0.98
    improvement_threshold=0.005    # Stop if improvement < 0.5%
)
agent.run()
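
How the three thresholds in Example 3 might interact can be sketched as a single stopping rule. This is an assumption based on the inline comments, not the agent's confirmed logic: the function should_stop is hypothetical, and it assumes the metric is "higher is better" with one best score recorded per tuning iteration.

```python
def should_stop(scores, max_iterations=5,
                target_metric_threshold=0.98, improvement_threshold=0.005):
    """Decide whether to stop tuning, given the best score after each iteration."""
    if not scores:
        return False
    if len(scores) >= max_iterations:
        return True  # iteration budget used up
    if scores[-1] > target_metric_threshold:
        return True  # already good enough
    if len(scores) >= 2 and scores[-1] - scores[-2] < improvement_threshold:
        return True  # last iteration barely helped
    return False


history = []
for score in [0.90, 0.95, 0.952]:
    history.append(score)
    print(history, "-> stop" if should_stop(history) else "-> continue")
```

Under this reading, raising max_iterations only matters if neither of the other two conditions triggers first.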

Common Issues

"ModuleNotFoundError: No module named 'optuna'"

Solution:

pip install optuna

Alternatively, skip the install: the agent still runs, just without hyperparameter optimization.
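
Running without Optuna is typically made possible by an optional-import guard. The sketch below shows the general pattern; whether ml_agent.py is written exactly this way is an assumption, and the names HAS_OPTUNA and tuning_plan are hypothetical.

```python
# Optional dependency: fall back gracefully when Optuna is not installed.
try:
    import optuna  # noqa: F401
    HAS_OPTUNA = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    optuna = None
    HAS_OPTUNA = False


def tuning_plan(n_trials=30):
    """Describe what the tuning step would do in this environment."""
    if HAS_OPTUNA:
        return f"tune with Optuna ({n_trials} trials)"
    return "skip tuning, keep best baseline"


print(tuning_plan())
```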


"FileNotFoundError: data.csv not found"

Solution: Use an absolute path, or make sure the file exists in the current working directory:

python ml_agent.py /full/path/to/data.csv
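
When it is unclear why a relative path fails, it helps to print what the path resolves to from the current working directory. The helper below is a hypothetical diagnostic sketch and not part of TuneLab.

```python
from pathlib import Path


def resolve_data_path(path):
    """Return the absolute path of a data file, or raise with the resolved location."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"{path!r} resolves to {resolved}, which does not exist")
    return resolved


try:
    print(resolve_data_path("data.csv"))
except FileNotFoundError as err:
    print(err)
```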

Out of Memory Error

Solution: Reduce model complexity. Edit ml_agent.py and change:

'Random Forest': RandomForestClassifier(n_estimators=50)  # Was 100

🎓 Next Steps

1. Try Your Own Data

python ml_agent.py path/to/your/data.csv

2. Read the Reports

Open outputs/reports/results.md for:

  • Model performance metrics
  • Usage instructions
  • Improvement suggestions

3. Experiment with Models

Edit ml_agent.py to add models like XGBoost or LightGBM

4. Deploy to Production

Use the saved model:

import joblib

model_pkg = joblib.load('outputs/models/final_model.joblib')
# Deploy with Flask, FastAPI, etc.

Learn More

  • Full Documentation: See README.md
  • Code Comments: Read ml_agent.py
  • Examples: Run python example_usage.py --example all

That's it! You're ready to build production ML models autonomously! 🎉