🤖 Machine Learning Model

Agent-guided ML pipeline framework with PyQt6 GUI, experiment tracking, and production-ready model automation.

Overview
Key Features
Architecture
Usage Flow
Algorithm Coverage
Technology Stack
Setup & Installation
Usage
Core Capabilities
Experiment Tracking
Data Versioning
Roadmap
Development Status
Contributing
License

Overview

Machine Learning Model is an agent-driven ML framework that automates the machine learning lifecycle, from raw data ingestion to production model deployment, through a 13-step guided pipeline. It targets data scientists, ML engineers, and developers who want structured, reproducible ML workflows without sacrificing flexibility.

The framework pairs a PyQt6 graphical interface with an intelligent ML agent that provides context-aware recommendations at every pipeline stage. Traditional algorithm exploration and AI-guided automation coexist in a single, unified environment.

Important: Agent Mode is the primary workflow entry point. It guides you step by step through the entire ML pipeline with automatic state persistence, so you can pause and resume at any stage.

Key Features

| Icon | Feature | Description | Impact | Status |
| --- | --- | --- | --- | --- |
| 🤖 | ML Agent | AI-powered assistant navigating the 13-step pipeline | Critical | ✅ Stable |
| 🖥️ | PyQt6 GUI | Interactive workflow navigator with real-time progress | High | ✅ Stable |
| 💾 | State Persistence | Auto save/load of workflow progress across sessions | High | ✅ Stable |
| 📊 | Enhanced Results | Execution timing, hyperparameters, smart recommendations | High | ✅ Stable |
| 🧪 | MLflow Tracking | Experiment logging: params, metrics, feature importances | Medium | ✅ Stable |
| 🗃️ | DVC Versioning | Reproducible data & model pipelines via DVC | Medium | ✅ Stable |
| 🐳 | Docker Support | GUI-in-container with X11 forwarding and font rendering | Medium | ✅ Stable |
| ⚙️ | Hyperparameter Tuning | Automated optimization integrated into pipeline | High | 🟡 Beta |
| 📡 | Drift Monitoring | Continuous learning and model drift detection | Medium | 🟡 Beta |

Highlights:

- 13-step automated pipeline: Data Collection → Preprocessing → EDA → Feature Engineering → Splitting → Algorithm Selection → Training → Evaluation → Tuning → Deployment → Monitoring → Experiment Tracking → Data Versioning
- Rich algorithm output: every algorithm run returns execution time, full hyperparameter config, performance category (Excellent/Good/Fair/Poor), and actionable recommendations
- Cross-platform: Linux, Windows, and basic macOS support with both shell and batch launchers

Architecture

System Architecture

flowchart TD
    User([👤 User]) --> Entry{Entry Point}
    Entry -->|Agent Mode| Agent[🤖 ML Agent\nml_agent.py]
    Entry -->|GUI Mode| GUI[🖥️ PyQt6 GUI\nmain_window_pyqt6.py]
    Entry -->|CLI Mode| CLI[⌨️ CLI\ncli.py]

    Agent --> Workflow[📋 ML Workflow\nml_workflow.py]
    GUI --> Workflow
    Workflow --> Steps[🔧 Step Implementations\nstep_implementations.py]

    Steps --> DataLayer[📦 Data Layer]
    DataLayer --> Loader[Data Loader]
    DataLayer --> Validator[Data Validator]

    Steps --> Supervised[🌲 Supervised Algorithms]
    Supervised --> DT[Decision Tree]
    Supervised --> RF[Random Forest]
    Supervised --> SKLearn[scikit-learn Suite]

    Steps --> Eval[📈 Evaluation\nMetrics & Reports]
    Steps --> Track[🧪 MLflow Tracking]
    Steps --> DVC[🗃️ DVC Versioning]

    Eval --> Results[EnhancedResult\nTiming + Recommendations]
    Results --> Viz[📊 Visualization\nmatplotlib / plotly]

Component Responsibilities

| Component | Location | Responsibility |
| --- | --- | --- |
| ML Agent | workflow/ml_agent.py | Orchestrates pipeline steps, provides context-aware recommendations |
| ML Workflow | workflow/ml_workflow.py | State machine managing 13-step progression and persistence |
| Step Implementations | workflow/step_implementations.py | Concrete logic for each pipeline stage |
| PyQt6 GUI | gui/main_window_pyqt6.py | Interactive dashboard, progress tracking, real-time output |
| CLI | cli.py | Typer-based command-line interface |
| Supervised | supervised/ | Decision Tree, Random Forest with enhanced result output |
| Tracking | tracking/ | MLflow integration for experiment logging |
| Visualization | visualization/ | matplotlib, seaborn, plotly chart generation |

Note: All pipeline state is automatically serialized to disk, so sessions survive crashes or intentional exits. To resume, relaunch the application; the agent picks up where you left off.
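
The persistence behaviour described in this note can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `WorkflowState` class, its fields, and the helper names `save_state` and `load_state` are assumptions made for the example.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class WorkflowState:
    """Hypothetical snapshot of pipeline progress (not the project's real schema)."""
    current_step: int = 1
    completed_steps: List[int] = field(default_factory=list)


def save_state(state: WorkflowState, path: Path) -> None:
    """Write to a temp file first, then rename it over the target.

    A crash during the write leaves the previous state file untouched.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(state), handle)
    os.replace(tmp_name, path)  # replaces any existing file in a single step


def load_state(path: Path) -> WorkflowState:
    """Return the saved state, or a fresh one if nothing has been saved yet."""
    if not path.exists():
        return WorkflowState()
    with path.open(encoding="utf-8") as handle:
        return WorkflowState(**json.load(handle))
```

Writing through a temporary file is one common way to keep a half-written state file from blocking a later resume; the real framework may use a different format or strategy.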

Usage Flow

End-to-End Interaction Sequence

sequenceDiagram
    participant Dev as 👤 Developer
    participant GUI as 🖥️ PyQt6 GUI
    participant Agent as 🤖 ML Agent
    participant Pipeline as 📋 Workflow
    participant MLflow as 🧪 MLflow
    participant DVC as 🗃️ DVC

    Dev->>GUI: Launch application
    GUI->>Agent: Initialize agent session
    Agent->>Pipeline: Load or create workflow state
    Pipeline-->>Agent: Current step (e.g. Step 1: Data Collection)
    Agent-->>GUI: Display step + recommendations

    Dev->>GUI: Load dataset
    GUI->>Pipeline: execute_step(data_collection)
    Pipeline->>DVC: Track raw data file
    DVC-->>Pipeline: ✅ data versioned
    Pipeline-->>GUI: Step complete → advance to Step 2

    Dev->>GUI: Run model training (Step 7)
    GUI->>Pipeline: execute_step(model_training)
    Pipeline->>MLflow: log_params(), log_metrics()
    MLflow-->>Pipeline: Run ID logged
    Pipeline-->>GUI: EnhancedResult{timing, metrics, recommendations}
    GUI-->>Dev: Display results + performance category

Algorithm Coverage

Supported Algorithm Distribution

pie title Algorithm Coverage by Category
    "Supervised Classification" : 40
    "Supervised Regression" : 30
    "Ensemble Methods" : 20
    "Unsupervised (planned)" : 10

| Category | Algorithms | Status |
| --- | --- | --- |
| Supervised Classification | Decision Tree, Random Forest, SVM, KNN, Logistic Regression | ✅ Stable |
| Supervised Regression | Linear Regression, Decision Tree Regressor, Random Forest Regressor | ✅ Stable |
| Ensemble Methods | Random Forest, XGBoost, LightGBM | ✅ Stable |
| Unsupervised Clustering | K-Means, DBSCAN | 🟡 Planned |
| Neural Networks | scikit-learn MLPClassifier | 🟡 Planned |

Technology Stack

| Technology | Purpose | Why Chosen | Alternatives Considered |
| --- | --- | --- | --- |
| Python 3.8+ | Core runtime | Ubiquitous ML ecosystem, broad OS support | Julia, R |
| scikit-learn | ML algorithms | Battle-tested, consistent API, rich estimator library | PyTorch, TensorFlow |
| XGBoost / LightGBM | Gradient boosting | State-of-the-art tabular performance | CatBoost |
| PyQt6 | Desktop GUI | Native look/feel, rich widget set, Linux/Win/Mac | Tkinter, Dear PyGui |
| MLflow | Experiment tracking | Self-hostable, rich UI, scikit-learn autolog | Weights & Biases, Neptune |
| DVC | Data versioning | Git-native, storage-agnostic, pipeline support | LakeFS, Pachyderm |
| Docker | Containerization | Reproducible GUI environment, CI isolation | Podman |
| pytest | Testing | Fixture system, coverage plugins, hypothesis | unittest |
| loguru | Logging | Structured logs, rotation, zero-boilerplate | standard logging |
| Typer + Rich | CLI | Auto-help generation, colored output | Click, argparse |

Setup & Installation

Prerequisites

- Python 3.8 – 3.12
- Git
- Docker (optional, for the containerized GUI)
- A display server for the GUI (X11 or Wayland on Linux)

Clone & Install

git clone https://github.com/hkevin01/Machine-Learning-Model.git
cd Machine-Learning-Model

Linux / macOS:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Dev extras: testing, code quality, MLflow, DVC, docs
pip install -r requirements-dev.txt

Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Copy the example environment file and configure as needed:

cp .env.example .env

# .env
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

Verify Setup

python scripts/validate_setup.py
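
For readers curious what such a check involves, here is a rough sketch of the kind of test a setup validator might run. It is not the contents of scripts/validate_setup.py. The helper names are hypothetical, the version bounds come from the prerequisites above, and the import names in the final call are assumptions about how the core dependencies are imported.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Supported interpreter range, taken from the prerequisites list.
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 12)


def python_version_ok(version: Tuple[int, int]) -> bool:
    """True if (major, minor) lies within the supported range, inclusive."""
    return SUPPORTED_MIN <= version <= SUPPORTED_MAX


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python OK:", python_version_ok(sys.version_info[:2]))
print("Missing  :", missing_modules(["numpy", "pandas", "sklearn", "PyQt6"]))
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast and does not open a GUI window as a side effect.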

Tip: Run make mlflow-ui after installing the dev dependencies to open the MLflow experiment dashboard at http://localhost:5000.

Usage

Option 1 — Agent Mode (Recommended)

# Linux / macOS
./run_agent.sh

# Windows
run_agent.bat

The agent launches an interactive CLI + GUI session and guides you through all 13 pipeline steps.

Option 2 — PyQt6 GUI

# Unified launcher (Docker or local)
./run.sh               # Launch GUI in Docker
./run.sh --local       # Launch GUI natively
./run.sh --headless    # Headless import smoke-test
./run.sh --rebuild     # Force rebuild Docker image
./run.sh --healthcheck # Environment & ML diagnostics

Option 3 — CLI

python -m machine_learning_model --help

Option 4 — Python API

from machine_learning_model.workflow.ml_agent import MLAgent

agent = MLAgent()
agent.run()  # Starts the guided 13-step pipeline

Enhanced Algorithm Output:

from machine_learning_model.supervised.random_forest import run_algorithm

# `spec` must be defined before this call; its structure is not shown here.
result = run_algorithm("Random Forest", "classification", spec)
print(f"Execution Time : {result.execution_time:.4f}s")
print(f"Performance    : {result.performance_summary}")   # "Accuracy: 0.934 (Excellent)"
print(f"Recommendations: {result.recommendations}")       # ["Try cross-validation", ...]

Core Capabilities

🤖 Agent Mode Pipeline

The ML Agent executes a deterministic 13-step workflow. Each step is independently resumable:

| # | Step | Description |
| --- | --- | --- |
| 1 | Data Collection | Automated dataset loading and schema validation |
| 2 | Data Preprocessing | Cleaning, null handling, encoding, type coercion |
| 3 | Exploratory Data Analysis | Automated statistical summary and distribution plots |
| 4 | Feature Engineering | Scaling, polynomial features, selection |
| 5 | Data Splitting | Stratified train / validation / test splitting |
| 6 | Algorithm Selection | Automatic algorithm recommendation based on data profile |
| 7 | Model Training | Multi-algorithm training with MLflow logging |
| 8 | Model Evaluation | Accuracy, F1, ROC-AUC, R², MSE with visual reports |
| 9 | Hyperparameter Tuning | Grid/random search with cross-validation |
| 10 | Model Deployment | Pickle + ONNX export, production-ready persistence |
| 11 | Monitoring | Drift detection and continuous learning hooks |
| 12 | Experiment Tracking | MLflow run comparison and artifact logging |
| 13 | Data Versioning | DVC pipeline for fully reproducible data & model history |
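
The ordered, resumable progression through these 13 steps could be modelled along the following lines. This is only a sketch: the `PipelineStep` enum and the two helper functions are hypothetical names, and the real state machine in workflow/ml_workflow.py may be organized differently.

```python
from enum import IntEnum
from typing import List, Optional


class PipelineStep(IntEnum):
    """The 13 steps in pipeline order (member names are illustrative)."""
    DATA_COLLECTION = 1
    DATA_PREPROCESSING = 2
    EXPLORATORY_DATA_ANALYSIS = 3
    FEATURE_ENGINEERING = 4
    DATA_SPLITTING = 5
    ALGORITHM_SELECTION = 6
    MODEL_TRAINING = 7
    MODEL_EVALUATION = 8
    HYPERPARAMETER_TUNING = 9
    MODEL_DEPLOYMENT = 10
    MONITORING = 11
    EXPERIMENT_TRACKING = 12
    DATA_VERSIONING = 13


def next_step(current: PipelineStep) -> Optional[PipelineStep]:
    """Return the step after `current`, or None once the pipeline is finished."""
    if current is PipelineStep.DATA_VERSIONING:
        return None
    return PipelineStep(current + 1)


def remaining_steps(completed: List[PipelineStep]) -> List[PipelineStep]:
    """List the steps not yet completed, in pipeline order."""
    done = set(completed)
    return [step for step in PipelineStep if step not in done]
```

For example, `next_step(PipelineStep.MODEL_TRAINING)` yields `PipelineStep.MODEL_EVALUATION`, and a session resumed after step 11 would see only Experiment Tracking and Data Versioning in `remaining_steps`.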

📊 Enhanced Algorithm Results

Every algorithm execution returns an EnhancedResult object:

from dataclasses import dataclass
from typing import List


@dataclass
class EnhancedResult:
    execution_time: float          # Wall-clock duration in seconds
    model_params: dict             # Full hyperparameter configuration
    performance_summary: str       # "Accuracy: 0.934 (Excellent)"
    recommendations: List[str]     # Context-aware next-step suggestions (List keeps Python 3.8 working)
    extended_metrics: dict         # AUC, F1-macro, confusion matrix, etc.
    model_insights: dict           # Algorithm-specific info (feature importances, etc.)
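
One way the `execution_time` field could be populated is by wrapping the training call with `time.perf_counter()`. The `timed_call` helper and the `train_stub` function below are hypothetical and stand in for whatever the framework does internally.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run `func` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def train_stub(n: int) -> int:
    """Stand-in for a real training routine."""
    return sum(i * i for i in range(n))


value, seconds = timed_call(train_stub, 10_000)
print(f"Execution Time : {seconds:.4f}s")
```

`perf_counter` is a monotonic, high-resolution clock, which makes it a better fit for measuring short durations than `time.time()`.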

Warning: Performance categories (Excellent/Good/Fair/Poor) are heuristic thresholds. Always validate against your domain's acceptable error bounds before deployment.
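
To make the heuristic nature of these categories concrete, here is a small sketch of how a score might be mapped to a label. The cut-off values are invented for illustration and are not the framework's actual thresholds; the function names are also hypothetical.

```python
from typing import List, Tuple

# Hypothetical cut-offs, highest first; the project's real thresholds may differ.
ACCURACY_BANDS: List[Tuple[float, str]] = [
    (0.90, "Excellent"),
    (0.80, "Good"),
    (0.70, "Fair"),
]


def categorize(score: float) -> str:
    """Return the label of the first band whose lower bound the score meets."""
    for lower_bound, label in ACCURACY_BANDS:
        if score >= lower_bound:
            return label
    return "Poor"


def performance_summary(metric: str, score: float) -> str:
    """Format a summary string in the style shown for EnhancedResult."""
    return f"{metric}: {score:.3f} ({categorize(score)})"


print(performance_summary("Accuracy", 0.934))  # Accuracy: 0.934 (Excellent)
```

Fixed bands like these ignore the data: on a dataset where 95% of samples belong to one class, an accuracy of 0.934 is worse than always predicting the majority class, even though it would be labelled Excellent here.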

🖥️ PyQt6 GUI Features

- Real-time pipeline step progress tracker
- Side-by-side algorithm comparison panel
- Integrated log viewer with severity filtering
- Decision boundary and feature importance charts
- Keyboard shortcuts for power users:
  - Press Ctrl+R to run the current pipeline step
  - Press Ctrl+N to advance to the next step
  - Press Ctrl+S to save workflow state

Experiment Tracking

MLflow is integrated into Steps 7–9 of the pipeline. Enable it with:

pip install -r requirements-dev.txt
make mlflow-ui        # Opens http://localhost:5000

Configure in .env:

MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

When enabled, all built-in algorithms automatically log:

Hyperparameters (log_params)
Evaluation metrics (log_metrics)
Feature importances (log_artifact)
Trained model artifacts (mlflow.sklearn.log_model)
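
A minimal sketch of how a run could be logged with these settings, assuming the mlflow package is installed and a tracking server is reachable. The helper names `tracking_config` and `log_run` are made up for this example and are not part of the framework; treating the .env example values as fallbacks is likewise an assumption.

```python
import os
from typing import Dict, Tuple


def tracking_config(env: Dict[str, str]) -> Tuple[str, str]:
    """Read the two MLflow settings from an environment mapping.

    Falls back to the example values from .env (an assumption about defaults).
    """
    uri = env.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    experiment = env.get("MLFLOW_EXPERIMENT_NAME", "default")
    return uri, experiment


def log_run(params: Dict[str, object], metrics: Dict[str, float]) -> None:
    """Log one run's hyperparameters and metrics to MLflow."""
    import mlflow  # imported lazily so this module loads without MLflow installed

    uri, experiment = tracking_config(dict(os.environ))
    mlflow.set_tracking_uri(uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

With the tracking server running, a call such as `log_run({"n_estimators": 100}, {"accuracy": 0.934})` would create one run under the configured experiment.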

Data Versioning

A minimal DVC pipeline is defined in dvc.yaml with two stages: prepare and train.

pip install -r requirements-dev.txt
make dvc-init
dvc repro              # Executes the full pipeline

Add a remote storage backend (optional):

dvc remote add -d origin <remote-url>   # S3, GCS, SSH, local path
dvc push

Roadmap

gantt
    title Machine Learning Model — Roadmap
    dateFormat  YYYY-MM-DD
    section Foundation
        Core pipeline & agent mode    :done,    f1, 2025-01-01, 2025-06-01
        PyQt6 GUI                     :done,    f2, 2025-04-01, 2025-08-01
        MLflow + DVC integration      :done,    f3, 2025-06-01, 2025-10-01
    section Enhancement
        Enhanced algorithm results    :done,    e1, 2025-09-01, 2026-01-01
        Docker GUI support            :done,    e2, 2025-11-01, 2026-02-01
        Hyperparameter tuning engine  :active,  e3, 2026-01-01, 2026-05-01
    section Expansion
        Unsupervised algorithms       :         x1, 2026-05-01, 2026-08-01
        Neural network support        :         x2, 2026-06-01, 2026-10-01
        REST API / model serving      :         x3, 2026-08-01, 2026-12-01

| Phase | Goals | Target | Status |
| --- | --- | --- | --- |
| Foundation | Core pipeline, Agent Mode, PyQt6 GUI | Q2 2025 | ✅ Complete |
| Enhancement | Enhanced results, Docker, MLflow/DVC | Q1 2026 | ✅ Complete |
| Tuning | Hyperparameter engine, drift monitoring | Q2 2026 | 🟡 In Progress |
| Expansion | Unsupervised algorithms, neural nets | Q3 2026 | ⭕ Planned |
| Serving | REST API, model serving, cloud export | Q4 2026 | ⭕ Planned |

Development Status

| Version | Stability | Test Coverage | Known Limitations |
| --- | --- | --- | --- |
| 0.1.0 | Alpha | Growing | macOS untested, neural nets planned |

Testing

# Run full test suite
python -m pytest tests/ -v

# With coverage report
python -m pytest tests/ --cov=src/machine_learning_model --cov-report=html

# Cross-platform compatibility
python -m pytest tests/test_platform_compatibility.py -v

# Linux / macOS convenience script
./scripts/run_comprehensive_tests.sh

Development Tools:

| Tool | Purpose |
| --- | --- |
| pytest + pytest-cov | Test runner and coverage |
| black | Code formatting |
| isort | Import ordering |
| flake8 | Linting |
| mypy | Static type checking |
| ruff | Fast linting |
| pre-commit | Git hook automation |
| commitizen | Conventional commits |

Platform Support

| Platform | Support Level | Notes |
| --- | --- | --- |
| ✅ Linux (Ubuntu 18.04+) | Full | Primary development target |
| ✅ Windows 10/11 | Full | Batch scripts provided |
| ⚠️ macOS | Basic | Untested; use the Linux scripts |

Contributing

1. Fork the repository
2. Create a feature branch: git checkout -b feat/my-feature
3. Commit using conventional commits: git commit -m "feat: add new algorithm"
4. Ensure tests pass: ./scripts/run_comprehensive_tests.sh
5. Open a Pull Request

📋 Detailed Contribution Guidelines

Code Style

Formatter: black (run black src/ tests/ before committing)
Imports: isort (run isort src/ tests/)
Linting: flake8 (run flake8 src/ tests/)
Type hints: all public functions must have type annotations

Testing Requirements

New features require unit tests in tests/
Bug fixes require a regression test
Run pytest tests/ --cov=src/machine_learning_model and ensure coverage does not decrease

Branch Naming

| Type | Pattern | Example |
| --- | --- | --- |
| Feature | feat/* | feat/add-kmeans |
| Bug fix | fix/* | fix/workflow-resume |
| Documentation | docs/* | docs/update-readme |
| Chore | chore/* | chore/bump-deps |

Commit Format

Follow Conventional Commits:

feat(agent): add drift detection to monitoring step
fix(gui): resolve PyQt6 thread crash on large datasets
docs(readme): add mermaid architecture diagram
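
The header shape used in these examples, type(scope): description, can be checked mechanically. The sketch below is an illustrative validator, not a tool shipped with the project. Restricting the type to feat, fix, docs, and chore mirrors the branch-naming prefixes above and is an assumption; the Conventional Commits convention itself permits other types.

```python
import re

# type(scope): description, where "(scope)" is optional.
COMMIT_HEADER = re.compile(
    r"^(?P<type>feat|fix|docs|chore)"   # allowed types (assumed set)
    r"(?:\((?P<scope>[a-z0-9-]+)\))?"   # optional lowercase scope in parentheses
    r": (?P<description>\S.*)$"         # colon, one space, non-empty description
)


def is_valid_header(header: str) -> bool:
    """True if the first line of a commit message matches the expected shape."""
    return COMMIT_HEADER.match(header) is not None


print(is_valid_header("feat(agent): add drift detection to monitoring step"))
print(is_valid_header("added stuff"))
```

In practice the commitizen and pre-commit tools listed under Development Tools are the more likely place for this kind of check; the regex only shows what they look for.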

🐳 Docker Development Workflow

# Build GUI image
docker build -f Dockerfile.gui -t ml-model-gui .

# Run with X11 forwarding (Linux)
docker run -e DISPLAY=$DISPLAY \
           -v /tmp/.X11-unix:/tmp/.X11-unix \
           ml-model-gui

# Use docker-compose
docker-compose up

📦 Full Dependency List

Core (requirements.txt)

numpy, pandas — data manipulation
scikit-learn, xgboost, lightgbm — ML algorithms
matplotlib, seaborn, plotly — visualization
PyQt6 — desktop GUI
loguru — structured logging
python-dotenv — environment management
pydantic — data validation
typer, rich, click — CLI

Dev (requirements-dev.txt)

pytest, pytest-cov, hypothesis — testing
black, isort, flake8, ruff, mypy — code quality
pre-commit, commitizen — git automation
mlflow — experiment tracking
dvc — data versioning
mkdocs — documentation site

License

This project is licensed under the MIT License — you are free to use, modify, and distribute it with attribution. See the LICENSE file for full terms.

Built with ❤️ by hkevin01

Report Bug · Request Feature
