
MLOps Batch Job – Technical Assessment

Overview

This project implements a minimal MLOps-style batch pipeline in Python. It demonstrates reproducibility, observability, and deployment readiness using a deterministic signal generation workflow on OHLCV data.


Features

  • Config-driven execution (YAML-based)
  • Deterministic pipeline using fixed seed
  • Robust data validation (file, format, schema checks)
  • Rolling mean computation
  • Binary signal generation
  • Structured metrics output (JSON)
  • Detailed logging
  • Dockerized execution (one-command run)
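The config-driven design can be illustrated with a short loading-and-validation sketch. The exact keys in config.yaml are not shown in this README, so `window` and `seed` here are assumptions for illustration:

```python
import yaml

REQUIRED_KEYS = {"window", "seed"}  # assumed config keys, for illustration

def load_config(text):
    """Parse YAML text and validate the keys the pipeline relies on."""
    config = yaml.safe_load(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a YAML mapping")
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config

config = load_config("window: 5\nseed: 42\n")
```

Failing fast on an invalid config keeps errors observable at startup rather than mid-run.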

Project Structure

.
├── run.py
├── config.yaml
├── data.csv
├── requirements.txt
├── Dockerfile
├── metrics.json
├── run.log
└── README.md

How It Works

  1. Load and validate configuration
  2. Load and validate dataset
  3. Compute rolling mean on close price
  4. Generate binary signal:
     • 1 if close > rolling_mean
     • 0 otherwise
  5. Compute metrics:
     • Total rows processed
     • Signal rate
     • Execution latency
  6. Save results to metrics.json
  7. Log all steps to run.log

Local Execution

Run the pipeline using:

python run.py --input data.csv --config config.yaml --output metrics.json --log-file run.log
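The command-line interface implied by the flags above could be wired with argparse. This is a minimal sketch; the defaults are assumptions, not necessarily those of the real run.py:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="MLOps batch pipeline")
    parser.add_argument("--input", default="data.csv", help="input OHLCV CSV")
    parser.add_argument("--config", default="config.yaml", help="YAML config file")
    parser.add_argument("--output", default="metrics.json", help="metrics output path")
    parser.add_argument("--log-file", default="run.log", help="log file path")
    return parser

args = build_parser().parse_args(
    ["--input", "data.csv", "--config", "config.yaml",
     "--output", "metrics.json", "--log-file", "run.log"]
)
```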

Docker Execution

Build image:

docker build -t mlops-task .

Run container:

docker run --rm mlops-task

This will:

  • Execute the pipeline
  • Print metrics JSON to stdout
  • Generate logs and metrics inside the container

Example Output (metrics.json)

{
  "version": "v1",
  "rows_processed": 10000,
  "metric": "signal_rate",
  "value": 0.49,
  "latency_ms": 120,
  "seed": 42,
  "status": "success"
}
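The metrics record above can be assembled along these lines. The field values are illustrative, and timing via `time.perf_counter` is an assumed implementation detail:

```python
import json
import time

def build_metrics(signal, seed, started):
    """Assemble the success-case metrics record shown above."""
    return {
        "version": "v1",
        "rows_processed": len(signal),
        "metric": "signal_rate",
        "value": sum(signal) / len(signal) if signal else 0.0,
        "latency_ms": int((time.perf_counter() - started) * 1000),
        "seed": seed,
        "status": "success",
    }

started = time.perf_counter()
metrics = build_metrics([1, 0, 1, 0], seed=42, started=started)
payload = json.dumps(metrics)  # written to metrics.json by the pipeline
```

The signal rate is simply the fraction of rows where the signal fired, so it always lies in [0, 1].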

Error Handling

The pipeline gracefully handles:

  • Missing input file
  • Invalid CSV format
  • Empty dataset
  • Missing required column (close)
  • Invalid configuration

In case of failure, a structured error JSON is written:

{
  "version": "v1",
  "status": "error",
  "error_message": "Description of error"
}
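The failure path can be handled with a catch-all wrapper that produces the structured error record above. This is a minimal sketch; the real run.py may categorize errors more finely:

```python
import json

def run_safely(step):
    """Run a pipeline step; on failure, return the structured error record."""
    try:
        return step()
    except Exception as exc:
        return {
            "version": "v1",
            "status": "error",
            "error_message": str(exc),
        }

def failing_step():
    raise FileNotFoundError("data.csv not found")

result = run_safely(failing_step)
error_json = json.dumps(result)  # what gets written on failure
```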

Key Design Decisions

  • Reproducibility: Controlled via config + fixed seed
  • Determinism: No randomness in pipeline logic
  • Observability: Logging at each stage of execution
  • Robustness: Explicit validation and error handling
  • Portability: Docker ensures a consistent execution environment

Requirements

  • Python 3.9+
  • Docker (for containerized execution)

Author

Developed as part of an MLOps technical assessment.
