This project implements a minimal MLOps-style batch pipeline in Python. It demonstrates reproducibility, observability, and deployment readiness using a deterministic signal generation workflow on OHLCV data.
- Config-driven execution (YAML-based)
- Deterministic pipeline using fixed seed
- Robust data validation (file, format, schema checks)
- Rolling mean computation
- Binary signal generation
- Structured metrics output (JSON)
- Detailed logging
- Dockerized execution (one-command run)
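Since execution is config-driven, the YAML file might look like the following sketch. The field names (`seed`, `rolling_window`, `price_column`) are illustrative assumptions, not the project's actual schema:

```yaml
# Illustrative config.yaml sketch; field names are assumptions
seed: 42                # fixed seed for reproducibility
rolling_window: 20      # window size for the rolling mean
price_column: close     # column the signal is computed on
```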
```
.
├── run.py
├── config.yaml
├── data.csv
├── requirements.txt
├── Dockerfile
├── metrics.json
├── run.log
└── README.md
```
- Load and validate configuration
- Load and validate dataset
- Compute rolling mean on the close price
- Generate binary signal: 1 if close > rolling_mean, 0 otherwise
- Compute metrics:
  - Total rows processed
  - Signal rate
  - Execution latency
- Save results to metrics.json
- Log all steps to run.log
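The steps above can be sketched in pure Python as follows. This is a minimal illustration, not the actual implementation; the function name and the shrinking-window behavior at the start of the series are assumptions:

```python
import time


def run_pipeline(closes, window=3, seed=42):
    """Sketch of the pipeline steps: validate, roll, signal, metrics.

    Assumes `closes` is a list of close prices (floats).
    """
    start = time.perf_counter()

    # Validate dataset: must be non-empty
    if not closes:
        raise ValueError("Empty dataset")

    # Rolling mean; the window shrinks at the start of the series
    rolling_mean = []
    for i in range(len(closes)):
        chunk = closes[max(0, i - window + 1): i + 1]
        rolling_mean.append(sum(chunk) / len(chunk))

    # Binary signal: 1 if close > rolling_mean, 0 otherwise
    signal = [1 if c > m else 0 for c, m in zip(closes, rolling_mean)]

    # Structured metrics output matching the schema shown below
    return {
        "version": "v1",
        "rows_processed": len(closes),
        "metric": "signal_rate",
        "value": sum(signal) / len(signal),
        "latency_ms": int((time.perf_counter() - start) * 1000),
        "seed": seed,
        "status": "success",
    }
```

Because the signal is a pure function of the input prices and the window, repeated runs on the same data produce identical metrics.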
Run the pipeline using:
```bash
python run.py --input data.csv --config config.yaml --output metrics.json --log-file run.log
```
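Inside run.py, the CLI shown above could be wired with `argparse` roughly like this (a sketch; the actual flag handling and defaults may differ):

```python
import argparse


def parse_args(argv=None):
    """Parse the CLI flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Run the batch signal pipeline")
    parser.add_argument("--input", required=True, help="Path to the input CSV")
    parser.add_argument("--config", required=True, help="Path to the YAML config")
    parser.add_argument("--output", default="metrics.json", help="Where to write metrics JSON")
    parser.add_argument("--log-file", default="run.log", help="Where to write logs")
    return parser.parse_args(argv)
```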
```bash
docker build -t mlops-task .
docker run --rm mlops-task
```
This will:
- Execute the pipeline
- Print metrics JSON to stdout
- Generate logs and metrics inside the container
```json
{
  "version": "v1",
  "rows_processed": 10000,
  "metric": "signal_rate",
  "value": 0.49,
  "latency_ms": 120,
  "seed": 42,
  "status": "success"
}
```
The pipeline gracefully handles:
- Missing input file
- Invalid CSV format
- Empty dataset
- Missing required column (close)
- Invalid configuration
In case of failure, a structured error JSON is written:
```json
{
  "version": "v1",
  "status": "error",
  "error_message": "Description of error"
}
```
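The failure path might be implemented along these lines: a wrapper that catches any pipeline exception and writes the structured error JSON shown above. The helper name and signature are illustrative assumptions:

```python
import json


def safe_run(pipeline, output_path):
    """Run `pipeline` and fall back to the structured error JSON on failure."""
    try:
        result = pipeline()
    except Exception as exc:  # missing file, bad CSV, invalid config, ...
        result = {
            "version": "v1",
            "status": "error",
            "error_message": str(exc),
        }
    # Either way, a machine-readable result file is produced
    with open(output_path, "w") as fh:
        json.dump(result, fh, indent=2)
    return result
```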
- Reproducibility: Controlled via config + fixed seed
- Determinism: No randomness in pipeline logic
- Observability: Logging at each stage of execution
- Robustness: Explicit validation and error handling
- Portability: Docker ensures consistent execution environment
- Python 3.9+
- Docker (for containerized execution)
Developed as part of an MLOps technical assessment.