Helix is a hybrid predictive and driver intelligence engine that forecasts future business KPIs and explains why they will move. It combines statistical forecasting, machine-learning driver analysis, and SHAP-based interpretability.

rizzshi/Helix

🧬 Algorzen Helix

Algorzen Research Division · Project Drop 004
Predictive + Driver Intelligence Engine


📸 Screenshots

Dashboard Overview

Feature Importance Analysis

Correlation Heatmap

PDF Report Sample


🎯 Mission

Algorzen Helix is a production-grade forecasting and driver analysis system that combines predictive modeling with automated feature importance analysis to deliver executive-ready insights.

What it does:

  • Ingests historical KPI data (CSV/API)
  • Produces short-to-medium-term forecasts with confidence intervals
  • Identifies key drivers using SHAP and feature importance
  • Generates branded PDF reports with GPT-4 narratives (optional)

⚡ Quickstart

1. Install

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Run (No API Key Required)

python main.py --input data/sample_kpi_history.csv \
               --kpi revenue \
               --horizon 30 \
               --model baseline \
               --output reports/Helix_Forecast_Report_$(date +%Y%m%d).pdf

3. Check Output

  • PDF Report: reports/Helix_Forecast_Report_<date>.pdf
  • Metadata: reports/report_metadata.json
  • SHAP Plots: reports/assets/shap_bar.png (if available)

🧩 Architecture

algorzen-helix/
├── main.py              # CLI orchestrator
├── ingest.py            # Data loading, validation, resampling
├── forecasting.py       # Baseline / GBM / Prophet models
├── drivers.py           # Feature engineering + SHAP analysis
├── ai_summary.py        # GPT-4 narrative (with fallback)
├── report_generator.py  # PDF + metadata generator
├── app/
│   └── streamlit_app.py # Interactive web UI
├── data/
│   └── sample_kpi_history.csv  # 1200 days of synthetic data
├── reports/             # Generated outputs
└── requirements.txt

🚀 Model Options

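| Model    | Description                         | Use Case                             |
|----------|-------------------------------------|--------------------------------------|
| baseline | 7-day moving average                | Quick validation, stable trends      |
| gbm      | Gradient Boosting with lag features | Nonlinear patterns, multiple drivers |
| prophet  | Facebook Prophet (optional)         | Strong seasonality, holidays         |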
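To make the baseline option concrete, here is a minimal sketch of a 7-day moving-average forecast. How Helix extends the average over the horizon is not documented here, so the flat projection and the function name `moving_average_forecast` are assumptions made for illustration.

```python
from statistics import mean


def moving_average_forecast(history, horizon, window=7):
    """Flat forecast: repeat the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError(f"need at least {window} observations, got {len(history)}")
    level = mean(history[-window:])
    # Assumption: the same level is projected for every future period.
    return [level] * horizon


history = [100, 102, 98, 105, 110, 107, 103, 108]
forecast = moving_average_forecast(history, horizon=3)
print(forecast)
```

A flat projection like this is easy to sanity-check, which is why a moving average works well as a quick validation model before trying gbm or prophet.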

📊 How to Interpret Reports

Executive Summary

A high-level narrative explaining forecast direction and confidence. When --use-openai is enabled, GPT-4 generates this narrative; otherwise Helix fills in a template.
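The template path could look roughly like the sketch below. The wording of the sentence, the function names, and the rule that a missing OPENAI_API_KEY triggers the template are all assumptions; the actual logic in ai_summary.py may differ.

```python
import os


def template_summary(kpi, last_actual, forecast):
    """Build a plain-language summary without calling any external API."""
    change_pct = (forecast[-1] - last_actual) / last_actual * 100
    if abs(change_pct) < 0.05:
        return f"{kpi} is projected to stay flat over the next {len(forecast)} periods."
    direction = "rise" if change_pct > 0 else "fall"
    return (f"{kpi} is projected to {direction} by {abs(change_pct):.1f}% "
            f"over the next {len(forecast)} periods.")


def choose_narrative_mode(use_openai, env=os.environ):
    """Assumed rule: call the API only if requested and a key is present."""
    return "openai" if use_openai and env.get("OPENAI_API_KEY") else "template"


print(template_summary("revenue", 100.0, [102.0, 105.0]))
print(choose_narrative_mode(use_openai=True, env={}))
```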

Forecast Chart

  • Blue line: Predicted values
  • Shaded area: Confidence interval (80-95%)
  • X-axis: Future time periods
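One common way to draw a shaded band like this is to add and subtract a multiple of the residual standard deviation. The sketch below assumes normally distributed residuals and a constant band width; Helix's own interval method is not described here, so treat this as an illustration only.

```python
from statistics import NormalDist, stdev


def confidence_band(forecast, residuals, level=0.95):
    """Return (lower, upper) lists: forecast +/- z * residual std dev."""
    # Two-sided interval: level=0.95 gives z of about 1.96, level=0.80 about 1.28.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sigma = stdev(residuals)
    lower = [f - z * sigma for f in forecast]
    upper = [f + z * sigma for f in forecast]
    return lower, upper


residuals = [-2.0, 1.0, 0.5, -1.5, 2.0]  # hypothetical in-sample errors
lower95, upper95 = confidence_band([100.0, 101.0], residuals, level=0.95)
lower80, upper80 = confidence_band([100.0, 101.0], residuals, level=0.80)
print(upper95[0] - lower95[0], upper80[0] - lower80[0])
```

The 95% band is always wider than the 80% band for the same residuals, which is the trade-off between coverage and precision that the shaded area shows.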

Top Drivers

Drivers are ranked by permutation-based feature importance. Higher values indicate a stronger influence on the KPI.
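Permutation importance measures how much a model's error grows when one feature's values are shuffled. The sketch below implements that idea in plain Python against a hypothetical toy model; the names `permutation_importance` and `toy_model` are illustrative, and the raw scores here are not normalized to the 0-1 scale used in the report.

```python
import random


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Average increase in MAE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y_true, [predict(r) for r in rows])
    scores = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Replace only this feature, leaving the others untouched.
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            error = mean_abs_error(y_true, [predict(r) for r in permuted])
            increases.append(error - baseline)
        scores[name] = sum(increases) / n_repeats
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy model: depends on marketing_spend and ignores price.
def toy_model(row):
    return 2.0 * row["marketing_spend"]


rows = [{"marketing_spend": float(i), "price": float(i % 3)} for i in range(20)]
y = [toy_model(r) for r in rows]
importance = permutation_importance(toy_model, rows, y)
print(importance)
```

A feature the model ignores scores zero, because shuffling it cannot change any prediction.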

Model Performance

  • MAE (Mean Absolute Error): Average prediction error
  • RMSE (Root Mean Squared Error): Penalizes large errors more heavily
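Both metrics follow their standard definitions. The short sketch below uses made-up numbers to show why RMSE reacts more strongly than MAE to a single large miss.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: average size of the miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring weights large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100.0, 100.0, 100.0, 100.0]
predicted = [99.0, 101.0, 100.0, 110.0]  # three small misses, one large
print(mae(actual, predicted))
print(rmse(actual, predicted))
```

Here the single 10-unit miss pulls RMSE well above MAE, so a large gap between the two in a report suggests a few big errors rather than uniformly mediocre predictions.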

🧪 Advanced Usage

CLI: Complete Command Reference

Basic Forecast Generation

# Generate 30-day forecast using baseline model
python main.py \
  --input data/sample_kpi_history.csv \
  --kpi revenue \
  --horizon 30 \
  --model baseline \
  --output reports/baseline_forecast.pdf

Advanced: GBM Model with Full Analysis

# Use Gradient Boosting with engineered features
python main.py \
  --input data/sample_kpi_history.csv \
  --kpi revenue \
  --horizon 90 \
  --model gbm \
  --output reports/gbm_90day_forecast.pdf

What you get:

  • 21 engineered features (14 lags, rolling averages, rate of change)
  • Top 15 driver analysis with permutation importance
  • Correlation heatmap showing feature relationships
  • Dataset statistics: growth %, mean, median, std dev
  • Model performance metrics: MAE, RMSE, MAPE, R², directional accuracy
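The lag, rolling-average, and rate-of-change features listed above can be sketched as follows. Only the name rolling_7 appears in this README; the other column names, and the choice to compute each feature strictly from past values, are assumptions and may not match drivers.py.

```python
def engineer_features(values, n_lags=14, window=7):
    """Build one feature dict per time step that has enough history."""
    rows = []
    start = max(n_lags, window)
    for t in range(start, len(values)):
        # lag_k is the value observed k steps before t.
        row = {f"lag_{k}": values[t - k] for k in range(1, n_lags + 1)}
        past = values[t - window:t]  # strictly before t, so no target leakage
        row[f"rolling_{window}"] = sum(past) / window
        prev, prev2 = values[t - 1], values[t - 2]
        row["pct_change_1"] = (prev - prev2) / prev2 if prev2 else 0.0
        row["target"] = values[t]
        rows.append(row)
    return rows


series = [float(v) for v in range(1, 31)]  # 1.0 .. 30.0
features = engineer_features(series)
print(len(features), features[0]["lag_1"], features[0]["rolling_7"])
```

The first 14 observations are consumed as history, so a 30-point series yields 16 usable training rows.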

Enable GPT-4 Narrative Generation

# Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# Run with AI-powered insights
python main.py \
  --input data/sample_kpi_history.csv \
  --kpi revenue \
  --horizon 30 \
  --model gbm \
  --use-openai \
  --output reports/ai_enhanced_report.pdf

GPT-4 adds:

  • Executive-grade narrative summary
  • Contextualized driver explanations
  • Actionable recommendations based on forecast

Prophet Model (Seasonality Focus)

# Install Prophet first
pip install prophet

# Run with Prophet for strong seasonal patterns
python main.py \
  --input data/sample_kpi_history.csv \
  --kpi revenue \
  --horizon 60 \
  --model prophet \
  --output reports/prophet_seasonal_forecast.pdf

Best for:

  • Data with strong weekly/monthly patterns
  • Holiday effects
  • Multiple seasonality components

Interactive UI: Streamlit Dashboard

Launch the Dashboard

# Start the web interface
streamlit run app/streamlit_app.py

Access at http://localhost:8501

Dashboard Features

1. File Upload

  • Drag-and-drop CSV files
  • Automatic schema validation
  • Preview dataset before processing

2. Parameter Configuration

  • KPI Selection: Choose target column from dropdown
  • Forecast Horizon: 1-365 days via slider
  • Model Selection: Baseline, GBM, or Prophet
  • AI Toggle: Enable/disable GPT-4 narratives

3. Execution & Results

  • Real-time pipeline logs
  • Inline visualizations:
    • Feature importance bar chart
    • Correlation heatmap (top 15 features)
    • Model performance metrics
  • One-click PDF download

4. Analysis Display

  • Report Details: Model used, KPI, report ID, timestamp
  • Dataset Summary: Total days, growth %, top correlations
  • Driver Rankings: Top 15 features by importance score
  • Visual Analytics: Interactive charts with proper scaling

Working with Custom Datasets

Data Format Requirements

Your CSV must include:

date,kpi_name,feature1,feature2,...
2023-01-01,1000,500,250,...
2023-01-02,1050,520,260,...

Column specifications:

  • date: Timestamp column (any parseable format: YYYY-MM-DD, MM/DD/YYYY, etc.)
  • kpi_name: Your target metric (revenue, sales, conversions, etc.)
  • feature1, feature2, ...: Driver columns (numeric values)
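Before running a forecast on a custom file, a quick pre-check along these lines can catch format problems early. This is a standalone sketch, not Helix's ingest.py: the function name is hypothetical, and it accepts only one date format, whereas the README states that any parseable format works.

```python
import csv
import io
from datetime import datetime


def validate_kpi_csv(text, kpi, date_format="%Y-%m-%d"):
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in ("date", kpi) if c not in columns]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["date"], date_format)
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: unparseable date {row['date']!r}")
        for name, value in row.items():
            if name == "date":
                continue
            try:
                float(value)  # KPI and driver columns must be numeric
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: non-numeric {name}={value!r}")
    return problems


sample = "date,revenue,marketing_spend\n2023-01-01,1000,500\n2023-01-02,abc,520\n"
print(validate_kpi_csv(sample, kpi="revenue"))
```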

Example: Sales Forecasting

# Forecast weekly sales with marketing data
python main.py \
  --input data/weekly_sales.csv \
  --kpi total_sales \
  --horizon 52 \
  --model gbm \
  --output reports/sales_forecast_1year.pdf

Example: Multi-KPI Analysis

# Generate separate reports for different metrics
for kpi in revenue profit_margin conversion_rate; do
  python main.py \
    --input data/business_metrics.csv \
    --kpi "$kpi" \
    --horizon 30 \
    --model gbm \
    --output "reports/${kpi}_forecast.pdf"
done
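The same batch run can be driven from Python, which makes it easier to add error handling or logging. The sketch below uses only the command-line flags shown in this README; the helper names `build_command` and `run_all` are hypothetical, and the default dry-run mode returns the commands without executing main.py.

```python
import subprocess
import sys


def build_command(kpi, input_path, horizon=30, model="gbm", use_openai=False):
    """Assemble the main.py argument list from flags shown in this README."""
    cmd = [
        sys.executable, "main.py",
        "--input", input_path,
        "--kpi", kpi,
        "--horizon", str(horizon),
        "--model", model,
        "--output", f"reports/{kpi}_forecast.pdf",
    ]
    if use_openai:
        cmd.append("--use-openai")
    return cmd


def run_all(kpis, input_path, dry_run=True):
    """Run one report per KPI; with dry_run=True only return the commands."""
    commands = [build_command(k, input_path) for k in kpis]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if main.py exits non-zero
    return commands


commands = run_all(["revenue", "profit_margin", "conversion_rate"],
                   "data/business_metrics.csv")
for cmd in commands:
    print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting problems if a KPI name or path contains spaces.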

Understanding the Output

PDF Report Structure (8 Pages)

Page 1: Cover

  • Project title and branding
  • Author attribution
  • Generation timestamp

Page 2: Executive Summary

  • High-level findings
  • Forecast direction and magnitude
  • Key driver highlights
  • GPT-4 narrative (if enabled)

Page 3: Dataset Analysis ⭐ New

  • Total days analyzed
  • KPI statistics (mean, median, std dev, min, max)
  • Growth percentage over period
  • Feature correlations ranked by strength
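Ranking features by correlation strength can be sketched as below. Pearson correlation and sorting by absolute value are assumptions about how "strength" is defined; the data is invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_correlations(columns, kpi):
    """Correlation of each column with the KPI, strongest (by |r|) first."""
    target = columns[kpi]
    scores = {name: pearson(values, target)
              for name, values in columns.items() if name != kpi}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


data = {
    "revenue":         [10.0, 12.0, 14.0, 16.0, 18.0],
    "marketing_spend": [1.0, 2.0, 3.0, 4.0, 5.0],  # moves with revenue
    "price":           [5.0, 4.0, 4.5, 3.0, 3.5],  # mostly moves against it
}
ranked = rank_correlations(data, "revenue")
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```

Sorting by absolute value keeps strong negative drivers, such as price in this toy data, near the top of the list instead of burying them.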

Page 4: Forecast Visualization

  • 90-day historical context
  • 30-day prediction (or custom horizon)
  • Confidence intervals (shaded region)
  • Red line marking forecast start

Page 5: Model Performance Metrics

  • Bar chart visualization
  • MAE, RMSE, MAPE values
  • R² score
  • Directional accuracy
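The remaining metrics on this page can be computed as in the sketch below. MAPE and R² follow their standard definitions; the directional-accuracy definition (comparing each predicted move against the previous actual value) is an assumption, and Helix may define it differently.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the KPI's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def directional_accuracy(actual, predicted):
    """Percent of steps where predicted and actual moves share a direction."""
    hits = 0
    for i in range(1, len(actual)):
        actual_up = actual[i] - actual[i - 1] > 0
        predicted_up = predicted[i] - actual[i - 1] > 0
        hits += actual_up == predicted_up
    return 100 * hits / (len(actual) - 1)


actual = [100.0, 110.0, 105.0, 120.0, 115.0]
predicted = [100.0, 104.0, 112.0, 118.0, 121.0]
print(round(mape(actual, predicted), 2))
print(r_squared(actual, predicted))
print(directional_accuracy(actual, predicted))
```

In this toy example the percentage error is small while directional accuracy sits at chance level, which shows that a low MAPE alone does not mean the model predicts the direction of change well.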

Page 6: Feature Importance

  • Top 15 drivers ranked
  • Horizontal bar chart
  • Importance scores (0-1 scale)

Page 7: Correlation Heatmap

  • Top 15 features vs KPI
  • Color-coded: red (-1) to green (+1)
  • Shows engineered + raw features
  • Highlights strongest correlation

Page 8: Detailed Metrics Table

  • Complete performance breakdown
  • Model configuration details

Asset Files Generated

reports/
├── Helix_Forecast_Report_20251116.pdf  # Main report
├── report_metadata.json                 # Structured data
└── assets/
    ├── correlation_heatmap.png          # 15 features correlation matrix
    ├── feature_importance.png           # Top 15 drivers bar chart
    └── shap_bar.png                     # SHAP values (optional)

Performance Benchmarks

Sample Dataset (1200 days):

  • Processing time: ~3-5 seconds
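  • GBM model achieves:
    • MAPE: 4.4% (low average percentage error)
    • R²: 0.52 (about half of the variance explained)
    • MAE: 999.82
    • Directional accuracy: 41.4% (below 50%, so day-over-day direction calls should not be relied on)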

Feature Engineering:

  • Creates 21 features from 5 original columns
  • Top driver: rolling_7 (0.965 importance)
  • Strongest correlation: 0.979 (rolling_7 vs revenue)

📦 Sample Dataset

The included data/sample_kpi_history.csv contains 1200 days of synthetic business data:

  • date: Daily timestamps (2022-01-01 to 2025-04-14)
  • revenue: Primary KPI with trend + seasonality
  • marketing_spend: Correlated driver
  • site_visits: Traffic metric
  • price: Product pricing
  • holiday_flag: Weekend/holiday indicator

πŸ› οΈ Tech Stack

  • Core: Python 3.10+, Pandas, NumPy, scikit-learn
  • Forecasting: Prophet (optional), statsmodels
  • Explainability: SHAP
  • Visualization: Matplotlib, Plotly
  • Reporting: ReportLab
  • AI: OpenAI GPT-4 (optional)
  • UI: Streamlit

📜 License

MIT License. See the LICENSE file.


👤 Author

Algorzen Research Division © 2025
Author: Rishi Singh

Part of the Algorzen Intelligence Suite, providing decision-makers with foresight and explanation.


🔗 Related Projects

  • Algorzen Pulse: Real-time KPI monitoring and anomaly detection
  • Algorzen Vigil: Multi-source intelligence aggregation
