Algorzen Research Division · Project Drop 004
Predictive + Driver Intelligence Engine
Algorzen Helix is a production-grade forecasting and driver analysis system that combines predictive modeling with automated feature importance analysis to deliver executive-ready insights.
What it does:
- Ingests historical KPI data (CSV/API)
- Produces short-to-medium-term forecasts with confidence intervals
- Identifies key drivers using SHAP and feature importance
- Generates branded PDF reports with GPT-4 narratives (optional)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python main.py --input data/sample_kpi_history.csv \
--kpi revenue \
--horizon 30 \
--model baseline \
--output reports/Helix_Forecast_Report_$(date +%Y%m%d).pdf
- PDF Report: reports/Helix_Forecast_Report_<date>.pdf
- Metadata: reports/report_metadata.json
- SHAP Plots: reports/assets/shap_bar.png (if available)
algorzen-helix/
├── main.py                      # CLI orchestrator
├── ingest.py                    # Data loading, validation, resampling
├── forecasting.py               # Baseline / GBM / Prophet models
├── drivers.py                   # Feature engineering + SHAP analysis
├── ai_summary.py                # GPT-4 narrative (with fallback)
├── report_generator.py          # PDF + metadata generator
├── app/
│   └── streamlit_app.py         # Interactive web UI
├── data/
│   └── sample_kpi_history.csv   # 1200 days of synthetic data
├── reports/                     # Generated outputs
└── requirements.txt
| Model | Description | Use Case |
|---|---|---|
| baseline | 7-day moving average | Quick validation, stable trends |
| gbm | Gradient Boosting with lag features | Nonlinear patterns, multiple drivers |
| prophet | Facebook Prophet (optional) | Strong seasonality, holidays |
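To make the baseline row concrete, a 7-day moving-average forecast can be sketched as below. This is an illustration under assumptions, not the code in forecasting.py: the function name `moving_average_forecast` and the flat projection (repeating the last average across the horizon) are hypothetical choices.

```python
from statistics import fmean

def moving_average_forecast(history, horizon, window=7):
    """Project the mean of the last `window` observations forward `horizon` steps.

    Hypothetical helper for illustration; the real baseline model may differ.
    """
    if len(history) < window:
        raise ValueError(f"need at least {window} observations, got {len(history)}")
    level = fmean(history[-window:])
    return [level] * horizon

history = [100, 102, 101, 105, 107, 110, 108, 111, 115, 117]
forecast = moving_average_forecast(history, horizon=3)
print(forecast)
```

A flat projection like this ignores trend and seasonality, which is consistent with the table's advice to use the baseline for quick validation rather than for complex patterns.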
High-level narrative explaining forecast direction and confidence. When --use-openai is enabled, the narrative is generated by GPT-4; otherwise it is filled in from a template.
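The template fallback might look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the wording of the sentence, and the rule that the GPT-4 path is taken only when both the flag and the `OPENAI_API_KEY` environment variable are present. No OpenAI call is made in this sketch.

```python
import os

def template_summary(kpi, horizon, last_actual, last_forecast):
    """Build a plain-text summary without calling any external API."""
    change = (last_forecast - last_actual) / last_actual * 100
    direction = "rise" if change > 0 else "fall" if change < 0 else "stay flat"
    return (
        f"{kpi} is projected to {direction} over the next {horizon} days "
        f"({change:+.1f}% versus the latest observed value)."
    )

def choose_narrative_mode(use_openai):
    """Pick 'gpt' only when requested and a key is present; else 'template'."""
    return "gpt" if use_openai and os.environ.get("OPENAI_API_KEY") else "template"

print(template_summary("revenue", 30, last_actual=1000.0, last_forecast=1050.0))
```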
- Blue line: Predicted values
- Shaded area: Confidence interval (80-95%)
- X-axis: Future time periods
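The README does not say how the shaded band is computed. One common approach, sketched here under the assumption of roughly normal, constant-variance residuals, is to widen the point forecast by a z-score times the residual standard deviation. The function name `interval_band` is hypothetical.

```python
from statistics import NormalDist, stdev

def interval_band(forecast, residuals, level=0.95):
    """Return (lower, upper) lists around a point forecast.

    Assumes roughly normal, constant-variance residuals; this is one common
    approach, not necessarily the one Helix uses.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    sigma = stdev(residuals)
    lower = [y - z * sigma for y in forecast]
    upper = [y + z * sigma for y in forecast]
    return lower, upper

residuals = [-2.0, 1.0, 0.5, -1.5, 2.0]
lower, upper = interval_band([100.0, 101.0], residuals, level=0.80)
```

Raising `level` from 0.80 to 0.95 widens the band, which is why a chart that shows both levels draws the 95% region outside the 80% one.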
Ranked by feature importance (permutation-based). Higher values = stronger influence on KPI.
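Permutation importance measures how much a model's error grows when one feature column is shuffled. The sketch below is a standard-library illustration of that idea with a toy model; it is not the implementation in drivers.py, and the function names are hypothetical. The scores it returns are raw error increases, so any rescaling to a fixed range would be a separate step.

```python
import random

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Importance of each column = average rise in MAE after shuffling it.

    `predict` maps a list of feature rows to predictions; `rows` is a list
    of equal-length feature lists. Names here are illustrative only.
    """
    rng = random.Random(seed)
    base = mean_abs_error(y, predict(rows))
    scores = []
    for col in range(len(rows[0])):
        rises = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            rises.append(mean_abs_error(y, predict(permuted)) - base)
        scores.append(sum(rises) / n_repeats)
    return scores

# Toy model: the target depends only on column 0, so column 1 should score 0.
rows = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * r[0] for r in rows]
scores = permutation_importance(lambda X: [2.0 * r[0] for r in X], rows, y)
```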
- MAE (Mean Absolute Error): Average prediction error
- RMSE (Root Mean Squared Error): Penalizes large errors more heavily
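The difference between the two metrics is easiest to see on a small example with one large miss. The helper names below are illustrative.

```python
from math import sqrt

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100, 100, 100, 100]
predicted = [101, 99, 101, 95]   # three small misses and one large one
print(mae(actual, predicted), rmse(actual, predicted))
```

Here MAE is 2.0 while RMSE is the square root of 7 (about 2.65): squaring the errors lets the single 5-unit miss dominate the RMSE.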
# Generate 30-day forecast using baseline model
python main.py \
--input data/sample_kpi_history.csv \
--kpi revenue \
--horizon 30 \
--model baseline \
--output reports/baseline_forecast.pdf

# Use Gradient Boosting with engineered features
python main.py \
--input data/sample_kpi_history.csv \
--kpi revenue \
--horizon 90 \
--model gbm \
--output reports/gbm_90day_forecast.pdf

What you get:
- 21 engineered features (14 lags, rolling averages, rate of change)
- Top 15 driver analysis with permutation importance
- Correlation heatmap showing feature relationships
- Dataset statistics: growth %, mean, median, std dev
- Model performance metrics: MAE, RMSE, MAPE, R², directional accuracy
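The engineered features listed above (lags, rolling averages, rate of change) can be sketched from a single KPI series as follows. This is a reduced illustration, not the 21-feature pipeline in drivers.py: only two lags and one rolling window are built, and the column names other than `rolling_7` are assumptions.

```python
def engineer_features(series, lags=(1, 7), window=7):
    """Return one feature dict per usable time step.

    Column names (lag_N, rolling_N, pct_change_1) are illustrative; the
    names and counts produced by drivers.py may differ.
    """
    start = max(max(lags), window, 2)
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Rolling mean over the `window` values strictly before t (no leakage).
        row[f"rolling_{window}"] = sum(series[t - window:t]) / window
        # Rate of change between the two most recent past observations.
        row["pct_change_1"] = (series[t - 1] - series[t - 2]) / series[t - 2]
        row["target"] = series[t]
        rows.append(row)
    return rows

series = [float(v) for v in range(10, 20)]   # 10, 11, ..., 19
features = engineer_features(series)
```

Every feature uses only values before time `t`, so the target at `t` never leaks into its own predictors.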
# Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"
# Run with AI-powered insights
python main.py \
--input data/sample_kpi_history.csv \
--kpi revenue \
--horizon 30 \
--model gbm \
--use-openai \
--output reports/ai_enhanced_report.pdf

GPT-4 adds:
- Executive-grade narrative summary
- Contextualized driver explanations
- Actionable recommendations based on forecast
# Install Prophet first
pip install prophet
# Run with Prophet for strong seasonal patterns
python main.py \
--input data/sample_kpi_history.csv \
--kpi revenue \
--horizon 60 \
--model prophet \
--output reports/prophet_seasonal_forecast.pdf

Best for:
- Data with strong weekly/monthly patterns
- Holiday effects
- Multiple seasonality components
# Start the web interface
streamlit run app/streamlit_app.py

Access at http://localhost:8501
1. File Upload
- Drag-and-drop CSV files
- Automatic schema validation
- Preview dataset before processing
2. Parameter Configuration
- KPI Selection: Choose target column from dropdown
- Forecast Horizon: 1-365 days via slider
- Model Selection: Baseline, GBM, or Prophet
- AI Toggle: Enable/disable GPT-4 narratives
3. Execution & Results
- Real-time pipeline logs
- Inline visualizations:
  - Feature importance bar chart
  - Correlation heatmap (top 15 features)
  - Model performance metrics
- One-click PDF download
4. Analysis Display
- Report Details: Model used, KPI, report ID, timestamp
- Dataset Summary: Total days, growth %, top correlations
- Driver Rankings: Top 15 features by importance score
- Visual Analytics: Interactive charts with proper scaling
Your CSV must include:
date,kpi_name,feature1,feature2,...
2023-01-01,1000,500,250,...
2023-01-02,1050,520,260,...
Column specifications:
- date: Timestamp column (any parseable format: YYYY-MM-DD, MM/DD/YYYY, etc.)
- kpi_name: Your target metric (revenue, sales, conversions, etc.)
- feature1, feature2, ...: Driver columns (numeric values)
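A pre-flight check for the format above can be sketched with the standard library. This is not the validation in ingest.py: the function names are hypothetical, and only two date formats are tried here, whereas the tool itself is described as accepting any parseable format.

```python
import csv
import io
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")   # assumed subset of accepted formats

def parse_date(text):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {text!r}")

def validate_kpi_csv(handle, kpi):
    """Check for a date column, the KPI column, and numeric values."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    for required in ("date", kpi):
        if required not in columns:
            raise ValueError(f"missing required column: {required}")
    count = 0
    for line_no, row in enumerate(reader, start=2):
        parse_date(row["date"])
        for name in columns:
            if name != "date":
                try:
                    float(row[name])
                except (TypeError, ValueError):
                    raise ValueError(f"line {line_no}: {name} is not numeric") from None
        count += 1
    return count

sample = io.StringIO("date,revenue,marketing_spend\n2023-01-01,1000,500\n2023-01-02,1050,520\n")
print(validate_kpi_csv(sample, kpi="revenue"))
```

The function returns the number of data rows it accepted and raises `ValueError` on the first problem, which keeps bad input from reaching the forecasting step.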
# Forecast weekly sales with marketing data
python main.py \
--input data/weekly_sales.csv \
--kpi total_sales \
--horizon 52 \
--model gbm \
--output reports/sales_forecast_1year.pdf

# Generate separate reports for different metrics
for kpi in revenue profit_margin conversion_rate; do
python main.py \
--input data/business_metrics.csv \
--kpi "$kpi" \
--horizon 30 \
--model gbm \
--output "reports/${kpi}_forecast.pdf"
done

Page 1: Cover
- Project title and branding
- Author attribution
- Generation timestamp
Page 2: Executive Summary
- High-level findings
- Forecast direction and magnitude
- Key driver highlights
- GPT-4 narrative (if enabled)
Page 3: Dataset Analysis (New)
- Total days analyzed
- KPI statistics (mean, median, std dev, min, max)
- Growth percentage over period
- Feature correlations ranked by strength
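The KPI statistics on this page can be reproduced with a few standard-library calls. The sketch assumes growth is measured as the change from the first to the last observation; the README does not define it, so treat that formula and the function name as assumptions.

```python
from statistics import fmean, median, stdev

def summarize_kpi(values):
    """Descriptive statistics plus first-to-last growth, in percent."""
    return {
        "days": len(values),
        "mean": fmean(values),
        "median": median(values),
        "std_dev": stdev(values),
        "min": min(values),
        "max": max(values),
        "growth_pct": (values[-1] - values[0]) / values[0] * 100,
    }

summary = summarize_kpi([1000.0, 1050.0, 1020.0, 1100.0, 1200.0])
```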
Page 4: Forecast Visualization
- 90-day historical context
- 30-day prediction (or custom horizon)
- Confidence intervals (shaded region)
- Red line marking forecast start
Page 5: Model Performance Metrics
- Bar chart visualization
- MAE, RMSE, MAPE values
- R² score
- Directional accuracy
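The remaining metrics on this page can be sketched as follows. The definitions are common textbook ones and are assumptions here, especially directional accuracy, which has several variants; this version compares each predicted move against the previous actual value.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual) * 100

def r_squared(actual, predicted):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def directional_accuracy(actual, predicted):
    """Share of steps (in percent) where predicted and actual moves agree in sign.

    One common definition, assumed here: compare each step's change from the
    previous actual value.
    """
    hits = 0
    for i in range(1, len(actual)):
        real_move = actual[i] - actual[i - 1]
        pred_move = predicted[i] - actual[i - 1]
        hits += (real_move > 0) == (pred_move > 0)
    return hits / (len(actual) - 1) * 100

actual = [100.0, 110.0, 105.0, 120.0]
predicted = [100.0, 105.0, 112.0, 115.0]   # third step predicts a rise; actual fell
```

These metrics answer different questions: MAPE and R² score the size of the errors, while directional accuracy scores only the sign of each move. A model can therefore post a low MAPE and still call the direction wrong on many steps.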
Page 6: Feature Importance
- Top 15 drivers ranked
- Horizontal bar chart
- Importance scores (0-1 scale)
Page 7: Correlation Heatmap
- Top 15 features vs KPI
- Color-coded: red (-1) to green (+1)
- Shows engineered + raw features
- Highlights strongest correlation
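The ranking behind the heatmap can be sketched as a Pearson correlation of each feature against the KPI, sorted by absolute value. This assumes Pearson correlation is the measure used, which the README does not state; the function names and the toy data are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_correlations(columns, kpi, top_n=15):
    """Rank feature columns by absolute Pearson correlation with the KPI."""
    target = columns[kpi]
    scored = [(name, pearson(values, target))
              for name, values in columns.items() if name != kpi]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:top_n]

columns = {
    "revenue": [10.0, 12.0, 14.0, 16.0, 18.0],
    "marketing_spend": [1.0, 2.0, 3.0, 4.0, 5.0],   # moves with revenue
    "price": [5.0, 4.0, 4.5, 3.0, 3.5],             # moves mostly against it
}
ranking = rank_correlations(columns, kpi="revenue")
```

Sorting by absolute value keeps strong negative relationships near the top, matching a color scale that runs from -1 to +1.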
Page 8: Detailed Metrics Table
- Complete performance breakdown
- Model configuration details
reports/
├── Helix_Forecast_Report_20251116.pdf   # Main report
├── report_metadata.json                 # Structured data
└── assets/
    ├── correlation_heatmap.png          # 15 features correlation matrix
    ├── feature_importance.png           # Top 15 drivers bar chart
    └── shap_bar.png                     # SHAP values (optional)
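The schema of report_metadata.json is not documented here. As a purely hypothetical sketch, a record carrying the report details mentioned elsewhere in this README (model, KPI, report ID, timestamp) could be assembled like this; every field name below is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone

def build_metadata(model, kpi, horizon, metrics):
    """Assemble a JSON-serializable record describing one report run.

    Field names are hypothetical, not the actual report_metadata.json schema.
    """
    return {
        "report_id": uuid.uuid4().hex,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "kpi": kpi,
        "horizon": horizon,
        "metrics": metrics,
    }

metadata = build_metadata("gbm", "revenue", 30, {"mae": 999.82, "r2": 0.52})
print(json.dumps(metadata, indent=2))
```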
Sample Dataset (1200 days):
- Processing time: ~3-5 seconds
- GBM model achieves:
  - MAPE: 4.4% (excellent)
  - R²: 0.52
  - MAE: 999.82
  - Directional accuracy: 41.4%
Feature Engineering:
- Creates 21 features from 5 original columns
- Top driver: rolling_7 (0.965 importance)
- Strongest correlation: 0.979 (rolling_7 vs revenue)
The included data/sample_kpi_history.csv contains 1200 days of synthetic business data:
- date: Daily timestamps (2022-01-01 to 2025-04-14)
- revenue: Primary KPI with trend + seasonality
- marketing_spend: Correlated driver
- site_visits: Traffic metric
- price: Product pricing
- holiday_flag: Weekend/holiday indicator
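For readers who want a similar series to experiment with, a revenue column with trend and weekly seasonality can be generated as sketched below. This is not how the bundled file was produced: the coefficients, the noise level, and the function name are all made up for illustration.

```python
import math
import random
from datetime import date, timedelta

def synthetic_revenue(n_days=1200, start=date(2022, 1, 1), seed=42):
    """Yield (date, revenue) pairs built from trend + weekly seasonality + noise.

    All coefficients are made up for illustration.
    """
    rng = random.Random(seed)
    for i in range(n_days):
        trend = 1000 + 0.5 * i
        weekly = 50 * math.sin(2 * math.pi * i / 7)
        noise = rng.gauss(0, 20)
        yield start + timedelta(days=i), trend + weekly + noise

rows = list(synthetic_revenue())
print(rows[0][0], rows[-1][0])   # first and last dates
```

With the default arguments the generated dates span the same 1200-day range as the sample file.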
- Core: Python 3.10+, Pandas, NumPy, scikit-learn
- Forecasting: Prophet (optional), statsmodels
- Explainability: SHAP
- Visualization: Matplotlib, Plotly
- Reporting: ReportLab
- AI: OpenAI GPT-4 (optional)
- UI: Streamlit
MIT License. See LICENSE file.
Algorzen Research Division © 2025
Author: Rishi Singh
Part of the Algorzen Intelligence Suite, providing decision-makers with foresight and explanation.
- Algorzen Pulse: Real-time KPI monitoring and anomaly detection
- Algorzen Vigil: Multi-source intelligence aggregation