RobertB-38/eda-insight-engine


EDA Insight Engine

AI-powered exploratory data analysis — upload any CSV, get a full statistical profile and ranked business insights in seconds.


What It Does

Drop in any CSV. In seconds you get:

  • Dataset overview — row/column counts, missing value %, duplicate detection, numeric column summary
  • Column profile — dtype, null %, unique count, mean, std, skewness, top values per column
  • Distribution charts — auto-generated histograms for every numeric feature
  • Missing value analysis — bar chart showing missing counts per column
  • Correlation matrix — full feature heatmap + top correlated pairs with scatter plots
  • AI business insights — 5 ranked insights with confidence scores, data quality flags, and recommended actions, generated by Claude Haiku 4.5
  • JSON report download — full analysis exportable as structured JSON
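The column profile and the top correlated pairs listed above can be approximated with a few pandas calls. The following is a minimal sketch, not the code in app.py: the function names `profile_columns` and `top_correlated_pairs`, the `top_n` and `k` parameters, and the use of Pearson correlation are all assumptions made for illustration.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Build one profile row per column (hypothetical helper, not from app.py)."""
    rows = []
    for name in df.columns:
        col = df[name]
        # Treat booleans as non-numeric so mean/std/skew are only reported for real numbers.
        numeric = pd.api.types.is_numeric_dtype(col) and not pd.api.types.is_bool_dtype(col)
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "null_pct": round(float(col.isna().mean()) * 100, 2),
            "unique": int(col.nunique(dropna=True)),
            "mean": float(col.mean()) if numeric else None,
            "std": float(col.std()) if numeric else None,
            "skew": float(col.skew()) if numeric else None,
            "top_values": col.value_counts().head(top_n).index.tolist(),
        })
    return pd.DataFrame(rows)


def top_correlated_pairs(df: pd.DataFrame, k: int = 5) -> list:
    """Return up to k (col_a, col_b, r) tuples ranked by absolute Pearson correlation."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair appears once and self-pairs are skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if pd.notna(r):
                pairs.append((cols[i], cols[j], float(r)))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:k]
```

Ranking by absolute value keeps strong negative relationships in the list alongside strong positive ones.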

Screenshots

  • Dataset Overview + Column Profile (Light Mode)
  • Distribution Charts (Light Mode)
  • Missing Values + Correlation Matrix (Dark Mode)
  • Top Correlated Pairs + Generate Insights (Dark Mode)
  • AI Insights: Data Quality Grade + Insight 01 (Dark Mode)
  • Insights 02 & 03: Trend Analysis (Dark Mode)
  • Insights 04 & 05 + Download Report (Dark Mode)


Stack

| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Data processing | Pandas, NumPy |
| Visualisation | Plotly Express, Plotly Graph Objects |
| AI layer | Anthropic Claude Haiku 4.5 |
| Deployment | Streamlit Cloud |
| Language | Python 3.11 |

Key Engineering Decisions

Hallucination guardrails — The prompt enforces strict rules: no causal claims, correlations below 0.4 flagged as weak signals, columns with >30% nulls marked unreliable and excluded from primary insights. Confidence scores reflect actual data quality, not AI confidence.
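The README describes these guardrails as prompt rules. One way to make the same thresholds checkable is to compute the flags in code before the prompt is built. The sketch below illustrates that idea and is not the app's implementation: `apply_guardrails` and its return keys are hypothetical, and comparing the absolute value of each correlation against 0.4 is an assumption.

```python
WEAK_CORRELATION = 0.4  # threshold quoted in the README
MAX_NULL_PCT = 30.0     # threshold quoted in the README


def apply_guardrails(null_pct, correlations):
    """Split columns and correlations by the README's reliability rules.

    null_pct: dict of column name -> percentage of missing values (0-100)
    correlations: list of (col_a, col_b, r) tuples
    """
    unreliable = sorted(c for c, pct in null_pct.items() if pct > MAX_NULL_PCT)
    primary, weak = [], []
    for a, b, r in correlations:
        if a in unreliable or b in unreliable:
            continue  # pairs touching an unreliable column never reach primary insights
        # Assumption: the 0.4 rule applies to |r|, so weak negatives are flagged too.
        (weak if abs(r) < WEAK_CORRELATION else primary).append((a, b, r))
    return {"unreliable_columns": unreliable, "primary": primary, "weak_signals": weak}
```

Passing the resulting lists into the prompt, rather than raw statistics alone, would give the model less room to promote a weak or unreliable signal.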

Input sanitisation — Column names are stripped and truncated before being passed to the API. File type is validated by both extension and MIME type. Upload size is capped at 10 MB.
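The checks described above could look roughly like the following. This is a sketch under stated assumptions: the 64-character truncation length, the set of accepted MIME types, the binary interpretation of 10 MB, and the function names are not taken from the source.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB cap from the README (binary megabytes assumed)
MAX_NAME_LEN = 64             # assumed truncation length
ALLOWED_MIME = {"text/csv", "application/vnd.ms-excel"}  # assumed accepted types


def sanitise_column_names(names):
    """Strip surrounding whitespace and truncate each column name."""
    return [str(name).strip()[:MAX_NAME_LEN] for name in names]


def validate_upload(filename, mime_type, size_bytes):
    """Return (ok, message) after checking extension, MIME type, and size."""
    if os.path.splitext(filename)[1].lower() != ".csv":
        return False, "Only .csv files are accepted."
    if mime_type not in ALLOWED_MIME:
        return False, "Unexpected file type."
    if size_bytes > MAX_BYTES:
        return False, "File exceeds the 10 MB limit."
    return True, "OK"
```

Checking the extension and the MIME type independently means a renamed file has to pass both tests before it is read.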

Secure API handling — API key loaded from st.secrets in production, never hardcoded or logged. Falls back to sidebar input for local use.
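The secrets-first lookup with a sidebar fallback can be expressed as a small pure function. In this sketch `resolve_api_key` is a hypothetical name, and `secrets` stands in for `st.secrets` so the logic can be exercised without Streamlit.

```python
def resolve_api_key(secrets, sidebar_value=""):
    """Prefer the deployed secret; fall back to a key typed into the sidebar.

    secrets: mapping-like object (st.secrets in the app, a plain dict here)
    sidebar_value: text entered by a local user, possibly empty
    """
    key = secrets.get("ANTHROPIC_API_KEY", "") if secrets else ""
    key = key or sidebar_value
    # Return None rather than an empty string so callers can test "is None".
    return key.strip() or None
```

In the app this might be called as `resolve_api_key(st.secrets, st.sidebar.text_input("Anthropic API key", type="password"))`. Depending on the Streamlit version, reading `st.secrets` when no secrets.toml exists may raise, so the real call may need a try/except around it.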

Session state management — Dataset and profile cached in st.session_state so re-running insights doesn't re-process the file. Insights cleared on new file upload.
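A common way to get this behaviour is to key the cache on a hash of the uploaded bytes. The sketch below assumes that approach, which the README does not confirm. `load_dataset`, the `file_digest` key, and `profile_fn` are hypothetical, and a plain dict stands in for `st.session_state`.

```python
import hashlib


def load_dataset(state, file_bytes, profile_fn):
    """Re-profile only when the uploaded bytes change; reuse the cache otherwise.

    state: dict-like store (st.session_state in a Streamlit app)
    profile_fn: hypothetical callable that turns raw bytes into a profile
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    if state.get("file_digest") != digest:
        state["file_digest"] = digest
        state["profile"] = profile_fn(file_bytes)
        state.pop("insights", None)  # stale insights belong to the previous file
    return state["profile"]
```

Because Streamlit reruns the whole script on every interaction, a guard like this is what keeps a click on the insights button from triggering a second parse of the CSV.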

Error handling — Auth failures, rate limits, timeouts, and connection errors caught separately with user-friendly messages. Malformed API responses handled gracefully.
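For the malformed-response case, a defensive parser can return `None` instead of raising, so the UI can show a retry message. This sketch covers only that last point, not the separate handling of auth, rate-limit, timeout, and connection errors. The response schema (an `insights` list whose items carry a `title`) is an assumption, as is the function name `parse_insights`.

```python
import json


def parse_insights(raw_text, expected=5):
    """Return a list of insight dicts, or None if the reply is unusable."""
    # Models sometimes wrap JSON in prose, so take the outermost {...} span.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    insights = payload.get("insights") if isinstance(payload, dict) else None
    if not isinstance(insights, list):
        return None
    # Keep only well-formed items (assumed schema: each insight has a "title").
    valid = [item for item in insights if isinstance(item, dict) and "title" in item]
    return valid[:expected] or None
```

Returning `None` for every failure mode gives the caller a single condition to check before rendering.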


Run Locally

git clone https://github.com/RobertB-38/eda-insight-engine.git
cd eda-insight-engine
pip install -r requirements.txt

Add your Anthropic API key:

mkdir .streamlit
echo 'ANTHROPIC_API_KEY = "sk-ant-api03-..."' > .streamlit/secrets.toml

Run:

streamlit run app.py

Project Structure

eda-insight-engine/
├── app.py              # Main application (1055 lines)
├── requirements.txt    # Dependencies
├── demo/               # Screenshots
└── .streamlit/
    └── secrets.toml    # API key (local only, gitignored)

Built By

Robert Borkar — MSc Data Analytics, Dublin City University
LinkedIn · GitHub
