AI-powered exploratory data analysis — upload any CSV, get a full statistical profile and ranked business insights in seconds.
Drop in any CSV. In seconds you get:
- Dataset overview — row/column counts, missing value %, duplicate detection, numeric column summary
- Column profile — dtype, null %, unique count, mean, std, skewness, top values per column
- Distribution charts — auto-generated histograms for every numeric feature
- Missing value analysis — bar chart showing missing counts per column
- Correlation matrix — full feature heatmap + top correlated pairs with scatter plots
- AI business insights — 5 ranked insights with confidence scores, data quality flags, and recommended actions, generated by Claude Haiku 4.5
- JSON report download — full analysis exportable as structured JSON
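
The column profile listed above can be approximated with a few pandas calls. This is a minimal sketch, not the code in `app.py`: the helper name `profile_columns`, the output field names, and the sample data are all hypothetical.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Build a one-row-per-column profile (hypothetical helper, not taken from app.py)."""
    rows = []
    for name in df.columns:
        col = df[name]
        # Treat booleans as non-numeric so mean/std/skew are only reported for real numbers.
        is_numeric = pd.api.types.is_numeric_dtype(col) and not pd.api.types.is_bool_dtype(col)
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "null_pct": round(col.isna().mean() * 100, 2),
            "unique": col.nunique(dropna=True),
            "mean": col.mean() if is_numeric else None,
            "std": col.std() if is_numeric else None,
            "skew": col.skew() if is_numeric else None,
            # Most frequent non-null value; None when the column is entirely null.
            "top_value": col.mode(dropna=True).iloc[0] if col.notna().any() else None,
        })
    return pd.DataFrame(rows)


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "price": [10.0, 12.0, None, 14.0],
    "city": ["Cork", "Cork", "Dublin", None],
})
profile = profile_columns(df)
```

Returning the profile as a DataFrame keeps it easy to display in a Streamlit table and to serialise for a JSON export, though the real app may structure it differently.
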
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| Data processing | Pandas, NumPy |
| Visualisation | Plotly Express, Plotly Graph Objects |
| AI layer | Anthropic Claude Haiku 4.5 |
| Deployment | Streamlit Cloud |
| Language | Python 3.11 |
Hallucination guardrails — The prompt enforces strict rules: no causal claims, correlations below 0.4 flagged as weak signals, columns with >30% nulls marked unreliable and excluded from primary insights. Confidence scores reflect actual data quality, not AI confidence.
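
The two thresholds above can also be checked in code before any numbers reach the prompt. The sketch below is illustrative only: `classify_pair` and its output labels are hypothetical, and it assumes the 0.4 cut-off applies to the absolute value of the correlation coefficient.

```python
WEAK_CORRELATION = 0.4     # threshold quoted in the text
MAX_NULL_FRACTION = 0.30   # threshold quoted in the text


def classify_pair(r: float, null_a: float, null_b: float) -> dict:
    """Label a correlated column pair (hypothetical helper).

    Assumption: the 0.4 cut-off is applied to abs(r).
    null_a and null_b are null fractions in the range 0 to 1.
    """
    unreliable = null_a > MAX_NULL_FRACTION or null_b > MAX_NULL_FRACTION
    return {
        "strength": "weak" if abs(r) < WEAK_CORRELATION else "notable",
        "unreliable": unreliable,
        # Pairs involving an unreliable column are kept out of primary insights.
        "primary_eligible": not unreliable,
    }


strong = classify_pair(-0.72, null_a=0.05, null_b=0.10)
sparse = classify_pair(0.25, null_a=0.0, null_b=0.45)
```

Pre-computing these labels means the model only has to respect flags it is given rather than apply the thresholds itself, which is one plausible way to make the prompt rules harder to violate.
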
Input sanitisation — Column names are stripped and truncated before being passed to the API. File type validated by both extension and MIME type. Size capped at 10MB.
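
A rough sketch of what these checks could look like follows. The function names, the 64-character truncation length, the MIME allow-list, and the removal of control characters are assumptions made for illustration; only the `.csv` extension check, the MIME check, and the 10 MB cap come from the description above.

```python
import re
from pathlib import Path

MAX_FILE_BYTES = 10 * 1024 * 1024  # 10 MB cap (binary megabytes assumed)
MAX_NAME_LEN = 64                  # assumed truncation length
ALLOWED_MIME = {"text/csv", "application/vnd.ms-excel"}  # assumed allow-list


def sanitise_column_name(name: object) -> str:
    """Replace control characters, collapse whitespace, strip, then truncate."""
    cleaned = re.sub(r"[\x00-\x1f\x7f]", " ", str(name))
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:MAX_NAME_LEN]


def validate_upload(filename: str, mime_type: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    if Path(filename).suffix.lower() != ".csv":
        problems.append("File extension must be .csv")
    if mime_type not in ALLOWED_MIME:
        problems.append(f"Unexpected MIME type: {mime_type}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("File exceeds the 10 MB limit")
    return problems
```

With a Streamlit upload, the three arguments would typically come from the uploaded file's `name`, `type`, and `size` attributes.
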
Secure API handling — API key loaded from st.secrets in production, never hardcoded or logged. Falls back to sidebar input for local use.
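
The fallback order described above might be sketched like this. `resolve_api_key` is a hypothetical helper; it takes any mapping so it can be exercised without Streamlit, and in the app the first argument would presumably be `st.secrets`.

```python
from collections.abc import Mapping
from typing import Optional


def resolve_api_key(secrets: Mapping, sidebar_value: str = "") -> Optional[str]:
    """Prefer the deployed secret; fall back to a key typed into the sidebar."""
    try:
        key = secrets["ANTHROPIC_API_KEY"]
    except Exception:
        # Deliberately broad: a missing key or a missing secrets file should both
        # fall through to the sidebar value (exact exception types are an assumption).
        key = ""
    key = (key or sidebar_value or "").strip()
    # Return None rather than an empty string so callers can test "is None".
    return key or None
```

The key is returned to the caller and never printed or written to a log, in line with the handling described above.
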
Session state management — Dataset and profile cached in st.session_state so re-running insights doesn't re-process the file. Insights cleared on new file upload.
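
One way to get this behaviour is to key the cache on a hash of the uploaded bytes. The sketch below is an assumption about the approach, not a copy of `app.py`: `load_profile`, the state keys, and the `build_profile` callback are hypothetical names, and a plain dict stands in for `st.session_state`.

```python
import hashlib
from collections.abc import MutableMapping
from typing import Callable


def load_profile(
    state: MutableMapping,
    file_bytes: bytes,
    build_profile: Callable[[bytes], dict],
) -> dict:
    """Return the cached profile, rebuilding only when the file content changes."""
    file_id = hashlib.sha256(file_bytes).hexdigest()
    if state.get("file_id") != file_id:
        state["file_id"] = file_id
        state["profile"] = build_profile(file_bytes)
        # A new upload invalidates insights generated for the previous file.
        if "insights" in state:
            del state["insights"]
    return state["profile"]
```

Hashing the content rather than comparing file names means re-uploading an edited file under the same name still triggers a fresh profile.
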
Error handling — Auth failures, rate limits, timeouts, and connection errors caught separately with user-friendly messages. Malformed API responses handled gracefully.
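
For the malformed-response case, a defensive parser could look something like the following. It assumes, purely for illustration, that the model is asked to reply with a JSON array of objects carrying `insight` and `confidence` fields; the real response schema is not stated here, and `parse_insights` is a hypothetical name. The separate handling of authentication, rate-limit, timeout, and connection errors is not shown.

```python
import json


def parse_insights(raw_text: str, expected: int = 5) -> dict:
    """Parse the model's reply without raising on malformed output."""
    try:
        # Tolerate prose or code fences around the JSON by slicing to the outer brackets.
        start = raw_text.index("[")
        end = raw_text.rindex("]") + 1
        items = json.loads(raw_text[start:end])
    except ValueError:  # covers missing brackets and json.JSONDecodeError
        return {
            "ok": False,
            "insights": [],
            "message": "The AI response could not be read. Please try again.",
        }

    # Keep only well-formed entries with a numeric confidence score.
    valid = [
        item for item in items
        if isinstance(item, dict)
        and "insight" in item
        and isinstance(item.get("confidence"), (int, float))
    ]
    valid.sort(key=lambda item: item["confidence"], reverse=True)
    message = "" if valid else "No usable insights were returned."
    return {"ok": bool(valid), "insights": valid[:expected], "message": message}
```

Returning a result dictionary instead of raising lets the UI show the message directly and keep the rest of the analysis on screen.
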
git clone https://github.com/RobertB-38/eda-insight-engine.git
cd eda-insight-engine
pip install -r requirements.txt

Add your Anthropic API key:

mkdir .streamlit
echo 'ANTHROPIC_API_KEY = "sk-ant-api03-..."' > .streamlit/secrets.toml

Run:

streamlit run app.py

eda-insight-engine/
├── app.py # Main application (1055 lines)
├── requirements.txt # Dependencies
├── demo/ # Screenshots
└── .streamlit/
    └── secrets.toml # API key (local only, gitignored)
Robert Borkar — MSc Data Analytics, Dublin City University
LinkedIn · GitHub






