Gsoc mvp#3

Open
ad23b1012 wants to merge 7 commits into m2b3:main from ad23b1012:gsoc-mvp

Conversation


@ad23b1012 commented Mar 25, 2026

Overview

This PR introduces the completed Phase 1 Minimum Viable Product (MVP) for the AStats project, designed for the UW-Madison GSoC 2026 proposal. It establishes a fully functional, end-to-end agentic AI framework capable of exploring datasets, generating statistical visualizations, executing Python code, and compiling professional analytical reports — validated against real-world scientific datasets.

🚀 What Was Accomplished (Phase 1)

We built the modular framework entirely from scratch, following best practices in software architecture and AI integration:

1. Core Agentic Architecture

  • Plan-Execute-Reflect Loop: Implemented a robust BaseAgent class that autonomously guides the LLM through a structured reasoning cycle with iterative self-correction on execution errors.

  • Specialist Agents: Created domain-specific agents (EDAAgent, HypothesisAgent, RegressionAgent, TimeSeriesAgent) for targeted statistical methodologies.

  • Workflow Orchestrator: Developed an intelligent router that analyzes user queries to automatically select the best agent, with configurable autonomy levels (full-auto, semi-auto, step-by-step).
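The Plan-Execute-Reflect cycle described above can be sketched as follows. This is a minimal illustration, not the actual AStats `BaseAgent` API: the class shape, prompts, and the bare `exec` stand in for the real planner prompts and sandboxed execution.

```python
# Illustrative plan-execute-reflect loop. The LLM is any callable
# prompt -> str; execution errors are fed back for self-correction.
class BaseAgent:
    def __init__(self, llm, max_retries=3):
        self.llm = llm
        self.max_retries = max_retries

    def run(self, task):
        # PLAN: ask the LLM for a step-by-step plan (one step per line here)
        plan = self.llm(f"Plan steps for: {task}")
        return [self._execute_with_reflection(step) for step in plan.splitlines()]

    def _execute_with_reflection(self, step):
        last_error = None
        for _ in range(self.max_retries):
            # EXECUTE: ask for code, including the previous error on retries
            prompt = f"Write code for: {step}"
            if last_error:
                prompt += f"\nPrevious error: {last_error}"
            code = self.llm(prompt)
            try:
                return self._execute(code)
            except Exception as exc:
                # REFLECT: surface the failure back to the LLM and retry
                last_error = str(exc)
        raise RuntimeError(f"Step failed after {self.max_retries} attempts: {step}")

    def _execute(self, code):
        namespace = {}
        exec(code, namespace)  # placeholder; the real framework sandboxes this
        return namespace.get("result")
```

The key design point is that the reflection path only changes the prompt, so a single loop covers both the happy path and iterative self-correction.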

2. Multi-Provider LLM Abstraction

  • Built a flexible BaseLLMProvider abstraction layer to prevent vendor lock-in.

  • Integrated 4 providers:

    • Google Gemini 2.5 Flash — Primary free-tier default

    • Groq (Llama 3 70B / Mixtral) — Blazing-fast open-weight inference

    • Anthropic Claude 3.5 — Sonnet, Opus, and Haiku support

    • OpenAI / Codex — GPT-4o and legacy Codex support

  • All providers include streaming, structured function calling, and automatic retry with exponential backoff.
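The retry behavior shared by all providers can be factored into the base class, roughly like this. The class and method names are assumptions for illustration; the real `BaseLLMProvider` interface may differ.

```python
# Hedged sketch: the abstract base class owns retry-with-exponential-backoff,
# so each concrete provider (Gemini, Groq, Claude, OpenAI) only implements
# the raw API call in _complete().
import time
from abc import ABC, abstractmethod

class BaseLLMProvider(ABC):
    @abstractmethod
    def _complete(self, prompt: str) -> str:
        """Provider-specific API call."""

    def complete(self, prompt: str, max_attempts: int = 4,
                 base_delay: float = 0.5) -> str:
        for attempt in range(max_attempts):
            try:
                return self._complete(prompt)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries: propagate the real error
                time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

Because the backoff lives in `complete()`, swapping providers never changes retry semantics, which is the point of the abstraction layer.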

3. Data Engine & Tool Registry

  • Intelligent Tooling: Provided agents with sandboxed execution environments via the ToolRegistry (create_plot, describe_data, run_code, run_statistical_test, fit_model).

  • Auto-Profiler: Built a high-speed profiling engine to ingest datasets (CSV, JSON, Parquet, Excel, Stata, SPSS, Feather, etc.) and generate LLM-readable statistical summaries.

  • Smart Visualization Engine: Auto-selects distributions, correlation heatmaps, box plots, and categorical charts based on data profile.
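A registry like the one above typically pairs each tool with a description the LLM sees during function calling, and dispatches calls by name. The sketch below is an assumption about the shape of `ToolRegistry`, not its actual implementation; the tool name matches the list above but its signature is invented.

```python
# Illustrative tool registry: register() records a function plus the
# description exposed to the LLM; call() dispatches by name.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, description):
        def decorator(fn):
            self._tools[name] = {"fn": fn, "description": description}
            return fn
        return decorator

    def schema(self):
        # What gets handed to the LLM for structured function calling
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()

@registry.register("describe_data", "Summary statistics for a numeric column")
def describe_data(values):
    n = len(values)
    return {"n": n, "mean": sum(values) / n,
            "min": min(values), "max": max(values)}
```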

4. Real-World Dataset Validation

Validated the full pipeline end-to-end on standard scientific datasets:

| Dataset | Rows | Task | What It Demonstrates |
| --- | --- | --- | --- |
| Fisher's Iris | 150 | Exploratory Data Analysis | Auto-profiling, species grouping, correlation discovery |
| Diabetes | 442 | Regression Modeling | OLS fitting, VIF multicollinearity check, Ridge/Lasso regularization |
| Titanic | 100 | Hypothesis Testing | Chi-Square, independent t-test, Mann-Whitney U, One-Way ANOVA |

Each dataset has a fully documented example script in examples/.
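For a flavor of what the Titanic hypothesis-testing run exercises, the core of a Chi-Square test of independence looks like this with SciPy. The contingency counts below are made up for illustration and are not from the actual dataset or example script.

```python
# Chi-square test of independence (e.g. sex vs. survival) via SciPy.
# Counts are fabricated for demonstration only.
from scipy.stats import chi2_contingency

observed = [[80, 120],   # row 1: group A counts per category
            [40, 160]]   # row 2: group B counts per category
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.4g}, dof={dof}")
```

The independent t-test, Mann-Whitney U, and One-Way ANOVA in the table follow the same pattern via `scipy.stats.ttest_ind`, `mannwhitneyu`, and `f_oneway`.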

5. Statistical Methodology Documentation

  • Created METHODOLOGY.md documenting the rigorous statistical workflows: EDA (normality screening, outlier detection, bivariate associations), Hypothesis Testing (parametric & non-parametric with effect sizes), and Regression (OLS diagnostics, Durbin-Watson, Breusch-Pagan, VIF, regularization).

6. CLI, UX & Reporting

  • Interactive CLI: Built a Click-based CLI with subcommands (explore, analyze, profile, config).

  • HTML/Markdown Reports: Implemented automated, beautifully formatted HTML report generation using Jinja2, with Base64-embedded Seaborn/Matplotlib plots for zero-dependency sharing.
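The zero-dependency sharing trick above reduces to rendering each figure to PNG in memory, Base64-encoding it, and inlining it through a Jinja2 template. A minimal sketch follows; the template string and variable names are illustrative, not the project's actual templates.

```python
# Render a Matplotlib figure to an in-memory PNG, Base64-encode it, and
# embed it in HTML via Jinja2 so the report needs no external image files.
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, safe for servers/CI
import matplotlib.pyplot as plt
from jinja2 import Template

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 9])
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
encoded = base64.b64encode(buf.getvalue()).decode("ascii")

html = Template(
    '<h1>{{ title }}</h1><img src="data:image/png;base64,{{ img }}">'
).render(title="AStats Report", img=encoded)
```

The resulting HTML file is a single self-contained artifact, which is what makes emailing or attaching reports dependency-free.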

🔮 What's Next (Phase 2 & Beyond)

Now that the core framework is mathematically and architecturally sound, the next steps for the project are:

  1. Local Open-Weight Integration: Wiring up local inference servers (e.g., Ollama or vLLM) so the analysis can be run entirely air-gapped without API keys.

  2. Fine-Tuning Pipelines: Researching and building dataset generation pipelines to fine-tune a smaller LLM specifically on data practitioner reasoning.

  3. Advanced Evaluators: Implementing "critic" agents to double-check statistical code and assumptions before execution.

  4. Workflow Templates: Developing reusable, recipe-based workflow templates for common statistical analyses.

🧪 Testing Notes

  • Tested end-to-end on sample_sales.csv, iris.csv, diabetes.csv, and titanic.csv.

  • All 3 example scripts (eda_iris_example.py, regression_diabetes_example.py, hypothesis_titanic_example.py) execute successfully and produce correct statistical outputs.

  • Verified environment variable injection (.env) for seamless API key configuration across all 4 providers.

  • All generated plots are verified to embed natively via Base64 in output HTML reports.

@Mustafa0216

I have uploaded my work in my repo; mentors, please check my work there.
