Gsoc mvp#3

Open
ad23b1012 wants to merge 7 commits into m2b3:main from ad23b1012:gsoc-mvp

Conversation


@ad23b1012 commented Mar 25, 2026

Overview

This PR introduces the completed Phase 1 Minimum Viable Product (MVP) for the AStats project, designed for the UW-Madison GSoC 2026 proposal. It establishes a fully functional, end-to-end agentic AI framework capable of exploring datasets, generating statistical visualizations, executing Python code, and compiling professional analytical reports — validated against real-world scientific datasets.

🚀 What Was Accomplished (Phase 1)

We built the modular framework entirely from scratch, following best practices in software architecture and AI integration:

1. Core Agentic Architecture

  • Plan-Execute-Reflect Loop: Implemented a robust BaseAgent class that autonomously guides the LLM through a structured reasoning cycle with iterative self-correction on execution errors.

  • Specialist Agents: Created domain-specific agents (EDAAgent, HypothesisAgent, RegressionAgent, TimeSeriesAgent) for targeted statistical methodologies.

  • Workflow Orchestrator: Developed an intelligent router that analyzes user queries to automatically select the best agent, with configurable autonomy levels (full-auto, semi-auto, step-by-step).
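The Plan-Execute-Reflect cycle described above can be sketched as follows. This is a minimal illustration, not the actual AStats `BaseAgent` API: the class shape, prompts, and the bare `exec` stand in for the real planner prompts and sandboxed execution.

```python
# Illustrative plan-execute-reflect loop. The LLM is any callable
# prompt -> str; execution errors are fed back for self-correction.
class BaseAgent:
    def __init__(self, llm, max_retries=3):
        self.llm = llm
        self.max_retries = max_retries

    def run(self, task):
        # PLAN: ask the LLM for a step-by-step plan (one step per line here)
        plan = self.llm(f"Plan steps for: {task}")
        return [self._execute_with_reflection(step) for step in plan.splitlines()]

    def _execute_with_reflection(self, step):
        last_error = None
        for _ in range(self.max_retries):
            # EXECUTE: ask for code, including the previous error on retries
            prompt = f"Write code for: {step}"
            if last_error:
                prompt += f"\nPrevious error: {last_error}"
            code = self.llm(prompt)
            try:
                return self._execute(code)
            except Exception as exc:
                # REFLECT: surface the failure back to the LLM and retry
                last_error = str(exc)
        raise RuntimeError(f"Step failed after {self.max_retries} attempts: {step}")

    def _execute(self, code):
        namespace = {}
        exec(code, namespace)  # placeholder; the real framework sandboxes this
        return namespace.get("result")
```

The key design point is that the reflection path only changes the prompt, so a single loop covers both the happy path and iterative self-correction.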

2. Multi-Provider LLM Abstraction

  • Built a flexible BaseLLMProvider abstraction layer to prevent vendor lock-in.

  • Integrated 4 providers:

    • Google Gemini 2.5 Flash — Primary free-tier default

    • Groq (Llama 3 70B / Mixtral) — Blazing-fast open-weight inference

    • Anthropic Claude 3.5 — Sonnet, Opus, and Haiku support

    • OpenAI / Codex — GPT-4o and legacy Codex support

  • All providers include streaming, structured function calling, and automatic retry with exponential backoff.
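The retry behavior shared by all providers can be factored into the base class, roughly like this. The class and method names are assumptions for illustration; the real `BaseLLMProvider` interface may differ.

```python
# Hedged sketch: the abstract base class owns retry-with-exponential-backoff,
# so each concrete provider (Gemini, Groq, Claude, OpenAI) only implements
# the raw API call in _complete().
import time
from abc import ABC, abstractmethod

class BaseLLMProvider(ABC):
    @abstractmethod
    def _complete(self, prompt: str) -> str:
        """Provider-specific API call."""

    def complete(self, prompt: str, max_attempts: int = 4,
                 base_delay: float = 0.5) -> str:
        for attempt in range(max_attempts):
            try:
                return self._complete(prompt)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries: propagate the real error
                time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

Because the backoff lives in `complete()`, swapping providers never changes retry semantics, which is the point of the abstraction layer.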

3. Data Engine & Tool Registry

  • Intelligent Tooling: Provided agents with sandboxed execution environments via the ToolRegistry (create_plot, describe_data, run_code, run_statistical_test, fit_model).

  • Auto-Profiler: Built a high-speed profiling engine to ingest datasets (CSV, JSON, Parquet, Excel, Stata, SPSS, Feather, etc.) and generate LLM-readable statistical summaries.

  • Smart Visualization Engine: Auto-selects distributions, correlation heatmaps, box plots, and categorical charts based on data profile.
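A registry like the one above typically pairs each tool with a description the LLM sees during function calling, and dispatches calls by name. The sketch below is an assumption about the shape of `ToolRegistry`, not its actual implementation; the tool name matches the list above but its signature is invented.

```python
# Illustrative tool registry: register() records a function plus the
# description exposed to the LLM; call() dispatches by name.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, description):
        def decorator(fn):
            self._tools[name] = {"fn": fn, "description": description}
            return fn
        return decorator

    def schema(self):
        # What gets handed to the LLM for structured function calling
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]["fn"](**kwargs)

registry = ToolRegistry()

@registry.register("describe_data", "Summary statistics for a numeric column")
def describe_data(values):
    n = len(values)
    return {"n": n, "mean": sum(values) / n,
            "min": min(values), "max": max(values)}
```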

4. Real-World Dataset Validation

Validated the full pipeline end-to-end on standard scientific datasets:

| Dataset | Rows | Task | What It Demonstrates |
| --- | --- | --- | --- |
| Fisher's Iris | 150 | Exploratory Data Analysis | Auto-profiling, species grouping, correlation discovery |
| Diabetes | 442 | Regression Modeling | OLS fitting, VIF multicollinearity check, Ridge/Lasso regularization |
| Titanic | 100 | Hypothesis Testing | Chi-Square, independent t-test, Mann-Whitney U, One-Way ANOVA |

Each dataset has a fully documented example script in examples/.
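For a flavor of what the Titanic hypothesis-testing run exercises, the core of a Chi-Square test of independence looks like this with SciPy. The contingency counts below are made up for illustration and are not from the actual dataset or example script.

```python
# Chi-square test of independence (e.g. sex vs. survival) via SciPy.
# Counts are fabricated for demonstration only.
from scipy.stats import chi2_contingency

observed = [[80, 120],   # row 1: group A counts per category
            [40, 160]]   # row 2: group B counts per category
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.4g}, dof={dof}")
```

The independent t-test, Mann-Whitney U, and One-Way ANOVA in the table follow the same pattern via `scipy.stats.ttest_ind`, `mannwhitneyu`, and `f_oneway`.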

5. Statistical Methodology Documentation

  • Created METHODOLOGY.md documenting the rigorous statistical workflows: EDA (normality screening, outlier detection, bivariate associations), Hypothesis Testing (parametric & non-parametric with effect sizes), and Regression (OLS diagnostics, Durbin-Watson, Breusch-Pagan, VIF, regularization).

6. CLI, UX & Reporting

  • Interactive CLI: Built a Click-based CLI with subcommands (explore, analyze, profile, config).

  • HTML/Markdown Reports: Implemented automated, beautifully formatted HTML report generation using Jinja2, with Base64-embedded Seaborn/Matplotlib plots for zero-dependency sharing.
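The zero-dependency sharing trick above reduces to rendering each figure to PNG in memory, Base64-encoding it, and inlining it through a Jinja2 template. A minimal sketch follows; the template string and variable names are illustrative, not the project's actual templates.

```python
# Render a Matplotlib figure to an in-memory PNG, Base64-encode it, and
# embed it in HTML via Jinja2 so the report needs no external image files.
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, safe for servers/CI
import matplotlib.pyplot as plt
from jinja2 import Template

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 9])
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
encoded = base64.b64encode(buf.getvalue()).decode("ascii")

html = Template(
    '<h1>{{ title }}</h1><img src="data:image/png;base64,{{ img }}">'
).render(title="AStats Report", img=encoded)
```

The resulting HTML file is a single self-contained artifact, which is what makes emailing or attaching reports dependency-free.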

🔮 What's Next (Phase 2 & Beyond)

Now that the core framework is mathematically and architecturally sound, the next steps for the project are:

  1. Local Open-Weight Integration: Wiring up local inference servers (e.g., Ollama or vLLM) so the analysis can be run entirely air-gapped without API keys.

  2. Fine-Tuning Pipelines: Researching and building dataset generation pipelines to fine-tune a smaller LLM specifically on data practitioner reasoning.

  3. Advanced Evaluators: Implementing "critic" agents to double-check statistical code and assumptions before execution.

  4. Workflow Templates: Developing reusable, recipe-based workflow templates for common statistical analyses.

🧪 Testing Notes

  • Tested end-to-end on sample_sales.csv, iris.csv, diabetes.csv, and titanic.csv.

  • All 3 example scripts (eda_iris_example.py, regression_diabetes_example.py, hypothesis_titanic_example.py) execute successfully and produce correct statistical outputs.

  • Verified environment variable injection (.env) for seamless API key configuration across all 4 providers.

  • All generated plots are verified to embed natively via Base64 in output HTML reports.

@Mustafa0216

I have uploaded my work in my repo; mentors, please check my work there.
