Data Analyst Agent

AI agent that answers questions about your data in plain English or Vietnamese — no SQL knowledge needed.

What it does

Ask questions in natural language → agent writes and runs SQL → returns the answer.

You:   Sản phẩm nào bán chạy nhất?  (Which product sells best?)
Agent: Phone — 75 units sold across all cities.

You:   Total revenue by city?
Agent: Danang 2.4B  |  Hanoi 3.0B  |  HCMC 5.7B
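The question → SQL → answer flow above can be sketched in a few lines. This is a minimal, hypothetical sketch, not the project's code. The real loop in `src/agent/agent.py` uses LangChain and Groq. Here a stub function `fake_llm` stands in for the model, and the standard-library `sqlite3` stands in for DuckDB so the example runs without extra packages. The table rows are invented, and the read-only check is an assumption about how generated SQL might be guarded.

```python
import re
import sqlite3


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for the Groq/Llama call: maps canned questions to SQL."""
    canned = {
        "best-selling product": (
            "SELECT product, SUM(units) AS total FROM sales "
            "GROUP BY product ORDER BY total DESC LIMIT 1"
        ),
        "revenue by city": (
            "SELECT city, SUM(revenue) FROM sales GROUP BY city ORDER BY city"
        ),
    }
    return canned[question]


# Accept only statements that start with SELECT or WITH.
_READ_ONLY = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """True for a single SELECT/WITH statement (one trailing ';' allowed)."""
    body = sql.strip().rstrip(";")
    return bool(_READ_ONLY.match(body)) and ";" not in body


def answer(conn: sqlite3.Connection, question: str) -> list[tuple]:
    sql = fake_llm(question)             # 1. natural language -> SQL
    if not is_read_only(sql):            # 2. refuse writes and stacked statements
        raise ValueError(f"refusing non-read-only SQL: {sql!r}")
    return conn.execute(sql).fetchall()  # 3. run the SQL and return the rows


# Toy in-memory table (invented rows, not the project's sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, city TEXT, units INT, revenue INT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Phone", "Hanoi", 3, 300), ("Laptop", "Hanoi", 1, 500), ("Phone", "HCMC", 2, 200)],
)
print(answer(conn, "best-selling product"))  # [('Phone', 5)]
print(answer(conn, "revenue by city"))       # [('HCMC', 200), ('Hanoi', 800)]
```

Rejecting anything that is not a single `SELECT`/`WITH` statement is deliberately conservative. It also rejects a semicolon inside a string literal, which is an acceptable trade-off for a guard whose only job is to stop accidental writes.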

If all three AWS variables are set (see Configuration), the agent switches automatically from the local sales table to the SmartCity pipeline dataset on S3 (vehicle, GPS, traffic, weather, and emergency data). The interface stays the same and no code change is needed:

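You:   Tốc độ trung bình của xe điện hôm nay là bao nhiêu?  (What is the average speed of electric vehicles today?)
Agent: 78.3 km/h trung bình cho các chuyến xe điện hôm nay (date=2026-06-17).  (78.3 km/h on average across today's electric-vehicle trips.)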

Supports English and Vietnamese out of the box.

Stack

Layer	Tool
LLM	Groq — Llama 3.3 70B
Agent framework	LangChain
Database	DuckDB (local file or S3 via httpfs)
CLI	Rich
Retry / resilience	Tenacity
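Per the Roadmap, Tenacity retries Groq calls that hit rate limits. The sketch below shows the same retry-with-exponential-backoff idea using only the standard library. `RateLimitError` and `retry_on_rate_limit` are hypothetical names, and the attempt count and delays are illustrative, not the project's settings.

```python
import functools
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit error an LLM client might raise."""


def retry_on_rate_limit(max_attempts: int = 5, base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry the wrapped call on RateLimitError, doubling the wait each time."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_attempts:
                        raise  # out of attempts: re-raise the last error
                    # base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay
                    time.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_rate_limit(max_attempts=3, base_delay=0)  # zero delay keeps the demo instant
def flaky_llm_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(flaky_llm_call(), calls["n"])  # ok 3
```

With Tenacity itself, the equivalent policy is typically declared with its `retry` decorator combined with `stop_after_attempt`, `wait_exponential`, and `retry_if_exception_type`.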

Project structure

data-analyst-agent/
├── src/
│   ├── agent/
│   │   └── agent.py          # LangChain agentic loop + Groq integration
│   ├── tools/
│   │   ├── sql_tool.py       # query_sql and list_tables tools (local DuckDB + S3/httpfs)
│   │   └── file_tool.py      # extensible file ingestion (WIP)
│   └── logging_config.py     # JSON structured logger
├── tests/
│   ├── conftest.py            # stubs for all external deps (no live services needed)
│   ├── test_agent.py          # 43 cases — agent loop, retry, fallback XML
│   ├── test_sql_tool.py       # 40 cases — sanitize, S3/local conn lifecycle, overflow rewrite
│   └── test_logging_config.py # 26 cases — JsonFormatter, extra fields, LOG_LEVEL
├── data/sample/
│   └── warehouse.db           # auto-created on first run (local mode only)
├── main.py                    # entry point
├── pyproject.toml
└── .env.example
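`src/logging_config.py` is described as a JSON structured logger, and its tests cover a `JsonFormatter`, extra fields, and `LOG_LEVEL`. A minimal sketch of such a formatter with the standard `logging` module follows. It is not the project's file, and the output field names (`level`, `logger`, `message`) and the `get_logger` helper are assumptions.

```python
import json
import logging
import os

# Attributes every LogRecord carries; anything else was passed via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (hypothetical sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy fields supplied through logger.info(..., extra={...}).
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # ensure_ascii=False keeps Vietnamese text readable in the log line.
        return json.dumps(payload, default=str, ensure_ascii=False)


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    # LOG_LEVEL defaults to INFO, as in the Configuration table.
    logger.setLevel(os.getenv("LOG_LEVEL", "INFO").upper())
    return logger


log = get_logger("agent")
log.info("query executed", extra={"rows": 3, "mode": "local"})
# stderr, with LOG_LEVEL unset:
# {"level": "INFO", "logger": "agent", "message": "query executed", "rows": 3, "mode": "local"}
```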

Quick start

1. Clone and install

git clone https://github.com/minnobug/data-analyst-agent.git
cd data-analyst-agent
pip install -e .

2. Set up environment

cp .env.example .env
# Then edit .env and add your Groq API key (free at console.groq.com):

GROQ_API_KEY=your_key_here
GROQ_MODEL=llama-3.3-70b-versatile
LOG_LEVEL=INFO

Leave AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_BUCKET_NAME unset to run in local mode (sample sales data, auto-seeded). Set all three to point the agent at the SmartCity S3 pipeline instead (see Configuration).

3. Run

python main.py
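The local/S3 choice in step 2 depends only on environment variables. Here is a minimal sketch of that decision, assuming (this README does not confirm it) that an empty value counts as unset. `detect_mode` and `resolve_settings` are hypothetical names, while the default values are the ones listed under Configuration.

```python
import os
from collections.abc import Mapping

# All three must be set (non-empty) to enable SmartCity/S3 mode.
AWS_VARS = ("AWS_ACCESS_KEY", "AWS_SECRET_KEY", "AWS_BUCKET_NAME")


def detect_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "s3" if every AWS var has a non-empty value, otherwise "local"."""
    return "s3" if all(env.get(name) for name in AWS_VARS) else "local"


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the settings, falling back to the documented defaults."""
    return {
        "mode": detect_mode(env),
        "warehouse_db": env.get("WAREHOUSE_DB", "data/sample/warehouse.db"),
        "aws_region": env.get("AWS_REGION", "ap-southeast-1"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }


print(detect_mode({}))                                              # local
print(detect_mode({"AWS_ACCESS_KEY": "a", "AWS_SECRET_KEY": "b"}))  # local (no bucket)
print(detect_mode({"AWS_ACCESS_KEY": "a", "AWS_SECRET_KEY": "b",
                   "AWS_BUCKET_NAME": "c"}))                        # s3
```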

Run tests

# All 109 tests — no live services or API keys needed
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=term-missing
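`tests/conftest.py` is described as stubbing all external dependencies so the suite runs without live services. One common way to do that is to register a fake module in `sys.modules` before the code under test imports it. This is a hypothetical sketch, not the project's conftest: `install_stub`, `FakeGroq`, and its `complete` method are invented for illustration and do not mirror the real Groq client API.

```python
import sys
import types


def install_stub(name: str, **attrs) -> types.ModuleType:
    """Register a fake module so `import name` succeeds without the real package."""
    module = types.ModuleType(name)
    for attr, value in attrs.items():
        setattr(module, attr, value)
    sys.modules[name] = module
    return module


class FakeGroq:
    """Hypothetical client stub: returns a canned reply instead of calling an API."""

    def __init__(self, *args, **kwargs):
        pass

    def complete(self, prompt: str) -> str:
        return "SELECT 1"


# In a conftest.py this would run before any test imports the agent module.
install_stub("groq", Groq=FakeGroq)

import groq  # resolved from sys.modules: no network access or API key needed

print(groq.Groq().complete("hi"))  # SELECT 1
```

Inside pytest, `monkeypatch.setitem(sys.modules, "groq", stub)` does the same thing but restores the original entry when the test finishes.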

Configuration

Env var	Default	Description
`GROQ_API_KEY`	required	Groq API key
`GROQ_MODEL`	`llama-3.3-70b-versatile`	Model to use
`WAREHOUSE_DB`	`data/sample/warehouse.db`	Path to local DuckDB file (used when AWS vars below are not set)
`LOG_LEVEL`	`INFO`	`DEBUG` / `INFO` / `WARNING`
`AWS_ACCESS_KEY`	—	Enables SmartCity/S3 mode when set together with the two vars below
`AWS_SECRET_KEY`	—	AWS secret key for S3 access
`AWS_BUCKET_NAME`	—	S3 bucket holding the SmartCity pipeline's `refined/` Parquet data
`AWS_REGION`	`ap-southeast-1`	AWS region for the bucket

When all three AWS vars are present, the agent connects via DuckDB's httpfs extension to the SmartCity tables (vehicle_data, gps_data, traffic_data, weather_data, emergency_data) and falls back to the local warehouse automatically if the S3 data is missing or unreachable.

Roadmap

Natural language → SQL (English + Vietnamese)
DuckDB local warehouse
Connect to SmartCity pipeline (S3 Parquet via DuckDB httpfs)
Groq rate-limit retry with tenacity
JSON structured logging
109 unit tests, CI on GitHub Actions
File ingestion tool (CSV, Parquet upload)
Streamlit web UI

License

MIT — see LICENSE
