Adaptive Retrieval-Augmented Generation (RAG) Chatbot

Live Demo: azureragchatbotapp.azurewebsites.net

(Status: the demo may be unavailable at times, depending on Azure hosting.)

This repository contains a Streamlit application that uses Azure OpenAI and Azure Cognitive Search (or Chroma) to ingest, embed, index, and query text data (e.g., CSV files) stored in Azure Blob Storage, and to answer questions grounded in that data.

Demo video: azure_rag_chatbot_demo.webm

Features:

- Semantic Ingestion: Automatically loads and deduplicates text entries from CSV files in Azure Blob Storage.
- Dynamic Indexing: Detects CSV additions, modifications, and deletions and syncs the changes to Azure Cognitive Search or a local Chroma DB.
- RAG QA Pipeline: Uses LangChain’s RetrievalQA with Azure OpenAI for context-aware question answering.
- Interactive UI: Provides a chat interface built with Streamlit.
- Flexible Vector Store: Switches between Azure Cognitive Search and Chroma via a single environment variable.

Vector Store Options

Set VECTOR_DB_TYPE to one of:

- azure: Uses Azure Cognitive Search with semantic search capabilities.
- chroma: Uses a local Chroma vector database persisted under ./chroma_db.
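The switch between the two stores amounts to a small dispatch on VECTOR_DB_TYPE. The sketch below is hypothetical. resolve_vector_store and VectorStoreConfig are illustrative names, not functions from app.py, and the sketch only decides which backend to use. In the real app, the result would presumably be used to construct the corresponding LangChain vector store (for example AzureSearch or Chroma).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Values documented for VECTOR_DB_TYPE in this README.
SUPPORTED_BACKENDS = ("azure", "chroma")
# Persistence directory documented for the Chroma option.
CHROMA_PERSIST_DIR = "./chroma_db"


@dataclass(frozen=True)
class VectorStoreConfig:
    backend: str                # "azure" or "chroma"
    persist_dir: Optional[str]  # only meaningful for the local Chroma store


def resolve_vector_store(env: Mapping[str, str] = os.environ) -> VectorStoreConfig:
    """Pick the vector store backend from VECTOR_DB_TYPE (hypothetical helper)."""
    raw = env.get("VECTOR_DB_TYPE", "")
    backend = raw.strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        # Fail loudly instead of guessing a default backend.
        raise ValueError(
            f"VECTOR_DB_TYPE must be one of {SUPPORTED_BACKENDS}, got {raw!r}"
        )
    if backend == "chroma":
        return VectorStoreConfig(backend, CHROMA_PERSIST_DIR)
    return VectorStoreConfig(backend, None)
```

Raising on an unknown or missing value, rather than silently falling back to a default, makes a typo in .env fail at startup instead of indexing into the wrong store.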

Prerequisites & Installation:

- Python 3.8 or higher
- An Azure account with:
  - A Blob Storage account and a container for the CSV files
  - An Azure OpenAI resource with deployed embedding and chat models
  - An Azure Cognitive Search service (only if VECTOR_DB_TYPE=azure)
- Azure CLI
- (Optional) Docker for containerized deployment

Installation

Clone the repository:

git clone https://github.com/87tana/RAG_Chatbot.git
cd RAG_Chatbot

Install dependencies

pip install -r requirements.txt

Set up environment variables in a .env file:

AZURE_OPENAI_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_SEARCH_KEY=
AZURE_SEARCH_ENDPOINT=
AZURE_STORAGE_CONNECTION_STRING=
VECTOR_DB_TYPE=<azure|chroma>

AZURE_SEARCH_KEY and AZURE_SEARCH_ENDPOINT are only needed when VECTOR_DB_TYPE=azure.
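A missing value in .env usually surfaces later as an authentication or connection error. The sketch below checks the settings early, under stated assumptions. missing_settings is a hypothetical helper. It treats the Azure Search variables as required only when VECTOR_DB_TYPE=azure, following the Prerequisites list. If the app loads .env with python-dotenv (an assumption; check requirements.txt), call load_dotenv() before running the check.

```python
import os
from typing import List, Mapping

# Variables listed in this README; the search settings only matter for the azure backend.
COMMON_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_STORAGE_CONNECTION_STRING",
    "VECTOR_DB_TYPE",
)
AZURE_SEARCH_VARS = ("AZURE_SEARCH_KEY", "AZURE_SEARCH_ENDPOINT")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    required = list(COMMON_VARS)
    if env.get("VECTOR_DB_TYPE", "").strip().lower() == "azure":
        required += AZURE_SEARCH_VARS
    return [name for name in required if not env.get(name, "").strip()]
```

At startup, something like `if missing_settings(): raise SystemExit(...)` reports every missing name at once, instead of failing on the first Azure call.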

Running the App:

Start the Streamlit application:

streamlit run app.py

Usage

- Ask questions about your ingested documents in the chat input box.
- The app keeps the conversation history for the duration of your session.
- If no CSV files are found, a dummy document is loaded so the chatbot stays functional.

Using Your Own Data:

To test the RAG Chatbot with your own documents (for example, research papers or other private data):

- Prepare Your Data: Export your documents into one or more CSV files, each with a content column.
- Upload to Blob Storage: Place the CSV files in your configured Azure Blob Storage container. The app picks up new or updated files the next time it starts or reindexing is triggered.
- Run the App: Start the Streamlit app (streamlit run app.py) and open the UI. Your uploaded documents are ingested and indexed on launch.

Local Testing Tip: Set VECTOR_DB_TYPE=chroma, place your CSV files in a ./data folder, and modify the ingestion code to read from that folder instead of from Blob Storage.
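For the local testing tip, the Blob Storage reader has to be replaced with one that reads ./data. The sketch below does this with the standard csv module, assuming UTF-8 files with a header row. load_local_texts and the placeholder text are hypothetical. The sketch reproduces the behaviour this README describes (extract the content column, deduplicate, skip files without that column, fall back to a dummy document); it is not app.py's actual code.

```python
import csv
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)

# Hypothetical placeholder; the README only says that a dummy document is loaded.
DUMMY_TEXT = "No documents have been ingested yet."


def load_local_texts(data_dir: str = "./data", column: str = "content") -> List[str]:
    """Read `column` from every CSV in data_dir, keeping first occurrences only."""
    texts: List[str] = []
    seen = set()
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if column not in (reader.fieldnames or []):
                # Skip files without a content column and log an error.
                logger.error("Skipping %s: no %r column", path.name, column)
                continue
            for row in reader:
                text = (row.get(column) or "").strip()
                if text and text not in seen:  # exact-match deduplication
                    seen.add(text)
                    texts.append(text)
    # Fall back to a single placeholder so the chatbot still has something to index.
    return texts or [DUMMY_TEXT]
```

Each returned string would then presumably be wrapped in a LangChain Document before being embedded and indexed.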

How It Works:

- Blob Loading: app.py fetches all CSV blobs, extracts the content column, and deduplicates the entries.
- Index Sync: reindex_if_blob_changed computes a hash for each CSV file, compares it with the previously recorded hashes to detect changes, and updates the vector index accordingly.
- Embedding: Text entries are converted to embedding vectors with Azure OpenAI Embeddings.
- Retrieval: A retriever fetches the top-k most relevant documents for each query.
- Generation: An Azure OpenAI chat model generates the answer from the retrieved context.
- Display: The Streamlit chat UI renders the conversation.
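The Index Sync step can be illustrated with one content hash per blob. This sketch rests on assumptions: it hashes the raw bytes with SHA-256 and stores the previous hashes as a JSON map of blob name to hash. This README does not say whether reindex_if_blob_changed uses the same hash, or whether file_state.json has this format. content_hash, diff_states, load_state, and save_state are hypothetical names.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Set, Tuple


def content_hash(data: bytes) -> str:
    """Fingerprint a blob's bytes; any edit to the file changes the digest."""
    return hashlib.sha256(data).hexdigest()


def diff_states(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare {blob_name: hash} maps and return (added, modified, deleted)."""
    added = set(new.keys() - old.keys())
    deleted = set(old.keys() - new.keys())
    modified = {name for name in new.keys() & old.keys() if new[name] != old[name]}
    return added, modified, deleted


def load_state(path: Path) -> Dict[str, str]:
    """Read the previously recorded hashes; a missing file means a first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, state: Dict[str, str]) -> None:
    """Persist the current hashes so the next run can diff against them."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
```

With the azure-storage-blob SDK, the bytes for each blob could come from `container_client.download_blob(name).readall()`. Only added or modified files need to be re-embedded. Deleted files need their entries removed from the index.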

Error Handling:

- Missing CSV Files: A dummy document is loaded so the app remains functional.
- Invalid CSV Format: Files without a content column are skipped and an error is logged.
- Connection Issues: Azure API calls are retried with exponential backoff.
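Retrying Azure API calls follows the usual exponential backoff pattern. The sketch below has no dependencies. The retry count, delays, jitter, and the choice of ConnectionError and TimeoutError as "transient" errors are all assumptions. The Azure SDK and the openai client also provide their own retry settings, so check what app.py actually configures before adding another layer.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponentially growing waits."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Base delay doubles each attempt (1s, 2s, 4s, ... before jitter), capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Errors outside retry_on (for example, a ValueError from bad input) propagate immediately, so only failures that might succeed on a second attempt are retried.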

Performance Considerations:

- CSV Size: Large CSV files (>1 GB) take longer to index; split them into smaller files for better performance.
- Indexing Time: Initial indexing time depends on dataset size and Azure API latency (typically 1-5 seconds per 1,000 text entries).
- Scalability: Azure Cognitive Search is recommended for large datasets; Chroma suits smaller, local deployments.
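A large CSV can be split before upload with the standard csv module. This sketch assumes a UTF-8 file with a single header row. split_csv, the part-file naming, and the default of 100,000 rows per part are hypothetical choices, not values taken from this project.

```python
import csv
from pathlib import Path
from typing import List


def split_csv(src: str, out_dir: str, rows_per_file: int = 100_000) -> List[Path]:
    """Split src into part files of at most rows_per_file data rows each."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be positive")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(src).stem
    parts: List[Path] = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return parts  # empty input: nothing to split
        handle = None
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:
                    # Start a new part and repeat the header so each part is a valid CSV.
                    if handle is not None:
                        handle.close()
                    path = out / f"{stem}_part{len(parts) + 1:03d}.csv"
                    handle = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(handle)
                    writer.writerow(header)
                    parts.append(path)
                writer.writerow(row)
        finally:
            if handle is not None:
                handle.close()
    return parts
```

The file is streamed row by row, so memory use stays flat even for multi-gigabyte inputs. Quoted fields that contain newlines are also handled correctly. Because every part keeps the header, each one still has the content column that ingestion expects.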

Deployment:

- Azure App Service: Deploy as a Docker container or on the built-in Python runtime.
- Streamlit Cloud: Deploy directly from your repository and set the environment variables in the dashboard.

Docker:

See the Dockerfile for container setup and deployment instructions.

Repository Contents:

- asset
- chroma_db
- data_preparation
- evaluation
- .gitignore
- Dockerfile
- README.md
- app.py
- file_state.json
- indexing_utils.py
- requirements.txt