This project is a full-stack web application designed for Visual Question Answering (VQA) focused on environmental assessment, specifically air quality prediction and recommendation generation based on sky images.
It uses a microservice architecture that combines a large language model (LLM) for dialogue with trained prediction models for image-based air-quality analysis.
- Multimodal VQA: Upload an image (e.g., a photo of the sky) and ask natural language questions about the air quality, conditions, or related health advice.
- Retrieval-Augmented Generation (RAG): Integrates an LLM with a knowledge base (vector store) to provide accurate, grounded, and context-specific recommendations (e.g., health or improvement advice).
- Containerized Deployment: The entire application is packaged using Docker and orchestrated with Docker Compose for easy, consistent deployment across different environments (including local development and GCP).
- User-Friendly UI: Provides an interactive web interface built with Vue.js for seamless image input and result display.
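
The RAG feature boils down to two steps: retrieve the knowledge-base passages whose embeddings are closest to the query, then place them in the LLM prompt so the answer is grounded in them. The sketch below is a minimal illustration, not the project's `rag.py`. It assumes toy, precomputed embedding vectors and uses plain cosine similarity as a stand-in for ChromaDB. The names `retrieve` and `build_prompt` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Toy stand-in for a vector store: (passage text, embedding) pairs.
Store = List[Tuple[str, Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Sequence[float], store: Store, k: int = 2) -> List[str]:
    """Return the k passages whose embeddings are most similar to query_vec."""
    ranked = sorted(
        store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place retrieved passages before the question so the LLM answers from them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Hypothetical 3-d embeddings and placeholder passages, for illustration only.
    store: Store = [
        ("Passage about haze and particulate matter", [0.9, 0.1, 0.0]),
        ("Passage about clear-sky conditions", [0.1, 0.9, 0.0]),
        ("Passage about outdoor health precautions", [0.8, 0.0, 0.3]),
    ]
    query = [0.85, 0.05, 0.2]  # pretend embedding of the user's question
    print(build_prompt("Is it safe to exercise outside today?", retrieve(query, store)))
```

In this project the similarity search is handled by ChromaDB, and the assembled prompt goes to the Ollama-hosted LLM. The sketch only shows the shape of the retrieve-then-prompt step.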
You can see a demo of the system here: https://youtu.be/FCwNgOD7UxE
The application uses a microservice architecture split into three main containers, with Nginx handling request routing:
- Frontend: A modular Vue 3 + Vite single-page application (SPA) written in TypeScript and styled with BootstrapVue. It communicates with the Python backend via Axios and provides an interactive, responsive web interface.
- Backend (API): A Python application (FastAPI) that handles API requests, model inference, and RAG coordination.
- Ollama (LLM): Containerized environment hosting the local Large Language Model for chat and RAG execution.

Reference host configuration (GCP):

| Specification | Detail |
|---|---|
| Machine Type | n1-standard-8 |
| vCPUs | 8 |
| RAM | 30 GiB |
| GPU | NVIDIA T4 (1 unit) |
| Operating System | Ubuntu 22.04 LTS |
The following dependencies are required on the host system:
- Git LFS
- Docker
- Docker Compose
- CUDA Driver and cuDNN (Required for GPU usage with the T4)
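
To see which of these tools are already on the host's `PATH`, a small check like the one below can help. It is a hypothetical helper, not part of the repository. It assumes the usual executable names, and `nvidia-smi` stands in for the NVIDIA driver. cuDNN is a library rather than a command, so it cannot be checked this way. Docker Compose v2 is often installed as the `docker compose` plugin, which means `docker-compose` may be reported missing even when Compose works.

```python
import shutil
from typing import Iterable, List

# Executable names assumed for each dependency (adjust to your setup).
REQUIRED_TOOLS = ["git", "git-lfs", "docker", "docker-compose", "nvidia-smi"]


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the subset of `names` that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```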
For a hassle-free installation on an Ubuntu GCP instance, run the setup script: `bash GCP-install-dependencies.sh`

| Component | Technology | Role |
|---|---|---|
| Backend/API | Python (`main.py`, `rag.py`) | Handles basic logic, prediction inference (`sky_test_v1.py`, `model_v3.py`), and serves the RAG/VQA endpoints. |
| Frontend/UI | Vue.js / Vite | Provides the interactive web interface and guides. |
| LLM Engine | Ollama | Manages and serves the local large language model used for RAG responses. |
| Vector Store | ChromaDB (`chroma.sqlite3`) | Stores domain-specific knowledge embeddings used by the RAG pipeline. |

This project requires external machine learning model files (`.h5`) to run the prediction logic in the backend. These files are managed using Git LFS (Large File Storage).

- Ensure `git` and `git-lfs` are installed. Once Git LFS is installed, initialize it: `git lfs install`
- Navigate to the `backend` directory: `cd backend`
- Clone the `VQAmodels` repository from Hugging Face into a temporary `models` folder: `git clone https://huggingface.co/930727fre/VQAmodels models`
- Move the `.h5` files into the main `backend` directory: `mv models/*.h5 .`
- Remove the now-unneeded `models` directory once the files are moved: `sudo rm -drf models`
- Return to the project root: `cd ..`
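
If Git LFS was not initialized before cloning, the `.h5` files may be small text pointer files instead of the real models, and the backend will fail when it tries to load them. The hypothetical helper below, which is not part of the repository, flags any `.h5` file in `backend` that still begins with the Git LFS pointer header. Run it from the project root.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is an un-fetched Git LFS pointer rather than real data."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_pointer_models(backend_dir: str = "backend") -> List[str]:
    """List the .h5 files in backend_dir that are still LFS pointers."""
    return sorted(
        p.name for p in Path(backend_dir).glob("*.h5") if is_lfs_pointer(p)
    )
```

An empty result also occurs when no `.h5` files are present at all, so confirm that the files exist as well.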
The RAG pipeline requires an LLM. This project uses Ollama to manage the model.
- Navigate to the Ollama directory: `cd backend/ollama`
- Run the script to pull the necessary model (as configured): `bash pull-model.sh`
- Return to the project root: `cd ../..`
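
To confirm that the pulled model answers, you can call Ollama's HTTP API directly. This sketch posts to Ollama's `/api/generate` endpoint with streaming disabled and reads the `response` field. It assumes Ollama's default port 11434 is reachable from where you run it (inside Docker Compose the host name may differ), and `MODEL_NAME` is a placeholder for whichever model `pull-model.sh` pulls.

```python
import json
import urllib.request

# Assumption: Ollama's default port; inside Docker Compose the host may differ.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(
    prompt: str, model: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    prompt: str, model: str, base_url: str = OLLAMA_URL, timeout: float = 120.0
) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    req = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

For example, `generate("Describe the sky in one sentence.", model="MODEL_NAME")` should return the model's text once the Ollama container is running.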
Ensure you are on the main branch, all dependencies are installed, and models are set up.
- Navigate to the project directory: `cd VQAweb`
- Configure the frontend IP (important): you must modify the frontend code to point to your server's IP address. In the file `frontend/Present/src/components/PictureInput.vue`, replace `localhost` in the `axios.post` line with your `<server_IP>`.
- Run the entire system with `./docker_run.sh`, which builds and runs the Docker containers defined in `docker-compose.yml`.
  If you encounter execution issues, make the script executable first with `chmod +x ./docker_run.sh`, then run `./docker_run.sh` again.
- Access the application by visiting `http://<server_IP>:8000` in your web browser. (The default port is 8000 in this deployment configuration.)
- To stop the application, press `Ctrl + C` in the terminal where the script is running.
  Note on cleanup: if Docker images are not deleted after the containers stop, you may need to manually inspect and modify the `docker rmi` command within `./docker_run.sh` to ensure proper cleanup.
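
After `./docker_run.sh` starts the containers, a quick smoke test can confirm that the UI is being served at the address above. This is a hypothetical helper, not part of the repository. It assumes the default port 8000 and treats any final HTTP status below 400 as success. Connection errors, timeouts, and HTTP error statuses all count as unreachable.

```python
import urllib.error
import urllib.request


def app_url(server_ip: str, port: int = 8000) -> str:
    """URL of the web UI; port 8000 is the default in this deployment."""
    return f"http://{server_ip}:{port}/"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """True if an HTTP GET to url returns a response with status < 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

For example, `is_reachable(app_url("<server_IP>"))` should return `True` while the stack is up.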

```text
.
├── GCP-install-dependencies.sh # Script for GCP setup
├── Ubuntu-install-docker.sh # Script for Docker installation on Ubuntu
├── air-predict.ipynb # Jupyter notebook for initial prediction model development
├── backend # Python API, Models, and RAG logic
│ ├── Dockerfile # Defines the backend container image
│ ├── main.py # Backend API entry point (FastAPI)
│ ├── rag.py # RAG implementation logic
│ ├── ollama # Ollama container setup and model pulling
│ └── vectorstore_db # ChromaDB files and knowledge base
├── docker-compose.yml # Defines multi-container application services
├── frontend # Vue.js application source
│ └── Present # The main frontend component (Vue/Vite)
├── images # System architecture diagrams
└── nginx # Nginx configuration and Docker setup
```
