# 🌊 ARGO Ocean Data Discovery & Visualization

A comprehensive platform for discovering, processing, and analyzing ARGO ocean float data. This project combines a high-performance Indian Ocean data system, an interactive research dashboard, and FloatChat, an AI-powered conversational interface for natural-language data querying.
The ARGO Ocean Data Discovery & Visualization platform is designed to make vast amounts of oceanographic data accessible and actionable. It specifically targets the Indian Ocean region, providing specialized tools to filter, store, and visualize float data with sub-second latency.
---
## 🏗 Architecture

```mermaid
flowchart LR
    subgraph Client
        Dash[Dash Frontend]
    end
    subgraph Backend
        API[FastAPI RAG Service] -- SQL --> DB[(SQLite/PG)]
        API -- Embeddings --> Chroma[(Chroma Vector DB)]
    end
    Dash -- HTTP --> API
    Dash -- Direct Read --> DB
```
- Data Ingestion: Processes raw global ARGO CSV data.
- Geographic Filtering: Automatically extracts floats within the Indian Ocean boundaries (20°E-120°E, 40°S-25°N).
- Intelligent Processing: Handles missing data with realistic filler generation based on regional parameters.
- Storage: Stores processed profiles in an optimized SQLite database for fast access.
- Visualization:
  - Dashboard: Interactive maps and profile viewers.
  - RAG AI: Natural language queries converted to SQL or visualizations.
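The geographic filter above reduces to a simple bounding-box mask. Here is a minimal pandas sketch of the idea; the column names (`latitude`, `longitude`, `float_id`) are assumptions for illustration, not necessarily the project's actual schema:

```python
import pandas as pd

# Indian Ocean bounding box from the project description
LON_MIN, LON_MAX = 20.0, 120.0   # 20°E to 120°E
LAT_MIN, LAT_MAX = -40.0, 25.0   # 40°S to 25°N

def filter_indian_ocean(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose coordinates fall inside the Indian Ocean box.

    Assumes 'latitude'/'longitude' columns in decimal degrees.
    """
    mask = (
        df["longitude"].between(LON_MIN, LON_MAX)
        & df["latitude"].between(LAT_MIN, LAT_MAX)
    )
    return df[mask]

floats = pd.DataFrame({
    "float_id": ["A", "B", "C"],
    "latitude": [-10.0, 50.0, 5.0],    # B is too far north
    "longitude": [70.0, 70.0, 140.0],  # C is too far east
})
print(filter_indian_ocean(floats)["float_id"].tolist())  # ['A']
```

A vectorized mask like this is what makes one-pass filtering of a large global CSV practical.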
---
## 📂 Project Structure

```
ARGO Ocean Data Discovery & Visualization/
│
├── backend/                          # FastAPI RAG backend
│   └── rag_service.py                # Main API service
│
├── dash_frontend/                    # Frontend applications
│   ├── research_dashboard.py         # Main interactive dashboard
│   ├── simple_app.py                 # Legacy/simple interface
│   └── assets/                       # CSS and static assets
│
├── data/                             # Data storage
│   └── indian_ocean_floats.db        # Optimized SQLite database
│
├── src/                              # Core Python modules
│   ├── data_processor.py             # Core processing & filtering logic
│   ├── dashboard_data_integration.py # Bridge between DB and dashboard
│   └── argo_float_rag.py             # RAG/LLM implementation
│
├── scripts/                          # Utility and setup scripts
│   ├── setup_indian_ocean_data.py    # Database initialization script
│   └── ingest.py                     # Data ingestion utilities
│
├── output/                           # Generated assets (plots, reports)
├── docs/                             # Documentation
├── requirements.txt                  # Project dependencies
├── Dockerfile                        # Container configuration
└── README.md                         # This file
```
### Core Components
#### Data System (`src/data_processor.py`)
- **`IndianOceanArgoProcessor`**: Filters global data and creates the regional database.
- **`IndianOceanDataAccess`**: Provides optimized query methods (spatial, temporal, parametric).
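As a rough illustration of the kind of spatial query `IndianOceanDataAccess` wraps (the class's actual method names are not shown in this README, so this sketch goes straight to SQLite with a toy in-memory table):

```python
import sqlite3

# Tiny in-memory stand-in for data/indian_ocean_floats.db
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (id INTEGER PRIMARY KEY, float_id TEXT, "
    "latitude REAL, longitude REAL, temperature REAL)"
)
conn.executemany(
    "INSERT INTO profiles (float_id, latitude, longitude, temperature) "
    "VALUES (?, ?, ?, ?)",
    [("F1", -20.0, 45.0, 24.5), ("F2", -21.0, 46.0, 23.9), ("F3", 10.0, 90.0, 28.1)],
)

# Spatial query: floats inside a box near Madagascar
rows = conn.execute(
    "SELECT float_id FROM profiles "
    "WHERE latitude BETWEEN -25 AND -15 AND longitude BETWEEN 40 AND 50"
).fetchall()
print([r[0] for r in rows])  # ['F1', 'F2']
```

With indexes on `latitude`/`longitude` (as the schema section below notes), range predicates like this stay fast even on millions of profiles.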
#### Dashboard (`dash_frontend/`)
- **`research_dashboard.py`**: The primary UI for researchers, featuring map views, profile analysis, and regional filtering.
#### AI Assistant (`backend/` & `src/argo_float_rag.py`)
- **FloatChat**: A RAG-based assistant that allows users to ask questions like "Show me floats near Madagascar" and receive generated SQL or plots.
---
## ✨ Features
- ✅ **Indian Ocean Focus**: Specialized filtering for the Indian Ocean region.
- ✅ **High-Performance DB**: SQLite implementation with spatial indexing for sub-second queries.
- ✅ **Interactive Map**: Real-time float tracking with `scattergeo`/`mapbox`.
- ✅ **Profile Viewer**: Detail views for Temperature, Salinity, and Pressure profiles.
- ✅ **AI-Powered Chat**: Natural Language to SQL/Visualization using RAG.
- ✅ **Data Quality Control**: Automated validation and realistic filler for missing values.
- ✅ **Responsive Design**: Modern dashboard UI built with Dash.
- ✅ **Export Capabilities**: Export data to CSV, NetCDF, or GeoJSON.
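The CSV and GeoJSON export paths can be sketched without heavy dependencies; the feature layout below follows the GeoJSON structure from RFC 7946 (note the `[lon, lat]` coordinate order), and the column names are assumptions for illustration:

```python
import json
import pandas as pd

df = pd.DataFrame({
    "float_id": ["F1", "F2"],
    "latitude": [-10.0, 5.0],
    "longitude": [70.0, 85.0],
    "temperature": [26.4, 28.1],
})

# CSV export is a one-liner with pandas
csv_text = df.to_csv(index=False)

# GeoJSON export: one Point feature per profile (RFC 7946 uses [lon, lat])
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [row.longitude, row.latitude]},
        "properties": {"float_id": row.float_id, "temperature": row.temperature},
    }
    for row in df.itertuples()
]
geojson = {"type": "FeatureCollection", "features": features}
print(json.dumps(geojson)[:48])
```

NetCDF export would additionally need a library such as `xarray` or `netCDF4`, so it is omitted here.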
---
## 🛠 Requirements
### Python Version
- Python 3.11 or higher
### Key Libraries
```text
dash                    # Web framework
plotly                  # Interactive plotting
pandas                  # Data manipulation
fastapi                 # Backend API
chromadb                # Vector database for RAG
sentence-transformers   # Text embeddings
geopandas               # Spatial data handling
sqlalchemy              # Database ORM
```
---
## 📦 Installation

1. Clone the repository and enter the project directory:

```bash
git clone <repository-url>
cd "ARGO Ocean Data Discovery & Visualization"
```

2. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install the dependencies:

```bash
pip install -r requirements.txt
```

4. Run the setup script to process the data and create the local database:

```bash
python scripts/setup_indian_ocean_data.py
```

Note: this will look for ARGO CSV data. You can specify a custom path using `--csv-path`.
---
## ▶️ Usage

### Research Dashboard

The main interface for data exploration:

```bash
python dash_frontend/research_dashboard.py
```

Access at: http://localhost:8050
### FloatChat (RAG System)

To run the full RAG system, start the backend and frontend in separate terminals.

Terminal 1 (backend):

```bash
uvicorn backend.rag_service:app --reload
```

Terminal 2 (frontend):

```bash
BACKEND_URL=http://localhost:8000 python dash_frontend/simple_app.py
```

The system defaults to the local SQLite database at `data/indian_ocean_floats.db`; you can customize this in `dashboard_data_integration.py`.
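The `BACKEND_URL` override shown above can be picked up in the frontend with a plain `os.environ` lookup; this is a sketch of the pattern, not necessarily the project's exact mechanism:

```python
import os

# Fall back to the local defaults when the variables are unset
BACKEND_URL = os.environ.get("BACKEND_URL", "http://localhost:8000")
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "sqlite:///data/indian_ocean_floats.db"
)

print(BACKEND_URL)
```

Keeping defaults in code means the dashboard still runs with zero configuration on a fresh checkout.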
---
## ⚙️ Configuration

Create a `.env` file for advanced configuration:

```
DATABASE_URL=sqlite:///data/indian_ocean_floats.db
AUTH_TOKEN=your-dev-token
MAPBOX_TOKEN=your-mapbox-token-optional
```

---
## 🗄 Database Schema

The core `profiles` table is optimized for scientific queries:
```sql
CREATE TABLE profiles (
    id INTEGER PRIMARY KEY,
    float_id TEXT,
    datetime TEXT,
    latitude REAL,
    longitude REAL,
    depth REAL,
    temperature REAL,
    salinity REAL
    -- ... indexes on lat/lon, time
);
```

---
## 🤖 How FloatChat Works

FloatChat answers a question in four stages:

- Query Analysis: Determines whether the user is asking for SQL data, general info, or a plot.
- Retrieval: Fetches relevant schema info and summary stats from ChromaDB.
- Generation: LLM constructs the SQL query or Plotly specification.
- Execution: Runs the query against the database and returns results.
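The four stages above can be sketched end to end. The keyword heuristic and templated SQL here are illustrative stand-ins only; the real service uses an LLM for both analysis and generation:

```python
def analyze_query(question: str) -> str:
    """Stage 1 (toy version): decide whether the user wants a plot,
    tabular data, or general information."""
    q = question.lower()
    if any(w in q for w in ("plot", "chart", "map", "visualize")):
        return "plot"
    if any(w in q for w in ("show", "list", "how many", "near", "between")):
        return "sql"
    return "info"

def generate_sql(question: str) -> str:
    """Stage 3 (toy version): template a query instead of calling an LLM."""
    return "SELECT float_id, latitude, longitude FROM profiles LIMIT 10"

intent = analyze_query("Show me floats near Madagascar")
print(intent)  # sql
if intent == "sql":
    # Stage 4 would run this against the SQLite database
    print(generate_sql("Show me floats near Madagascar"))
```

Routing on intent first is what lets one chat endpoint return either a result table or a Plotly figure specification.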
---
## 🧰 Troubleshooting

**Error**: `Database file not found at ...`
**Solution**: Run `python scripts/setup_indian_ocean_data.py` to generate the database first.
If you encounter errors installing `geopandas` or `shapely`, ensure the required system libraries are installed (e.g., `libgeos-dev` on Linux).
If port 8050 or 8000 is in use, you can specify a distinct port in the run commands:
```bash
python dash_frontend/research_dashboard.py --port 8051
```

---
## 🔮 Roadmap

- Selectable date ranges for animations
- Integration of live ARGO API feed
- 3D visualization of ocean profiles
- Comparison tool for multi-float analysis
- Advanced anomaly detection models
---
## 📚 Resources

- Argo Program: https://argo.ucsd.edu
- Dash Documentation: https://dash.plotly.com
- FastAPI: https://fastapi.tiangolo.com
---
## 👤 Author

Ankit Kumar Tiwari
