OSINT Military Bases Analyzer

A multi-agent OSINT pipeline that fetches satellite imagery of military bases, runs 8 vision-LLM analysts per base, synthesises their findings with a commander agent, generates cross-base patterns with a strategist agent, and visualises everything in a real-time Streamlit + Folium dashboard.

Built for From Idea to App (Reichman University, Semester 4, Assignment 2).

What it does

Given a CSV of military bases (id, country, name, latitude, longitude), the pipeline:

Fetches imagery for each base from Google Maps Static API (primary) and Sentinel Hub (Sentinel-2 true-colour, deep view).
Runs 8 analyst agents in a LangGraph loop. Each analyst inspects the current frame, extracts findings with per-finding confidence, and chooses an action (zoom-in, zoom-out, move-left, move-right, or finish). The next frame is re-fetched accordingly.
Calls a commander agent that consolidates the 8 analyst reports into a threat_assessment (HIGH / MEDIUM / LOW / UNKNOWN), high-consensus findings (3+ analysts agree, avg confidence ≥ 7), contested findings, and recommended next actions.
Calls a strategist agent at the end of the run to surface cross-base patterns, priority bases, and an overall assessment.
Persists everything to data.json (atomic writes, append-only run history) and renders it in the dashboard.

The dashboard launches the analyser as a subprocess, polls data.json every 2 s while a run is in progress, and includes an "Ask AI" tab that answers free-form questions about a selected base with the imagery + reports as context.

Architecture

                              military_bases.csv
                                       │
                                       ▼
                   ┌───────────────────────────────────────┐
                   │           base_analyzer.py            │
                   │  (orchestrator: per-base loop, run    │
                   │   versioning, commander, strategist)  │
                   └───────────────────────────────────────┘
                                       │
                                       ▼
                   ┌───────────────────────────────────────┐
                   │           pipeline.py                 │
                   │   LangGraph StateGraph                │
                   │                                       │
                   │   fetch_image ──► run_analyst ──►     │
                   │        ▲                │             │
                   │        │                ▼             │
                   │        └──── decide_next ──► END      │
                   └───────────────────────────────────────┘
                            │              │
                ┌───────────┘              └──────────────┐
                ▼                                         ▼
        ┌──────────────┐                          ┌──────────────┐
        │  imagery.py  │                          │ llm_client.py│
        │              │                          │              │
        │ Google Maps  │                          │ gemini /     │
        │ Sentinel Hub │                          │ openai /     │
        │ Moondream    │                          │ qwen factory │
        └──────────────┘                          └──────────────┘
                                       │
                                       ▼
                              ┌────────────────┐
                              │   storage.py   │
                              │ atomic JSON    │
                              │ run versioning │
                              └────────────────┘
                                       │
                                       ▼
                                  data.json
                                       │
                                       ▼
                              ┌────────────────┐
                              │     app.py     │
                              │ Streamlit +    │
                              │ Folium map +   │
                              │ Ask AI tab     │
                              └────────────────┘

The graph is run with graph.stream() rather than .invoke(), so each analyst's result is written as soon as it is produced and the dashboard sees per-analyst progress in real time.
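The difference between streaming and a single blocking call can be illustrated without LangGraph. The sketch below is a plain-Python stand-in, not the code in pipeline.py: `fake_analyst`, `run_analysts`, and `run_all` are hypothetical names, and the generator only mimics the property that graph.stream() provides, namely that each analyst's result is handed over as soon as it exists.

```python
from typing import Callable, Iterator


def fake_analyst(analyst_id: int) -> dict:
    """Hypothetical stand-in for one vision-LLM analyst call."""
    return {"analyst": analyst_id, "action": "finish"}


def run_analysts(
    num_analysts: int, analyst: Callable[[int], dict]
) -> Iterator[dict]:
    """Streaming style: yield each report as soon as it is produced."""
    for analyst_id in range(1, num_analysts + 1):
        yield analyst(analyst_id)


def run_all(num_analysts: int, analyst: Callable[[int], dict]) -> list[dict]:
    """Blocking style: nothing is visible until every analyst is done."""
    return list(run_analysts(num_analysts, analyst))


# A consumer (for example, the code that writes data.json) can record
# progress after every yielded report instead of once at the end.
progress: list[int] = []
for report in run_analysts(8, fake_analyst):
    progress.append(report["analyst"])
```

With the streaming form, a dashboard polling the output would see the analyst count grow one report at a time; with `run_all` it would jump from zero to eight.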

Tech stack

| Layer | Library / service | Notes |
| --- | --- | --- |
| Imagery (primary) | Google Maps Static API | REST → JPEG; needs `GOOGLE_MAPS_KEY` |
| Imagery (deep view) | `sentinelhub-py` 3.11.5 | Sentinel-2 L2A via Copernicus Data Space |
| Imagery (caption pass) | Moondream 3 cloud (`moondream`) | Optional fast pre-pass; needs `MOONDREAM_API_KEY` |
| Analyst LLM | `google.genai` (Gemini) | `gemini-3.1-pro-preview` with `response_schema` for structured output |
| Commander / Strategist | `openai` SDK | `gpt-5.5` via `beta.chat.completions.parse` |
| Free vision alt | OpenRouter (`qwen3-vl-32b`) | Drop-in via `PROVIDER="qwen"` |
| Multi-agent loop | `langgraph` 1.1.10 | StateGraph; streamed |
| Dashboard | `streamlit` 1.57 + `streamlit-folium` | Folium MarkerCluster for overlapping bases |
| Validation | `pydantic` 2.x | Native structured outputs (no `instructor`; see Gotchas) |
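As a rough illustration of the primary imagery layer, the sketch below builds a Maps Static API request URL for a satellite frame. It is not taken from imagery.py: the function name `static_map_url`, the 640x640 frame size, and the `jpg` format choice are assumptions made for the example.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(lat: float, lng: float, zoom: int, api_key: str) -> str:
    """Build a Maps Static API URL for a satellite frame centred on a base."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,
        "size": "640x640",       # assumed frame size
        "maptype": "satellite",
        "format": "jpg",         # assumed; matches the JPEG output noted above
        "key": api_key,          # value of GOOGLE_MAPS_KEY
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


url = static_map_url(23.954, 32.995, zoom=17, api_key="YOUR_KEY")
```

Fetching the resulting URL with any HTTP client returns the image bytes, which can then be saved under screenshots/.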

Project layout

.
├── base_analyzer.py     # main orchestrator — entry point
├── pipeline.py          # LangGraph nodes + commander/strategist callers
├── llm_client.py        # provider factory: gemini | openai | qwen
├── imagery.py           # Google Maps, Sentinel Hub, Moondream fetchers
├── storage.py           # atomic data.json reads/writes + run versioning
├── models.py            # AnalystReport, CommanderReport, StrategistReport
├── app.py               # Streamlit dashboard
├── tests/               # 13 unit tests (models, storage, consensus)
├── military_bases.csv   # input dataset
├── data.json            # output (gitignored, append-only run history)
├── screenshots/         # JPEGs per base (gitignored)
├── requirements.txt
├── .env.example         # template — copy to .env and fill in keys
└── CLAUDE.md            # project instructions for Claude Code

Setup

# 1. Enter the cloned project directory
cd OSINT-Military-Bases-Analyzer

# 2. Create a virtualenv (Python 3.12+ recommended)
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure secrets
cp .env.example .env
# Then edit .env and fill in your API keys (see below)

Required environment variables

GOOGLE_MAPS_KEY=             # required - primary imagery
GEMINI_API_KEY=              # required - analyst vision LLM + Ask AI tab
OPENAI_API_KEY=              # required - commander + strategist (gpt-5.5)
OPENROUTER_API_KEY=          # optional - only if PROVIDER="qwen"
MOONDREAM_API_KEY=           # optional - caption pre-pass
SENTINELHUB_CLIENT_ID2=      # optional - deep Sentinel-2 imagery
SENTINELHUB_CLIENT_SECRET2=  # optional - paired with the above
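A small pre-flight check can catch a missing key before any API call is made. The sketch below is an assumption about how such a check might look, not code from the project: `REQUIRED_KEYS` mirrors the three variables marked required above, and `missing_keys` is a hypothetical helper.

```python
import os

REQUIRED_KEYS = ("GOOGLE_MAPS_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Check the real process environment (after .env has been loaded).
missing = missing_keys(dict(os.environ))
if missing:
    print("Missing required keys:", ", ".join(missing))
```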

Running the pipeline

# Full run (uses ROWS_TO_PROCESS in base_analyzer.py)
.venv/bin/python base_analyzer.py

# Fresh run — clear previous output first
rm -f data.json && .venv/bin/python base_analyzer.py

The pipeline is idempotent within a run: bases marked status="complete" in the latest run are skipped, so you can interrupt and re-run safely.
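The skip rule can be sketched as a filter over the latest run. This is an illustration under assumptions, not the logic in base_analyzer.py: `bases_to_process` is a hypothetical helper, and base id 2001 is made up for the example.

```python
def bases_to_process(data: dict, csv_ids: list[int]) -> list[int]:
    """Return the CSV base ids not yet marked complete in the latest run."""
    runs = data.get("runs", [])
    latest_bases = runs[-1]["bases"] if runs else []
    done = {b["id"] for b in latest_bases if b.get("status") == "complete"}
    return [base_id for base_id in csv_ids if base_id not in done]


data = {
    "runs": [
        {
            "bases": [
                {"id": 147, "status": "complete"},
                {"id": 1059, "status": "in_progress"},
            ]
        }
    ]
}
# 147 is skipped; the interrupted base and the untouched one are redone.
remaining = bases_to_process(data, [147, 1059, 2001])
```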

Running the dashboard

.venv/bin/streamlit run app.py
# → opens http://localhost:8501

Features:

Folium map with one marker per base, clustered when zoomed out (disableClusteringAtZoom=8) so co-located bases (e.g. four Egyptian SA-2 sites within 0.17°) are still individually selectable.
Run selector — browse historical runs in data.json. Locked to the latest run while a run is in progress.
Detail panel — screenshots, all 8 analyst reports, commander summary, threat level.
Ask AI tab — free-form Q&A about a selected base (Gemini, with imagery + reports injected as context).
Real-time polling — 2-second st.rerun() loop while the analyser subprocess is alive.

You can also launch the analyser directly from the dashboard's "Run analysis" button.

Configuration

Top of base_analyzer.py:

ROWS_TO_PROCESS     = 20                   # 1 for testing, full CSV otherwise
NUM_ANALYSTS        = 8
PROVIDER            = "gemini"             # "gemini" | "openai" | "qwen"
ANALYST_MODEL       = "gemini-3.1-pro-preview"
COMMANDER_PROVIDER  = "openai"
COMMANDER_MODEL     = "gpt-5.5"
STRATEGIST_PROVIDER = "openai"
STRATEGIST_MODEL    = "gpt-5.5"

DEFAULT_ZOOM        = 17
ZOOM_DELTA          = 1
LAT_LNG_DELTA       = 0.01

Adjust ROWS_TO_PROCESS to control how many CSV rows are processed in a single run.
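To make the role of the zoom and offset constants concrete, the sketch below shows one way an analyst's chosen action could be turned into the next frame's view. It is an assumption, not the project's code: the `View` dataclass and `apply_action` are hypothetical, and so is the mapping of move-left and move-right to a longitude shift of LAT_LNG_DELTA.

```python
from dataclasses import dataclass, replace

DEFAULT_ZOOM = 17
ZOOM_DELTA = 1
LAT_LNG_DELTA = 0.01


@dataclass(frozen=True)
class View:
    latitude: float
    longitude: float
    zoom: int = DEFAULT_ZOOM


def apply_action(view: View, action: str) -> View:
    """Return the view for the next frame; 'finish' leaves it unchanged."""
    if action == "zoom-in":
        return replace(view, zoom=view.zoom + ZOOM_DELTA)
    if action == "zoom-out":
        return replace(view, zoom=view.zoom - ZOOM_DELTA)
    if action == "move-left":   # assumed: west, i.e. smaller longitude
        return replace(view, longitude=view.longitude - LAT_LNG_DELTA)
    if action == "move-right":  # assumed: east, i.e. larger longitude
        return replace(view, longitude=view.longitude + LAT_LNG_DELTA)
    if action == "finish":
        return view
    raise ValueError(f"unknown action: {action}")


view = apply_action(View(23.954, 32.995), "zoom-in")
```

Keeping the view immutable means each frame's parameters can be logged alongside the screenshot it produced.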

Output schema (`data.json`)

{
  "runs": [
    {
      "run_id": "2026-05-09T13:10:42",
      "started_at": "...",
      "completed_at": "...",
      "strategist": {
        "cross_base_patterns": ["..."],
        "priority_bases": [147, 1059],
        "overall_assessment": "..."
      },
      "bases": [
        {
          "id": 147,
          "country": "Egypt",
          "latitude": 23.954,
          "longitude": 32.995,
          "status": "complete",        // "pending" | "in_progress" | "complete"
          "moondream_caption": "...",
          "screenshots": ["screenshots/147/static_z17.jpg", "screenshots/147/sentinel.jpg"],
          "analysts": [ /* 8 × AnalystReport */ ],
          "commander": { /* CommanderReport */ }
        }
      ]
    }
  ]
}

Runs are appended, never overwritten. The dashboard reads runs[-1] by default.
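A common way to combine append-only history with atomic writes is to write the full document to a temporary file and then rename it over the target. The sketch below illustrates that pattern; it is not the contents of storage.py, and `atomic_write_json` and `append_run` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to a temp file in the same directory, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def append_run(path: Path, run: dict) -> dict:
    """Load existing history (if any), append a run, and persist it."""
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"runs": []}
    data["runs"].append(run)
    atomic_write_json(path, data)
    return data
```

Because the rename either happens or does not, a reader polling the file never sees a half-written document.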

Pydantic models (`models.py`)

class AnalystReport(BaseModel):
    findings: list[str]
    finding_confidences: list[int]              # same length as findings
    confidence: int                             # overall 1–10
    analysis: str
    things_to_continue_analyzing: list[str]
    action: Literal["zoom-in", "zoom-out", "move-left", "move-right", "finish"]
    status: Literal["done", "failed"] = "done"
    error: str | None = None

class CommanderReport(BaseModel):
    summary: str
    threat_assessment: Literal["HIGH", "MEDIUM", "LOW", "UNKNOWN"]
    high_consensus_findings: list[str]          # 3+ analysts agree, avg conf ≥ 7
    contested_findings: list[str]
    recommended_next_actions: list[str]

class StrategistReport(BaseModel):
    cross_base_patterns: list[str]
    priority_bases: list[int]
    overall_assessment: str
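The comment on finding_confidences says it must match findings in length. Whether models.py enforces this is not stated here, so the sketch below is only one possible approach: `CheckedFindings` is a hypothetical, reduced model that uses a pydantic 2 `model_validator` for the length rule and `Field` bounds for the 1-10 overall confidence.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class CheckedFindings(BaseModel):
    """Hypothetical subset of AnalystReport with the length rule enforced."""

    findings: list[str]
    finding_confidences: list[int]
    confidence: int = Field(ge=1, le=10)

    @model_validator(mode="after")
    def confidences_match_findings(self) -> "CheckedFindings":
        if len(self.findings) != len(self.finding_confidences):
            raise ValueError("finding_confidences must match findings in length")
        return self


# One finding but no confidence score: validation should fail.
try:
    CheckedFindings(findings=["runway"], finding_confidences=[], confidence=8)
    rejected = False
except ValidationError:
    rejected = True
```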

Tests

.venv/bin/python -m pytest tests/ -v
# 13/13 passing

tests/test_models.py — Pydantic model validation (4 tests)
tests/test_storage.py — atomic JSON read/write, run versioning (6 tests)
tests/test_consensus.py — commander consensus scoring (3 tests)
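For readers unfamiliar with the consensus rule (3+ analysts agree, average confidence ≥ 7), the sketch below shows one way it could be computed. It is an assumption rather than the commander's actual scoring: `high_consensus` is a hypothetical helper, and "agreement" is simplified to a case-insensitive exact match on the finding text.

```python
from collections import defaultdict
from statistics import mean

MIN_ANALYSTS = 3
MIN_AVG_CONFIDENCE = 7


def high_consensus(reports: list[dict]) -> list[str]:
    """Findings reported by 3+ analysts with average confidence >= 7."""
    scores: dict[str, dict[int, int]] = defaultdict(dict)
    for analyst_idx, report in enumerate(reports):
        pairs = zip(report["findings"], report["finding_confidences"])
        for finding, conf in pairs:
            key = finding.strip().lower()  # assumed notion of "agreement"
            # Keep one score per analyst so duplicates are not double-counted.
            previous = scores[key].get(analyst_idx, 0)
            scores[key][analyst_idx] = max(conf, previous)
    return sorted(
        key
        for key, by_analyst in scores.items()
        if len(by_analyst) >= MIN_ANALYSTS
        and mean(by_analyst.values()) >= MIN_AVG_CONFIDENCE
    )


reports = [
    {"findings": ["Runway", "Radar dome"], "finding_confidences": [9, 4]},
    {"findings": ["runway"], "finding_confidences": [7]},
    {"findings": ["runway ", "radar dome"], "finding_confidences": [8, 6]},
]
consensus = high_consensus(reports)
```

Here "runway" qualifies (three analysts, mean confidence 8), while "radar dome" does not because only two analysts reported it.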

Gotchas

A few sharp edges worth knowing about before you change anything.

LLM clients

instructor is broken in this environment — it fails to import due to a mistralai version conflict. Use the native structured-output APIs instead (google.genai response_schema and openai beta.chat.completions.parse).
Gemini: import from google.genai, not the deprecated google.generativeai.
gemini-3.1-flash-image-preview is an image generation model. For vision analysis use gemini-3.1-pro-preview.
OpenAI: only use gpt-5.x models for the commander/strategist — never gpt-4.x.

Sentinel Hub (Copernicus Data Space)

Use SENTINELHUB_CLIENT_ID2 and SENTINELHUB_CLIENT_SECRET2 — the originals (CLIENT_ID / CLIENT_SECRET) are invalid against CDSE.
All three SHConfig URL fields must be set: sh_auth_base_url, sh_token_url, sh_base_url.

Setting config.sh_base_url alone does not override the request URL. You must also redefine the data collection:

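from sentinelhub import DataCollection

# Redefine the collection so requests target the CDSE endpoint.
cdse_s2 = DataCollection.SENTINEL2_L2A.define_from(
    "SENTINEL2_L2A_CDSE",
    service_url="https://sh.dataspace.copernicus.eu",
)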

Streamlit dashboard

DATA_PATH must be absolute: Path(__file__).parent / "data.json".

The "Run analysis" button must launch the analyser with the absolute Python path and an explicit cwd:

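import subprocess
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parent  # assumed: app.py sits in the project root
subprocess.Popen(
    [str(PROJECT_ROOT / ".venv/bin/python"), "base_analyzer.py"],
    cwd=str(PROJECT_ROOT),
)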

The Folium map uses MarkerCluster(disableClusteringAtZoom=8) so the four Egyptian SA-2 sites (within ~0.17°) don't visually merge.

`data.json` versioning

Runs are appended. Don't replace runs[-1] in place — start_run() creates a new entry with the CSV-derived latitude/longitude fields, and the per-base skip logic relies on status within that newest run.
