Gallind/OSINT-Military-Bases-Analyzer

OSINT Military Bases Analyzer

A multi-agent OSINT pipeline that fetches satellite imagery of military bases, runs 8 vision-LLM analysts per base, synthesises their findings with a commander agent, generates cross-base patterns with a strategist agent, and visualises everything in a real-time Streamlit + Folium dashboard.

Built for From Idea to App (Reichman University, Semester 4, Assignment 2).


What it does

Given a CSV of military bases (id, country, name, latitude, longitude), the pipeline:

  1. Fetches imagery for each base from Google Maps Static API (primary) and Sentinel Hub (Sentinel-2 true-colour, deep view).
  2. Runs 8 analyst agents in a LangGraph loop. Each analyst inspects the current frame, extracts findings with per-finding confidence, and chooses an action (zoom-in, zoom-out, move-left, move-right, or finish). The next frame is re-fetched accordingly.
  3. Calls a commander agent that consolidates the 8 analyst reports into a threat_assessment (HIGH / MEDIUM / LOW / UNKNOWN), high-consensus findings (3+ analysts agree, avg confidence ≥ 7), contested findings, and recommended next actions.
  4. Calls a strategist agent at the end of the run to surface cross-base patterns, priority bases, and an overall assessment.
  5. Persists everything to data.json (atomic writes, append-only run history) and renders it in the dashboard.
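
Step 1's primary fetch can be sketched as a URL builder for the Google Maps Static API. This is a minimal illustration, not the project's imagery.py: the function name `build_static_map_url` and the 640x640 size are assumptions, while `center`, `zoom`, `size`, `maptype`, and `key` are standard Static Maps query parameters.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def build_static_map_url(lat: float, lng: float, zoom: int, api_key: str,
                         size: str = "640x640") -> str:
    """Return a Static Maps request URL for a satellite frame centred on (lat, lng)."""
    params = {
        "center": f"{lat:.6f},{lng:.6f}",
        "zoom": zoom,
        "size": size,            # assumption: the project may request another size
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


# Hypothetical usage with the coordinates of base 147 from the output example.
url = build_static_map_url(23.954, 32.995, zoom=17, api_key="YOUR_KEY")
print(url)
```

Building the URL separately from the HTTP call keeps the zoom and pan logic testable without network access.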

The dashboard launches the analyser as a subprocess, polls data.json every 2 s while a run is in progress, and includes an "Ask AI" tab that answers free-form questions about a selected base with the imagery + reports as context.


Architecture

                              military_bases.csv
                                       │
                                       ▼
                   ┌───────────────────────────────────────┐
                   │           base_analyzer.py            │
                   │  (orchestrator: per-base loop, run    │
                   │   versioning, commander, strategist)  │
                   └───────────────────────────────────────┘
                                       │
                                       ▼
                   ┌───────────────────────────────────────┐
                   │           pipeline.py                 │
                   │   LangGraph StateGraph                │
                   │                                       │
                   │   fetch_image ──► run_analyst ──►     │
                   │        ▲                │             │
                   │        │                ▼             │
                   │        └──── decide_next ──► END      │
                   └───────────────────────────────────────┘
                            │              │
                ┌───────────┘              └──────────────┐
                ▼                                         ▼
        ┌──────────────┐                          ┌──────────────┐
        │  imagery.py  │                          │ llm_client.py│
        │              │                          │              │
        │ Google Maps  │                          │ gemini /     │
        │ Sentinel Hub │                          │ openai /     │
        │ Moondream    │                          │ qwen factory │
        └──────────────┘                          └──────────────┘
                                       │
                                       ▼
                              ┌────────────────┐
                              │   storage.py   │
                              │ atomic JSON    │
                              │ run versioning │
                              └────────────────┘
                                       │
                                       ▼
                                  data.json
                                       │
                                       ▼
                              ┌────────────────┐
                              │     app.py     │
                              │ Streamlit +    │
                              │ Folium map +   │
                              │ Ask AI tab     │
                              └────────────────┘

Each analyst write is streamed via graph.stream() (not .invoke()) so the dashboard sees per-analyst progress in real time.
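
The difference between streaming and a single blocking call can be illustrated without LangGraph. The sketch below is a library-free analogy using a plain Python generator; it does not use the LangGraph API, and `run_loop_streaming` and `run_loop_blocking` are hypothetical names.

```python
from typing import Iterator


def run_loop_streaming(num_analysts: int) -> Iterator[dict]:
    """Yield one update per analyst, as a streamed graph would after each step."""
    for index in range(num_analysts):
        # Placeholder for fetch_image -> run_analyst -> decide_next.
        yield {"analyst": index, "status": "done"}


def run_loop_blocking(num_analysts: int) -> list:
    """Return all updates at the end, as a single invoke-style call would."""
    return list(run_loop_streaming(num_analysts))


progress_log = []
for update in run_loop_streaming(8):
    # A real pipeline would persist each update here so a poller can see it.
    progress_log.append(update)

print(len(progress_log))
```

With the streaming form, each update can be written out as soon as it is produced; with the blocking form, nothing is visible until the whole loop finishes.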


Tech stack

| Layer | Library / service | Notes |
|---|---|---|
| Imagery (primary) | Google Maps Static API | REST → JPEG; needs GOOGLE_MAPS_KEY |
| Imagery (deep view) | sentinelhub-py 3.11.5 | Sentinel-2 L2A via Copernicus Data Space |
| Imagery (caption pass) | Moondream 3 cloud (moondream) | Optional fast pre-pass; needs MOONDREAM_API_KEY |
| Analyst LLM | google.genai (Gemini) | gemini-3.1-pro-preview with response_schema for structured output |
| Commander / Strategist | openai SDK | gpt-5.5 via beta.chat.completions.parse |
| Free vision alt | OpenRouter (qwen3-vl-32b) | Drop-in via PROVIDER="qwen" |
| Multi-agent loop | langgraph 1.1.10 | StateGraph; streamed |
| Dashboard | streamlit 1.57 + streamlit-folium | Folium MarkerCluster for overlapping bases |
| Validation | pydantic 2.x | Native structured outputs (no instructor; see Gotchas) |

Project layout

.
├── base_analyzer.py     # main orchestrator (entry point)
├── pipeline.py          # LangGraph nodes + commander/strategist callers
├── llm_client.py        # provider factory: gemini | openai | qwen
├── imagery.py           # Google Maps, Sentinel Hub, Moondream fetchers
├── storage.py           # atomic data.json reads/writes + run versioning
├── models.py            # AnalystReport, CommanderReport, StrategistReport
├── app.py               # Streamlit dashboard
├── tests/               # 13 unit tests (models, storage, consensus)
├── military_bases.csv   # input dataset
├── data.json            # output (gitignored, append-only run history)
├── screenshots/         # JPEGs per base (gitignored)
├── requirements.txt
├── .env.example         # template: copy to .env and fill in keys
└── CLAUDE.md            # project instructions for Claude Code

Setup

# 1. Clone and enter the project
git clone https://github.com/Gallind/OSINT-Military-Bases-Analyzer.git
cd OSINT-Military-Bases-Analyzer

# 2. Create a virtualenv (Python 3.12+ recommended)
python3 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure secrets
cp .env.example .env
# Then edit .env and fill in your API keys (see below)

Required environment variables

GOOGLE_MAPS_KEY=             # required: primary imagery
GEMINI_API_KEY=              # required: analyst + commander vision LLM
OPENAI_API_KEY=              # required: commander + strategist (gpt-5.5)
OPENROUTER_API_KEY=          # optional: only if PROVIDER="qwen"
MOONDREAM_API_KEY=           # optional: caption pre-pass
SENTINELHUB_CLIENT_ID2=      # optional: deep Sentinel-2 imagery
SENTINELHUB_CLIENT_SECRET2=  # optional: paired with the above
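
A quick pre-flight check for the three required keys might look like the sketch below. It assumes the variables are already exported into the process environment (how the project loads .env is not stated here), and `missing_keys` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("GOOGLE_MAPS_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variable names that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not source.get(name, "").strip()]


# Example with an explicit mapping so the result does not depend on your shell.
example_env = {"GOOGLE_MAPS_KEY": "abc", "GEMINI_API_KEY": "", "OPENAI_API_KEY": "xyz"}
print(missing_keys(example_env))  # ['GEMINI_API_KEY']
```

Calling `missing_keys()` with no argument checks the real environment, which lets a run fail early instead of partway through the first base.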

Running the pipeline

# Full run (uses ROWS_TO_PROCESS in base_analyzer.py)
.venv/bin/python base_analyzer.py

# Fresh run: clear previous output first (-f so a missing file is not an error)
rm -f data.json && .venv/bin/python base_analyzer.py

The pipeline is idempotent within a run: bases marked status="complete" in the latest run are skipped, so you can interrupt and re-run safely.
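
The skip rule described above can be sketched as a filter over the latest run. This is an illustration under the data.json layout documented in this README; `bases_to_process` is a hypothetical name and the real check in base_analyzer.py may differ.

```python
def bases_to_process(data: dict, base_ids: list) -> list:
    """Return the ids from base_ids that are not yet complete in the latest run."""
    runs = data.get("runs", [])
    if not runs:
        return list(base_ids)
    done = {
        base["id"]
        for base in runs[-1].get("bases", [])
        if base.get("status") == "complete"
    }
    return [base_id for base_id in base_ids if base_id not in done]


data = {"runs": [{"bases": [
    {"id": 147, "status": "complete"},
    {"id": 1059, "status": "in_progress"},
]}]}
print(bases_to_process(data, [147, 1059, 2001]))  # [1059, 2001]
```

A base left at "in_progress" by an interrupted run is processed again, which is what makes re-running safe.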


Running the dashboard

.venv/bin/streamlit run app.py
# → opens http://localhost:8501

Features:

  • Folium map with one marker per base, clustered when zoomed out (disableClusteringAtZoom=8) so co-located bases (e.g. four Egyptian SA-2 sites within 0.17°) are still individually selectable.
  • Run selector: browse historical runs in data.json. Locked to the latest run while a run is in progress.
  • Detail panel: screenshots, all 8 analyst reports, commander summary, threat level.
  • Ask AI tab: free-form Q&A about a selected base (Gemini, with imagery + reports injected as context).
  • Real-time polling: 2-second st.rerun() loop while the analyser subprocess is alive.

You can also launch the analyser directly from the dashboard's "Run analysis" button.


Configuration

Top of base_analyzer.py:

ROWS_TO_PROCESS     = 20                   # 1 for testing, full CSV otherwise
NUM_ANALYSTS        = 8
PROVIDER            = "gemini"             # "gemini" | "openai" | "qwen"
ANALYST_MODEL       = "gemini-3.1-pro-preview"
COMMANDER_PROVIDER  = "openai"
COMMANDER_MODEL     = "gpt-5.5"
STRATEGIST_PROVIDER = "openai"
STRATEGIST_MODEL    = "gpt-5.5"

DEFAULT_ZOOM        = 17
ZOOM_DELTA          = 1
LAT_LNG_DELTA       = 0.01

Adjust ROWS_TO_PROCESS to control how many CSV rows are processed in a single run.
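
The zoom and pan constants above plausibly determine how an analyst's chosen action changes the next frame. The sketch below shows one way to apply them; the `View` dataclass, `apply_action`, the zoom clamp range, and the mapping of move-left/move-right to longitude are all assumptions, not taken from the project.

```python
from dataclasses import dataclass, replace

DEFAULT_ZOOM = 17
ZOOM_DELTA = 1
LAT_LNG_DELTA = 0.01
MIN_ZOOM, MAX_ZOOM = 0, 21   # assumption: clamp range, not from the project


@dataclass(frozen=True)
class View:
    latitude: float
    longitude: float
    zoom: int = DEFAULT_ZOOM


def apply_action(view: View, action: str) -> View:
    """Return the view for the next frame; 'finish' leaves it unchanged."""
    if action == "zoom-in":
        return replace(view, zoom=min(view.zoom + ZOOM_DELTA, MAX_ZOOM))
    if action == "zoom-out":
        return replace(view, zoom=max(view.zoom - ZOOM_DELTA, MIN_ZOOM))
    if action == "move-left":    # assumed to mean west: longitude decreases
        return replace(view, longitude=view.longitude - LAT_LNG_DELTA)
    if action == "move-right":   # assumed to mean east: longitude increases
        return replace(view, longitude=view.longitude + LAT_LNG_DELTA)
    if action == "finish":
        return view
    raise ValueError(f"unknown action: {action}")


start = View(latitude=23.954, longitude=32.995)
print(apply_action(start, "zoom-in").zoom)  # 18
```

Keeping the view immutable means each analyst step produces a new frame description, which is easy to log alongside the analyst report.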


Output schema (data.json)

{
  "runs": [
    {
      "run_id": "2026-05-09T13:10:42",
      "started_at": "...",
      "completed_at": "...",
      "strategist": {
        "cross_base_patterns": ["..."],
        "priority_bases": [147, 1059],
        "overall_assessment": "..."
      },
      "bases": [
        {
          "id": 147,
          "country": "Egypt",
          "latitude": 23.954,
          "longitude": 32.995,
          "status": "complete",        // "pending" | "in_progress" | "complete"
          "moondream_caption": "...",
          "screenshots": ["screenshots/147/static_z17.jpg", "screenshots/147/sentinel.jpg"],
          "analysts": [ /* 8 × AnalystReport */ ],
          "commander": { /* CommanderReport */ }
        }
      ]
    }
  ]
}

Runs are appended, never overwritten. The dashboard reads runs[-1] by default.
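
An append-only history with atomic writes can be sketched with a temp file and `os.replace`. This is a minimal illustration of the technique, not the code in storage.py; `load_data`, `save_data_atomic`, and `append_run` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def load_data(path: Path) -> dict:
    """Read data.json, or return an empty history if it does not exist yet."""
    if not path.exists():
        return {"runs": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_data_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)   # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_name)          # do not leave a partial temp file behind
        raise


def append_run(path: Path, run: dict) -> None:
    """Append a new run entry without touching earlier ones."""
    data = load_data(path)
    data["runs"].append(run)
    save_data_atomic(path, data)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.json"
    append_run(target, {"run_id": "first", "bases": []})
    append_run(target, {"run_id": "second", "bases": []})
    run_ids = [run["run_id"] for run in load_data(target)["runs"]]

print(run_ids)  # ['first', 'second']
```

Because the rename either fully happens or does not, a reader polling the file never sees a half-written document.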

Pydantic models (models.py)

class AnalystReport(BaseModel):
    findings: list[str]
    finding_confidences: list[int]              # same length as findings
    confidence: int                             # overall 1–10
    analysis: str
    things_to_continue_analyzing: list[str]
    action: Literal["zoom-in", "zoom-out", "move-left", "move-right", "finish"]
    status: Literal["done", "failed"] = "done"
    error: str | None = None

class CommanderReport(BaseModel):
    summary: str
    threat_assessment: Literal["HIGH", "MEDIUM", "LOW", "UNKNOWN"]
    high_consensus_findings: list[str]          # 3+ analysts agree, avg conf ≥ 7
    contested_findings: list[str]
    recommended_next_actions: list[str]

class StrategistReport(BaseModel):
    cross_base_patterns: list[str]
    priority_bases: list[int]
    overall_assessment: str
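
The high-consensus rule (3+ analysts agree, average confidence ≥ 7) can be expressed as a small scoring function. The sketch below works on plain dicts shaped like AnalystReport and assumes that two analysts "agree" when their finding strings match after trimming and lower-casing; that matching rule and the `high_consensus` name are assumptions, since the project delegates consolidation to the commander agent.

```python
from collections import defaultdict
from statistics import mean

MIN_ANALYSTS = 3
MIN_AVG_CONFIDENCE = 7


def high_consensus(reports: list) -> list:
    """Return findings reported by >= 3 analysts with mean confidence >= 7."""
    scores = defaultdict(list)
    for report in reports:
        seen = set()
        for finding, conf in zip(report["findings"], report["finding_confidences"]):
            key = finding.strip().lower()   # assumption: match after normalising
            if key not in seen:             # count each analyst once per finding
                seen.add(key)
                scores[key].append(conf)
    return sorted(
        key for key, confs in scores.items()
        if len(confs) >= MIN_ANALYSTS and mean(confs) >= MIN_AVG_CONFIDENCE
    )


reports = [
    {"findings": ["Runway", "radar dome"], "finding_confidences": [8, 5]},
    {"findings": ["runway"], "finding_confidences": [7]},
    {"findings": ["runway", "radar dome"], "finding_confidences": [9, 6]},
    {"findings": ["radar dome"], "finding_confidences": [4]},
]
print(high_consensus(reports))  # ['runway']
```

Here "radar dome" is reported by three analysts but averages 5, so it fails the confidence threshold, while "runway" averages 8 and passes both.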

Tests

.venv/bin/python -m pytest tests/ -v
# 13/13 passing
  • tests/test_models.py: Pydantic model validation (4 tests)
  • tests/test_storage.py: atomic JSON read/write, run versioning (6 tests)
  • tests/test_consensus.py: commander consensus scoring (3 tests)

Gotchas

A few sharp edges worth knowing about before you change anything.

LLM clients

  • instructor is broken in this environment: it fails to import due to a mistralai version conflict. Use the native structured-output APIs instead (google.genai response_schema and openai beta.chat.completions.parse).
  • Gemini: import from google.genai, not the deprecated google.generativeai.
  • gemini-3.1-flash-image-preview is an image generation model. For vision analysis use gemini-3.1-pro-preview.
  • OpenAI: only use gpt-5.x models for the commander/strategist, never gpt-4.x.

Sentinel Hub (Copernicus Data Space)

  • Use SENTINELHUB_CLIENT_ID2 and SENTINELHUB_CLIENT_SECRET2; the originals (CLIENT_ID / CLIENT_SECRET) are invalid against CDSE.

  • All three SHConfig URL fields must be set: sh_auth_base_url, sh_token_url, sh_base_url.

  • Setting config.sh_base_url alone does not override the request URL. You must also redefine the data collection:

    cdse_s2 = DataCollection.SENTINEL2_L2A.define_from(
        "SENTINEL2_L2A_CDSE",
        service_url="https://sh.dataspace.copernicus.eu",
    )

Streamlit dashboard

  • DATA_PATH must be absolute: Path(__file__).parent / "data.json".

  • The "Run analysis" button must launch the analyser with the absolute Python path and an explicit cwd:

    subprocess.Popen(
        [str(PROJECT_ROOT / ".venv/bin/python"), "base_analyzer.py"],
        cwd=str(PROJECT_ROOT),
    )
  • The Folium map uses MarkerCluster(disableClusteringAtZoom=8) so the four Egyptian SA-2 sites (within ~0.17°) don't visually merge.

data.json versioning

Runs are appended. Don't replace runs[-1] in place: start_run() creates a new entry with the CSV-derived latitude/longitude fields, and the per-base skip logic relies on status within that newest run.
