A lightweight FastAPI microservice for real-time multi-modal scene analysis.
All inference runs via Triton Inference Server (ONNX, INT8-quantized)
shared with the continuous-tracking system. No GPU runtime or PyTorch
required in this container.
Provides object detection (YOLO26L), structured scene description (Florence-2-large), dense image embeddings (CLIP ViT-L/14), and YAML-configured hazard alerting via a single HTTP API.
| Component | Model | Backend | Notes |
|---|---|---|---|
| Object detection | YOLO26L | Triton (ONNX Runtime, INT8) | Up to 100 detections per frame, NMS-free |
| Scene description | Florence-2-large | Triton (Python backend, INT8) | Structured natural-language caption |
| Image embeddings | CLIP ViT-L/14 | Triton (ONNX Runtime, INT8) | 768-dim L2-normalised vector |
| Hazard alerting | Rule engine (YAML) | In-process | Label match + aspect-ratio + proximity checks |
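The hazard rule engine (last row) is the only in-process component; its rules live in `config/hazards.yaml`. The actual schema is defined by `HazardRuleEngine` in `app/services/hazards.py`; purely as an illustration, a rule combining the three check types could look like:

```yaml
# Hypothetical rule; field names are illustrative, not the real schema.
rules:
  - name: person-close-to-camera
    match_labels: [person]      # label match
    min_aspect_ratio: 1.5       # bbox height/width bound (aspect-ratio check)
    min_area_fraction: 0.25     # bbox area vs. frame area as a proximity proxy
    severity: warning
```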
All four components are optional and independently togglable via config or
per-request flags. The service starts and remains healthy even when Triton is
unavailable (graceful degradation via Null* stubs).
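A minimal sketch of that pattern, with hypothetical names (the real ABCs and factories live in `app/services/`):

```python
class NullDetector:
    """Stand-in returned when the Triton detector cannot be built."""
    available = False

    def detect(self, image):
        return []  # degraded mode: no detections, but the API keeps serving


def build_detector(settings):
    # Constructor arguments are assumed here; see app/services/detector.py
    # for the real factory.
    try:
        from app.services.triton_detector import TritonDetector
        return TritonDetector(settings.triton_url, settings.yolo_model_name)
    except Exception:
        return NullDetector()
```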
- Triton Inference Server running with the model repository from
  `../continuous-tracking/triton-models/`. See that project's README for model
  export/download and Triton setup.
- Python 3.13+
```
uv sync --extra triton
SAS_TRITON_URL=localhost:8701 uv run uvicorn app.main:app --host 0.0.0.0 --port 8300
```

To run without Triton:

```
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 8300
```

The service starts, and all endpoints return empty results with
`*_available: false`.
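You can verify degraded mode via the health endpoint (the response shape below is illustrative; the exact schema lives in `app/models/schemas.py`):

```
curl -s http://localhost:8300/health | jq .
```

```json
{
  "status": "ok",
  "detector_available": false,
  "describer_available": false,
  "embedder_available": false
}
```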
```
docker build -t scene-analysis-service .
docker run -p 8300:8300 -e SAS_TRITON_URL=triton:8701 scene-analysis-service
```

The Docker image is ~200 MB, with no PyTorch and no GPU drivers.
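To run SAS alongside Triton on one Docker network, a compose file along these lines should work; the Triton image tag, port, and volume path are illustrative and must match the continuous-tracking setup:

```yaml
services:
  triton:
    image: nvcr.io/nvidia/tritonserver:24.08-py3   # illustrative tag
    command: tritonserver --model-repository=/models --grpc-port=8701
    volumes:
      - ../continuous-tracking/triton-models:/models

  scene-analysis:
    build: .
    ports:
      - "8300:8300"
    environment:
      SAS_TRITON_URL: triton:8701   # Docker DNS name of the Triton service
    depends_on:
      - triton
```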
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Service health + component availability |
| POST | `/detect` | YOLO object detection only |
| POST | `/describe` | Florence-2 scene description only |
| POST | `/analyze` | Full pipeline (detect + describe + embed + hazards) |
All POST endpoints accept `multipart/form-data` with an `image` field
(JPEG, PNG, or any Pillow-supported format).
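For example, from Python (using `httpx` purely as an illustration):

```python
import httpx

# Post a single image to the detection-only endpoint.
with open("photo.jpg", "rb") as f:
    resp = httpx.post(
        "http://localhost:8300/detect",
        files={"image": ("photo.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
print(resp.json())
```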
| Parameter | Type | Default | Description |
|---|---|---|---|
| `run_detect` | bool | `true` | Run YOLO detection |
| `run_describe` | bool | `true` | Run Florence-2 description |
| `run_embed` | bool | `true` | Run CLIP embedding |
| `run_hazards` | bool | `true` | Evaluate hazard rules |
```
curl -s -X POST "http://localhost:8300/analyze?run_embed=false" \
  -F "image=@photo.jpg" | jq .
```

```json
{
  "detections": [
    {"label": "person", "confidence": 0.92, "bbox": [10, 20, 200, 400], "class_id": 0}
  ],
  "description": "A person standing in a kitchen near a stove.",
  "embedding": [],
  "hazards": [],
  "detector_available": true,
  "describer_available": true,
  "embedder_available": true
}
```

Configuration is read from `config/config.yaml` at startup. Every key can be
overridden with an environment variable prefixed `SAS_` (uppercased), e.g.:
```
SAS_TRITON_URL=triton:8701
SAS_YOLO_ENABLED=false
SAS_PORT=8200
```

| Key | Default | Description |
|---|---|---|
| `triton_url` | `""` | Triton gRPC endpoint (required for Triton backends) |
| `inference_backend` | `triton` | Detector: `triton` / `ultralytics` / `onnxruntime` |
| `yolo_model_name` | `person-detector` | Triton model name |
| `clip_backend` | `triton` | Embedder: `triton` / `openclip` |
| `clip_model_name` | `clip-vision` | Triton model name |
| `florence_backend` | `triton` | Describer: `triton` / `transformers` |
| `florence_model_name` | `florence-2` | Triton model name |
| `florence_tokenizer_dir` | `../continuous-tracking/triton-models/florence-2/1` | Tokenizer path |
| `device` | `auto` | PyTorch device (legacy backends only) |
| `yolo_confidence_threshold` | `0.25` | Detection confidence floor |
| `max_image_size_px` | `1920` | Longest edge limit; larger images downscaled |
| `port` | `8300` | Listening port |
| `log_level` | `info` | `debug` / `info` / `warning` / `error` |
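The override mechanism is simple; a sketch of what `app/config.py` does, assuming `pydantic-settings` and omitting the YAML layer (the real implementation may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Environment variables prefixed SAS_ override these defaults,
    # e.g. SAS_PORT=8200 or SAS_LOG_LEVEL=debug.
    model_config = SettingsConfigDict(env_prefix="SAS_")

    triton_url: str = ""
    yolo_confidence_threshold: float = 0.25
    max_image_size_px: int = 1920
    port: int = 8300
    log_level: str = "info"
```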
All GPU-specific logic lives in Triton configs, not in SAS. Both NVIDIA and Intel Arc GPUs are supported with identical SAS client code:
| GPU | Triton setup |
|---|---|
| NVIDIA | Default `config.pbtxt` (TensorRT EP / CUDA EP) |
| Intel Arc | `python triton-models/scripts/configure_gpu.py --vendor intel` |
SAS itself needs no GPU drivers, PyTorch, or vendor-specific libraries.
SAS retains legacy in-process backends for development and fallback. Install the full PyTorch stack:
```
uv sync --extra inference
```

Then set backends in `config/config.yaml`:

```yaml
triton_url: ""                      # disable Triton
inference_backend: ultralytics      # YOLO via PyTorch
clip_backend: openclip              # CLIP via OpenCLIP
florence_backend: transformers      # Florence-2 via HF Transformers
```

GPU acceleration requires additional setup (CUDA wheels or Intel IPEX). Prefer
Triton backends for production.
```
scene-analysis-service/
├── app/
│   ├── config.py                 # Settings (YAML + SAS_ env overrides)
│   ├── main.py                   # FastAPI app factory + lifespan
│   ├── models/
│   │   └── schemas.py            # Pydantic request/response models
│   ├── routers/
│   │   ├── analyze.py            # POST /analyze (full pipeline)
│   │   ├── detect.py             # POST /detect
│   │   ├── describe.py           # POST /describe
│   │   └── health.py             # GET /health
│   └── services/
│       ├── analyzer.py           # SceneAnalyzer orchestrator
│       ├── detector.py           # Detector ABC + build_detector()
│       ├── triton_detector.py    # TritonDetector (YOLO via gRPC)
│       ├── describer.py          # SceneDescriber ABC + build_describer()
│       ├── triton_describer.py   # TritonFlorenceDescriber (Florence-2 via gRPC)
│       ├── embedder.py           # ImageEmbedder ABC + build_embedder()
│       ├── triton_embedder.py    # TritonClipEmbedder (CLIP via gRPC)
│       ├── device.py             # PyTorch device resolution (legacy backends)
│       └── hazards.py            # HazardRuleEngine (pure, YAML-driven)
├── config/
│   ├── config.yaml               # Default service config
│   └── hazards.yaml              # Default hazard rules
├── tests/
│   ├── conftest.py               # null_analyzer + TestClient fixtures
│   ├── test_analyzer.py          # SceneAnalyzer unit tests
│   ├── test_api.py               # HTTP endpoint tests
│   └── test_hazard_engine.py     # HazardRuleEngine unit tests
├── Dockerfile
└── pyproject.toml
```
```
uv sync --group dev
uv run pytest

# with coverage
uv run pytest --cov=app --cov-report=term-missing
```

Tests do not require Triton or inference dependencies. All model components
are replaced by `Null*` stubs.
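An endpoint test is therefore self-contained; a hypothetical example built on the `TestClient` fixture from `tests/conftest.py` (the fixture name `client` is assumed):

```python
def test_health_reports_unavailable_components(client):
    # With Null* stubs in place, /health still returns 200 and
    # flags every model component as unavailable.
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json()["detector_available"] is False
```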
```
uv run ruff check app/ tests/
uv run ruff format app/ tests/
```

The cognitive-companion backend communicates with this service via
`SceneAnalysisClient` (`backend/integrations/scene_analysis_client.py`).
Enable it by setting `scene_analysis.enabled: true` and
`scene_analysis.base_url: http://scene-analysis:8300` in the backend's
`settings.yaml`.
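That is, in `settings.yaml`:

```yaml
scene_analysis:
  enabled: true
  base_url: http://scene-analysis:8300
```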
| Dependency | Purpose | Required? |
|---|---|---|
| `triton-shared` | Shared Triton client + pre/post processing | Yes (base) |
| `tritonclient[all]` | Triton gRPC client | Yes (production) |
| `tokenizers` | Florence-2 task prompt tokenization | Yes (base) |
| `fastapi` + `uvicorn` | HTTP API server | Yes (base) |
| `pillow` + `numpy` | Image handling | Yes (base) |
| `torch` + `transformers` + `open_clip_torch` | Legacy in-process backends | Optional (`inference` extra) |