Event-driven pet-feeding monitor that fuses multimodal LLM vision with a confidence-gated state machine to confirm meals in real time — and tells you about it on Telegram.
A naive "is the cat eating?" classifier fires on any single ambiguous frame. FeedSentinel doesn't. It treats feeding as a temporal event, not a snapshot, and requires multiple consecutive, high-confidence observations before it confirms a meal. The result is a monitoring system that is quiet when nothing matters and reliable when it does, at an inference cost of roughly $0.10–$0.50 per day.
It is camera-agnostic by design: the application never touches the camera. Any external tool (ffmpeg, cron, a motion sensor) drops a frame into a watched directory, and the pipeline reacts. This decoupling makes it trivial to swap an RTSP CCTV feed for a webcam, a Raspberry Pi cam, or a folder of test images.
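Because the pipeline only reacts to files appearing in the watched directory, any producer will do as long as each frame arrives complete and under a unique name. The following is a minimal sketch of such a producer, not part of FeedSentinel itself: `drop_frame` is a hypothetical helper, and the `.part` temporary suffix assumes the watcher only picks up `.jpg` files, which the project does not confirm.

```python
import os
import time
import uuid
from pathlib import Path


def drop_frame(image_bytes: bytes, snapshots_dir: str = "snapshots") -> Path:
    """Write one frame into the watched directory under a unique name.

    Hypothetical producer-side helper (an assumption, not project code).
    """
    target_dir = Path(snapshots_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp mirrors the cron example's naming; the random suffix
    # guarantees uniqueness even for two frames in the same second.
    stamp = time.strftime("%Y%m%d_%H%M%S")
    final_path = target_dir / f"snap_{stamp}_{uuid.uuid4().hex[:8]}.jpg"
    tmp_path = final_path.with_suffix(".part")

    # Write the full payload first, then rename. os.replace is atomic when
    # source and destination are on the same filesystem, so a reader never
    # observes a half-written .jpg.
    tmp_path.write_bytes(image_bytes)
    os.replace(tmp_path, final_path)
    return final_path
```

Writing to a temporary name and renaming is a common way to complement the watcher's own debouncing; a producer that cannot do this can rely on the debounce alone.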
- Multimodal vision analysis — every frame is interpreted by GPT-4o mini against a strict JSON contract, returning activity, confidence, and human-readable reasoning for full auditability.
- Confidence-gated state machine — meals are confirmed only after N consecutive high-confidence "eating" frames, eliminating single-frame false positives.
- Cooldown control — a configurable quiet period prevents one meal from generating a storm of alerts.
- Cost-optimized model orchestration — an expensive vision model for perception, a cheap text model for the friendly notification copy.
- Durable decoupled queue — file watcher and processing pipeline communicate through a SQLite-backed work queue, so frames survive restarts and bursty writes.
- Resilient by default — Telegram or LLM failures are logged, never fatal; the daemon keeps running.
- Full observability — every analyzed frame and every confirmed meal is persisted to SQLite for later analysis.
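The strict JSON contract mentioned in the first feature can be illustrated with a small validator. This is a sketch under assumptions rather than the project's actual code: the three field names (activity, confidence, reasoning) come from the list above, while the `parse_vision_reply` function, the 0 to 1 confidence range, and the lower-casing of `activity` are illustrative choices.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionResult:
    activity: str      # e.g. "eating"
    confidence: float  # assumed to lie in [0.0, 1.0]
    reasoning: str     # human-readable explanation, kept for auditing


def parse_vision_reply(raw: str) -> VisionResult:
    """Validate a raw model reply against the three-field contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    missing = {"activity", "confidence", "reasoning"} - data.keys()
    if missing:
        raise ValueError(f"reply is missing fields: {sorted(missing)}")

    confidence = data["confidence"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    return VisionResult(
        activity=str(data["activity"]).strip().lower(),
        confidence=float(confidence),
        reasoning=str(data["reasoning"]),
    )
```

Rejecting malformed replies at this boundary keeps the downstream counter from ever acting on a partially parsed verdict.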
    [ffmpeg / cron]     ┌───────────── monitoring.db ──────────────┐
        │               │  image_queue · frame_logs · meal_events  │
        ▼               └──────────────────────────────────────────┘
    snapshots/ ──► file_watcher.py ──► [image_queue] ──► main.py (daemon loop)
                                                                │
              ┌────────────────────┬────────────────────────┼───────────────┐
              ▼                    ▼                        ▼               ▼
        llm_vision.py      state_machine.py             logger.py      notifier.py
        (GPT-4o mini)     (consecutive-frame          (SQLite frame    (text LLM +
                           counter + cooldown)         & meal log)      Telegram)
Design principle: the app is purely reactive. Capture cadence, source, and hardware are external concerns — the pipeline only ever sees new files appearing in snapshots/.
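The hand-off through `image_queue` can be pictured as a tiny SQLite table with an enqueue and a dequeue operation. The sketch below is illustrative only: the table name comes from the diagram, but the column names, the `status` values, and the helper functions are assumptions, and it presumes a single consumer.

```python
import sqlite3
from typing import Optional


def open_queue(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the queue table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_queue (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               path   TEXT NOT NULL UNIQUE,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, path: str) -> None:
    """Record a new snapshot path; duplicate paths are silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO image_queue (path) VALUES (?)", (path,)
        )


def dequeue(conn: sqlite3.Connection) -> Optional[str]:
    """Return the oldest pending path and mark it done, or None if empty."""
    with conn:
        row = conn.execute(
            "SELECT id, path FROM image_queue "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE image_queue SET status = 'done' WHERE id = ?", (row[0],)
        )
        return row[1]
```

With a file-backed `db_path`, pending rows are still there after a restart, which is the durability property the queue is meant to provide.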
- An external job (e.g. a 2-minute cron) writes a uniquely-named snapshot into snapshots/.
- file_watcher.py (watchdog) debounces partial writes and enqueues the path into the image_queue table.
- The main.py daemon dequeues paths, skipping work entirely while in cooldown.
- llm_vision.py base64-encodes the frame and asks GPT-4o mini for a strict-JSON verdict → VisionResult.
- Every frame is written to frame_logs.
- state_machine.py increments its counter on high-confidence "eating" frames and resets on anything else.
- On the N-th consecutive confirmation it fires a meal event: notifier.py generates a warm one-line message with a cheap text model and pushes it plus the confirming image to Telegram; the event is recorded in meal_events; the counter resets and a cooldown begins.
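The counter-and-cooldown behaviour described in the last steps can be sketched as a small class. This is one possible reading of those steps, not the project's `state_machine.py`: the class name, the `observe` method, and the 0.8 confidence threshold are assumptions (the README does not state a threshold); the defaults of 3 frames and 30 minutes match the configuration defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MealStateMachine:
    frames_required: int = 3                     # CONSECUTIVE_FRAMES_REQUIRED
    cooldown: timedelta = timedelta(minutes=30)  # MEAL_COOLDOWN_MINUTES
    min_confidence: float = 0.8                  # assumed threshold
    streak: int = 0
    cooldown_until: Optional[datetime] = None

    def in_cooldown(self, now: datetime) -> bool:
        return self.cooldown_until is not None and now < self.cooldown_until

    def observe(self, activity: str, confidence: float, now: datetime) -> bool:
        """Feed one frame verdict; return True when a meal is confirmed."""
        if self.in_cooldown(now):
            return False  # frames are ignored during the quiet period

        if activity == "eating" and confidence >= self.min_confidence:
            self.streak += 1
        else:
            self.streak = 0  # anything else breaks the run

        if self.streak >= self.frames_required:
            # N-th consecutive confirmation: fire once, reset, start cooldown.
            self.streak = 0
            self.cooldown_until = now + self.cooldown
            return True
        return False
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the logic testable without waiting out a real cooldown.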
# 1. Clone & install
git clone https://github.com/<you>/feedsentinel.git
cd feedsentinel
pip install -r requirements.txt
# 2. Configure
cp .env.example .env # then fill in your keys
# 3. Prepare runtime directories
mkdir -p snapshots data
# 4. Run
python main.py
Run as a background daemon:
nohup python main.py >> app.log 2>&1 &
FeedSentinel does not capture images itself. Point any tool at snapshots/. Example: one RTSP snapshot every 2 minutes via cron (note the escaped % and the unique filename, which is required so the watcher sees a new file each time):
*/2 * * * * /usr/bin/ffmpeg -rtsp_transport tcp -i "rtsp://user:pass@CAMERA_IP:554/stream" -frames:v 1 -y -loglevel error /abs/path/snapshots/snap_$(date +\%Y\%m\%d_\%H\%M\%S).jpg >> /abs/path/ffmpeg_cron.log 2>&1
Keep the entry on a single line: crontab does not support backslash line continuation.
All configuration is environment-driven (.env, loaded via python-dotenv). The app fails loudly at startup if any required key is missing.
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | ✅ | — | OpenAI API key for vision + messaging |
| TELEGRAM_BOT_TOKEN | ✅ | — | Telegram Bot API token |
| TELEGRAM_CHAT_ID | ✅ | — | Destination chat for alerts |
| TELEGRAM_API_URL | | https://api.telegram.org | Override for proxies/self-host |
| SNAPSHOTS_DIR | | ./snapshots | Watched directory |
| DB_PATH | | ./data/monitoring.db | SQLite database path |
| CAT_NAME | | Cat | Used to personalize notifications |
| CONSECUTIVE_FRAMES_REQUIRED | | 3 | N — confirmations needed per meal |
| MEAL_COOLDOWN_MINUTES | | 30 | Minimum gap between alerts |
| LLM_MODEL | | gpt-4o-mini | Vision model |
| MESSAGING_MODEL | | gpt-3.5-turbo | Notification-copy model |
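The fail-loudly startup check can be sketched as follows. The variable names and defaults are taken from the table above; the `Settings` dataclass and `load_settings` function are hypothetical names, and the sketch reads from any mapping (defaulting to the process environment) so it runs without python-dotenv, which the real app uses to populate the environment from .env first.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("OPENAI_API_KEY", "TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    telegram_bot_token: str
    telegram_chat_id: str
    snapshots_dir: str
    db_path: str
    cat_name: str
    consecutive_frames_required: int
    meal_cooldown_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, failing fast on missing keys."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        telegram_chat_id=env["TELEGRAM_CHAT_ID"],
        snapshots_dir=env.get("SNAPSHOTS_DIR", "./snapshots"),
        db_path=env.get("DB_PATH", "./data/monitoring.db"),
        cat_name=env.get("CAT_NAME", "Cat"),
        consecutive_frames_required=int(env.get("CONSECUTIVE_FRAMES_REQUIRED", "3")),
        meal_cooldown_minutes=int(env.get("MEAL_COOLDOWN_MINUTES", "30")),
    )
```

Reporting every missing key in one error, instead of stopping at the first, saves a round of edit-and-restart when setting up a fresh .env.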
Python 3.10+ · OpenAI (GPT-4o mini vision + text) · watchdog (filesystem events) · SQLite (durable queue + analytics) · Telegram Bot API via requests. No heavy ML frameworks, no OpenCV — pure Python.
python -m pytest tests/
- test_state_machine.py: counter, cooldown, and edge-case logic (no API calls).
- test_llm_vision.py: runs the vision module against static sample images.
| Snapshot cadence (8 h active window) | Daily API calls | Approx. daily cost |
|---|---|---|
| Every 1 min | ~480 | ~$0.50 |
| Every 2 min | ~240 | ~$0.25 |
| Every 5 min | ~96 | ~$0.10 |
Recommended: every 2–3 minutes — the sweet spot between responsiveness and spend.
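The figures in the table follow from simple arithmetic, sketched here so other cadences can be estimated. Two assumptions are baked in: an 8-hour capture window (which is what the call counts imply) and a per-call cost of about $0.001, back-derived from the table's $0.50 for 480 calls, not taken from any published price list. The function names are illustrative.

```python
def daily_calls(interval_minutes: float, hours_per_day: float = 8) -> int:
    """Number of snapshots (one vision call each) captured per day."""
    return int(hours_per_day * 60 / interval_minutes)


def daily_cost(
    interval_minutes: float,
    hours_per_day: float = 8,
    cost_per_call: float = 0.50 / 480,  # assumed, derived from the table
) -> float:
    """Approximate daily spend in dollars for a given snapshot cadence."""
    return daily_calls(interval_minutes, hours_per_day) * cost_per_call


if __name__ == "__main__":
    for minutes in (1, 2, 3, 5):
        print(f"every {minutes} min: {daily_calls(minutes)} calls, "
              f"about ${daily_cost(minutes):.2f}/day")
```

The estimate ignores the notification-copy call, which only happens once per confirmed meal.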
- Two-way control: REST endpoint to trigger analysis on demand
- Remote runtime config (adjust N, cooldown) without restart
- Daily Telegram digest of feeding history
- Web dashboard over the frame & meal logs
- Motion-triggered capture to cut API cost further
- Missed-meal alerting
Single-cat scenarios only · no portion/consumption estimation · daytime-optimized (low light degrades accuracy) · capture cadence is an external concern.
MIT — see LICENSE.