Real-time human pose detection system using ESP32-P4-EYE with Telegram integration for fall notifications.
This project implements an embedded computer vision system that:
- Detects human poses in real time using YOLO11n-Pose
- Identifies potential falls through keypoint analysis
- Sends alerts and photos automatically via Telegram
- Uses a dual-chip architecture (ESP32-P4 + ESP32-C6) for distributed processing
- Main Chip: ESP32-P4-EYE (video processing and inference)
- Co-processor: ESP32-C6 (WiFi and Telegram communication)
- Camera: OV2710 (custom configuration)
- Display: Built-in LCD for real-time visualization
- Communication: UART between P4 and C6
- Model: YOLO11n-Pose V2 (QAT - Quantization-Aware Training)
- Inference resolution: 640x640 pixels (~3s on ESP32-P4)
- Accuracy: mAP50-95 = 0.449 (+4.2% vs V1)
- Detection of 17 COCO skeleton keypoints
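For reference, the 17 COCO keypoints come in a fixed order, and pose overlays commonly connect them with the standard 19-segment COCO skeleton. The sketch below is a hypothetical illustration, not code taken from pose_overlay.cpp: draw_visible_limbs and the limb_fn callback are invented names. It only decides which limbs clear a confidence threshold and leaves the actual line drawing to the caller.

```c
#include <stddef.h>

/* The 17 COCO keypoints in their standard order (0-based indices). */
enum coco_kpt {
    KPT_NOSE, KPT_L_EYE, KPT_R_EYE, KPT_L_EAR, KPT_R_EAR,
    KPT_L_SHOULDER, KPT_R_SHOULDER, KPT_L_ELBOW, KPT_R_ELBOW,
    KPT_L_WRIST, KPT_R_WRIST, KPT_L_HIP, KPT_R_HIP,
    KPT_L_KNEE, KPT_R_KNEE, KPT_L_ANKLE, KPT_R_ANKLE,
    KPT_COUNT
};

/* Standard COCO skeleton: 19 limb segments as pairs of keypoint indices. */
static const unsigned char COCO_SKELETON[][2] = {
    {KPT_L_ANKLE, KPT_L_KNEE}, {KPT_L_KNEE, KPT_L_HIP},
    {KPT_R_ANKLE, KPT_R_KNEE}, {KPT_R_KNEE, KPT_R_HIP},
    {KPT_L_HIP, KPT_R_HIP},
    {KPT_L_SHOULDER, KPT_L_HIP}, {KPT_R_SHOULDER, KPT_R_HIP},
    {KPT_L_SHOULDER, KPT_R_SHOULDER},
    {KPT_L_SHOULDER, KPT_L_ELBOW}, {KPT_R_SHOULDER, KPT_R_ELBOW},
    {KPT_L_ELBOW, KPT_L_WRIST}, {KPT_R_ELBOW, KPT_R_WRIST},
    {KPT_L_EYE, KPT_R_EYE},
    {KPT_NOSE, KPT_L_EYE}, {KPT_NOSE, KPT_R_EYE},
    {KPT_L_EYE, KPT_L_EAR}, {KPT_R_EYE, KPT_R_EAR},
    {KPT_L_EAR, KPT_L_SHOULDER}, {KPT_R_EAR, KPT_R_SHOULDER},
};
#define COCO_SKELETON_LEN (sizeof COCO_SKELETON / sizeof COCO_SKELETON[0])

/* Hypothetical callback that would draw one limb, e.g. a line on the LCD. */
typedef void (*limb_fn)(int a, int b, void *user);

/* Invokes draw() for every limb whose two endpoints both reach min_conf and
 * returns the number of such limbs. draw may be NULL to only count them. */
size_t draw_visible_limbs(const float conf[KPT_COUNT], float min_conf,
                          limb_fn draw, void *user) {
    size_t drawn = 0;
    for (size_t i = 0; i < COCO_SKELETON_LEN; i++) {
        int a = COCO_SKELETON[i][0];
        int b = COCO_SKELETON[i][1];
        if (conf[a] >= min_conf && conf[b] >= min_conf) {
            if (draw) draw(a, b, user);
            drawn++;
        }
    }
    return drawn;
}
```

Keeping the skeleton as data means the confidence threshold is a single parameter; on the device, the callback is where a line would be written into the LCD frame buffer.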
- Visual overlay with detected pose on LCD
- Automatic body orientation analysis
- Alert LED (GPIO 23)
- Configurable cooldown system to prevent spam
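One simple way to turn keypoints into an orientation signal is to compare the horizontal and vertical extent of the torso (shoulder midpoint to hip midpoint). This is a hypothetical heuristic, not necessarily the logic in fall_notifier.c; the function name torso_is_horizontal, the ratio parameter, and the confidence gate are assumptions made for illustration.

```c
#include <stdbool.h>

/* Minimal keypoint record: model-space coordinates plus confidence. */
typedef struct { float x, y, conf; } kpt_t;

/* COCO indices used below (0-based). */
enum { L_SHOULDER = 5, R_SHOULDER = 6, L_HIP = 11, R_HIP = 12 };

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical orientation check: true when the torso's horizontal extent
 * exceeds ratio * its vertical extent (ratio = 1.0 means tilted past 45
 * degrees). Returns false if any torso keypoint is below min_conf.
 * Only magnitudes are compared, so the image y-axis direction is irrelevant. */
bool torso_is_horizontal(const kpt_t k[17], float min_conf, float ratio) {
    const int ids[4] = {L_SHOULDER, R_SHOULDER, L_HIP, R_HIP};
    for (int i = 0; i < 4; i++)
        if (k[ids[i]].conf < min_conf) return false;

    float sx = (k[L_SHOULDER].x + k[R_SHOULDER].x) * 0.5f;
    float sy = (k[L_SHOULDER].y + k[R_SHOULDER].y) * 0.5f;
    float hx = (k[L_HIP].x + k[R_HIP].x) * 0.5f;
    float hy = (k[L_HIP].y + k[R_HIP].y) * 0.5f;

    float dx = absf(hx - sx);
    float dy = absf(hy - sy);
    return dx > ratio * dy;
}
```

A single horizontal frame can also come from someone lying down deliberately, so a detector built on this would likely want the condition to persist over several frames before raising an alert.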
- Alert message delivery
- JPEG photo delivery with detected pose
- Configuration via menuconfig
- Local HTTP server for receiving remote commands
- Continuous capture while inference runs
- Buffer protection during photo transmission
- Synchronization via mutex for thread-safety
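The capture/inference overlap and the buffer protection can be pictured as two capture buffers whose roles are tracked under a single mutex. This sketch uses POSIX threads so it runs on a desktop; on the ESP32-P4 a FreeRTOS mutex (xSemaphoreCreateMutex) would play the same role. The frame_pool_t type and the pool_* functions are hypothetical, and the sketch assumes one producer (the camera) and one consumer (inference or photo encoding) that releases its buffer before acquiring the next one.

```c
#include <pthread.h>
#include <stddef.h>

/* Roles of two capture buffers (indices 0 and 1), guarded by `lock`. */
typedef struct {
    pthread_mutex_t lock;
    int ready;    /* newest complete frame, or -1 */
    int in_use;   /* frame held by the consumer, or -1 */
    int writing;  /* frame being filled by the camera, or -1 */
} frame_pool_t;

void pool_init(frame_pool_t *p) {
    pthread_mutex_init(&p->lock, NULL);
    p->ready = p->in_use = p->writing = -1;
}

/* Producer: choose a buffer the consumer is not holding. A completely free
 * buffer is preferred so the newest ready frame survives; otherwise the
 * stale ready frame is recycled. */
int pool_begin_write(frame_pool_t *p) {
    pthread_mutex_lock(&p->lock);
    int w = -1;
    for (int i = 0; i < 2 && w < 0; i++)
        if (i != p->in_use && i != p->ready) w = i;
    if (w < 0 && p->ready >= 0) {   /* only the ready buffer is not held */
        w = p->ready;
        p->ready = -1;
    }
    p->writing = w;
    pthread_mutex_unlock(&p->lock);
    return w;
}

/* Producer: publish the buffer just filled as the newest frame. */
void pool_end_write(frame_pool_t *p) {
    pthread_mutex_lock(&p->lock);
    p->ready = p->writing;
    p->writing = -1;
    pthread_mutex_unlock(&p->lock);
}

/* Consumer: take the newest complete frame (or -1 if none). The camera will
 * not touch it until pool_release(), e.g. while a JPEG is being encoded. */
int pool_acquire(frame_pool_t *p) {
    pthread_mutex_lock(&p->lock);
    int r = p->ready;
    if (r >= 0) {
        p->in_use = r;
        p->ready = -1;
    }
    pthread_mutex_unlock(&p->lock);
    return r;
}

void pool_release(frame_pool_t *p) {
    pthread_mutex_lock(&p->lock);
    p->in_use = -1;
    pthread_mutex_unlock(&p->lock);
}
```

Because the consumer keeps its buffer until pool_release(), a frame being sent as a photo is never overwritten, while the camera keeps refreshing the other buffer.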
```
.
├── main/ # Main application (ESP32-P4)
│   ├── app_main.c # Entry point and main loop
│   ├── app_video.c # Camera and display control
│   ├── coco_pose.cpp # YOLO11 model wrapper
│   ├── pose_overlay.cpp # Keypoint drawing on LCD
│   ├── fall_notifier.c # Fall detection logic
│   ├── net_telegram.c # Telegram API
│   ├── telegram_photo.c # JPEG encoding for Telegram
│   ├── coproc_uart.c # UART communication with C6
│   └── c6_flash_bridge.c # Remote C6 flashing via P4
│
├── c6_messenger/ # Co-processor firmware (ESP32-C6)
│   └── main/
│       ├── main.c # WiFi, HTTP client, UART
│       └── hosted_alert_server.c # HTTP server for commands
│
├── components/ # Custom components
├── p4_sdio_flash/ # SDIO flash utility
├── resources/ # Required binaries for C6 flashing
└── docs/ # Technical documentation
```
- ✅ Upgrade to YOLO11n-Pose V2: +4.2% accuracy (mAP 0.449 vs 0.431)
- ⚡ Optimized resolution: 640x640 (2x faster than 960x960)
- 🎯 Adjusted thresholds: Optimized for V2 model with QAT
- 📊 Inference latency: Reduced from ~6s to ~3s
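The +4.2% figure is a relative gain; in absolute terms V2 adds 0.018 mAP50-95 (1.8 points):

$$
\frac{0.449 - 0.431}{0.431} = \frac{0.018}{0.431} \approx 0.042 = 4.2\%
$$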
- ESP-IDF v5.x or higher
- Toolchain for ESP32-P4 and ESP32-C6
For the ESP32-P4 main application:

```bash
idf.py set-target esp32p4
idf.py menuconfig  # Configure WiFi, Telegram, etc.
idf.py build
idf.py flash monitor
```

Or use the convenience script:

```bash
./rebuild_and_flash_p4.sh
```

For the ESP32-C6 co-processor firmware:

```bash
cd c6_messenger
idf.py set-target esp32c6
idf.py menuconfig
idf.py build
```

Or use the script:

```bash
./rebuild_and_flash_c6.sh
```

Telegram setup:

- Create a bot via @BotFather
- Get the bot token
- Get your chat_id via @userinfobot
- Configure via idf.py menuconfig: Component config → Telegram → Enable Telegram
- Enter the Bot Token and Chat ID
- Adjust the message cooldown (default: 60s)
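The cooldown can be expressed as a small rate limiter. This is a sketch, not the project's implementation: alert_limiter_t and its functions are hypothetical, and timestamps are assumed to come from a monotonic millisecond clock (on ESP-IDF, esp_timer_get_time() returns microseconds since boot, so dividing by 1000 would give one).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most one alert per cooldown window. */
typedef struct {
    uint32_t cooldown_ms;
    int64_t last_sent_ms;   /* -1 until the first alert is sent */
} alert_limiter_t;

void limiter_init(alert_limiter_t *l, uint32_t cooldown_ms) {
    l->cooldown_ms = cooldown_ms;
    l->last_sent_ms = -1;
}

/* Returns true (and records the send time) if an alert may go out at now_ms.
 * Suppressed attempts do not restart the window. */
bool limiter_try_send(alert_limiter_t *l, int64_t now_ms) {
    if (l->last_sent_ms >= 0 &&
        now_ms - l->last_sent_ms < (int64_t)l->cooldown_ms)
        return false;
    l->last_sent_ms = now_ms;
    return true;
}
```

With the default 60 s cooldown, a person who stays on the floor triggers at most one message per minute instead of one per inference.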
Configure credentials via menuconfig:
Component config → Wi-Fi Configuration
The system uses a custom UART protocol:
- P4 → C6: JSON commands (message/photo delivery)
- C6 → P4: Responses and notifications
- Baud rate: Configurable (default: 115200)
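A minimal sketch of how the P4 side might serialize the telegram command, assuming (hypothetically) that each command is one JSON object terminated by a newline and that the message text needs no JSON escaping. base64_encode is standard RFC 4648 base64; build_telegram_cmd is an invented name, not a function from coproc_uart.c.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* RFC 4648 base64 with '=' padding. out must hold 4 * ((n + 2) / 3) + 1
 * bytes; writes a NUL-terminated string and returns its length. */
size_t base64_encode(const unsigned char *in, size_t n, char *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < n) v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < n) v |= in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = i + 1 < n ? B64[(v >> 6) & 63] : '=';
        out[o++] = i + 2 < n ? B64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Hypothetical framing: one JSON object per line. Assumes msg contains no
 * characters that need JSON escaping. Returns the number of bytes written
 * (without the NUL), or 0 if out is too small. */
size_t build_telegram_cmd(const char *msg, const unsigned char *jpeg,
                          size_t jpeg_len, char *out, size_t out_size) {
    size_t b64_len = 4 * ((jpeg_len + 2) / 3);
    int n = snprintf(out, out_size,
                     "{\"cmd\":\"telegram\",\"msg\":\"%s\",\"photo\":\"", msg);
    /* Room needed: prefix + base64 + "\"}\n" (3 bytes) + NUL. */
    if (n < 0 || (size_t)n + b64_len + 4 > out_size) return 0;
    size_t o = (size_t)n + base64_encode(jpeg, jpeg_len, out + n);
    memcpy(out + o, "\"}\n", 4);   /* copies the terminating NUL too */
    return o + 3;
}
```

Every JPEG starts with the bytes FF D8 FF, which is why base64-encoded JPEGs begin with /9j/. Base64 also inflates the photo by a third, which matters at the default baud rate.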
Example command:

```json
{"cmd":"telegram","msg":"Fall detected!","photo":"<base64_jpeg>"}
```

Processing flow on the ESP32-P4:

- Frame capture from camera (OV2710)
- Resize via PPA to the model input resolution (640x640)
- YOLO11 inference on the alternate buffer while the next frame is captured
- Keypoint extraction and pose analysis
- Overlay drawing on LCD
- Fall detection → Notification trigger
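Keypoints come back in model-input coordinates, so drawing them on a frame of a different size needs the inverse of the resize step. The sketch below assumes an aspect-preserving letterbox resize; the README does not say whether the PPA step letterboxes or stretches, and letterbox_fit and letterbox_unmap are hypothetical names.

```c
/* Hypothetical letterbox geometry: scale a src_w x src_h frame uniformly so
 * it fits a dst x dst model input, centred with padding on the short side. */
typedef struct {
    float scale;
    float pad_x, pad_y;
} letterbox_t;

letterbox_t letterbox_fit(int src_w, int src_h, int dst) {
    float sx = (float)dst / (float)src_w;
    float sy = (float)dst / (float)src_h;
    letterbox_t lb;
    lb.scale = sx < sy ? sx : sy;
    lb.pad_x = ((float)dst - (float)src_w * lb.scale) * 0.5f;
    lb.pad_y = ((float)dst - (float)src_h * lb.scale) * 0.5f;
    return lb;
}

/* Maps a keypoint from model-input coordinates back to source-frame pixels,
 * e.g. before drawing the overlay at the source resolution. */
void letterbox_unmap(const letterbox_t *lb, float mx, float my,
                     float *x, float *y) {
    *x = (mx - lb->pad_x) / lb->scale;
    *y = (my - lb->pad_y) / lb->scale;
}
```

For example, a 1920x1080 frame fitted into 640x640 gets a scale of 1/3 with 140 pixels of padding above and below, so the model-space point (320, 320) maps back to (960, 540).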
The ESP32-C6 co-processor:

- Connects to WiFi
- Syncs time via SNTP
- Listens for UART commands from P4
- Sends HTTP requests to Telegram API
- Serves local HTTP for remote commands
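The Telegram Bot API is plain HTTPS: each method is called at https://api.telegram.org/bot<token>/<method>. A hedged sketch of building a sendMessage request on the C6 side follows; tg_method_url, url_encode, and tg_send_message_body are hypothetical helpers, and chat_id is assumed to be numeric so it needs no encoding. The resulting URL and form body could then be handed to an HTTP client such as ESP-IDF's esp_http_client.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds https://api.telegram.org/bot<token>/<method>. Returns the snprintf
 * result (a value >= out_size means the URL was truncated). */
int tg_method_url(const char *token, const char *method, char *out,
                  size_t out_size) {
    return snprintf(out, out_size, "https://api.telegram.org/bot%s/%s",
                    token, method);
}

/* Percent-encodes everything except RFC 3986 unreserved characters, which is
 * valid in an application/x-www-form-urlencoded body. Returns the encoded
 * length, or -1 if out is too small. */
long url_encode(const char *in, char *out, size_t out_size) {
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    if (out_size == 0) return -1;
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        int plain = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') ||
                    c == '-' || c == '_' || c == '.' || c == '~';
        size_t need = plain ? 1 : 3;
        if (o + need + 1 > out_size) return -1;   /* keep room for the NUL */
        if (plain) {
            out[o++] = (char)c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    out[o] = '\0';
    return (long)o;
}

/* Form body for sendMessage: chat_id=<id>&text=<encoded text>.
 * Returns its length, or -1 if out is too small. */
long tg_send_message_body(const char *chat_id, const char *text, char *out,
                          size_t out_size) {
    int n = snprintf(out, out_size, "chat_id=%s&text=", chat_id);
    if (n < 0 || (size_t)n >= out_size) return -1;
    long m = url_encode(text, out + n, out_size - (size_t)n);
    return m < 0 ? -1 : (long)n + m;
}
```

Photos go through the Bot API's sendPhoto method instead, which accepts the image as a multipart/form-data upload.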
The P4 can flash the C6 remotely via UART using c6_flash_bridge:

```bash
# On P4, enable bridge mode and use esptool via UART
```

Troubleshooting:

- Check cache sync in telegram_photo.c:125
- Enable CONFIG_SPIRAM_FETCH_INSTRUCTIONS
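For UART communication problems between P4 and C6:

- Check the TX/RX GPIO pins (TX on one chip must connect to RX on the other)
- Confirm that the baud rate matches on P4 and C6
- Check the log output from main/coproc_uart.c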
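When checking the baud rate, it also helps to know what it costs: with 8N1 framing each byte takes 10 bit times, so 115200 baud moves at most 11,520 bytes/s. The helpers below are a hypothetical back-of-the-envelope calculator, not project code, and they assume a continuous stream with no gaps between bytes.

```c
#include <stdint.h>

/* Milliseconds to send n_bytes at `baud`, assuming 8N1 framing
 * (1 start + 8 data + 1 stop = 10 bit times per byte), rounded up. */
uint32_t uart_tx_ms(uint32_t n_bytes, uint32_t baud) {
    uint64_t bit_ms = (uint64_t)n_bytes * 10u * 1000u;
    return (uint32_t)((bit_ms + baud - 1) / baud);
}

/* Length of the padded base64 encoding of an n-byte binary blob. */
uint32_t base64_len(uint32_t n) {
    return 4u * ((n + 2u) / 3u);
}
```

As an illustration, a hypothetical 30 KB JPEG becomes 40,000 base64 characters: about 3.5 s on the wire at 115200 baud, versus about 0.4 s at 921600.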
Performance and resolution trade-offs:

- Current resolution: 640x640 (~3s on ESP32-P4)
- V2 model with QAT: mAP50-95 = 0.449
- 960x960: ~6s per inference, with no accuracy gain over 640x640
- For lower latency: try 320x320 (~600ms, roughly 25% lower accuracy)
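A first-order way to reason about these trade-offs is to assume inference time scales with input area (side squared), calibrated on the 640x640 measurement. This is an assumption, not a profile of the model; est_latency_ms is a hypothetical helper.

```c
#include <stdint.h>

/* Rough latency estimate: scales a measured reference latency by the ratio
 * of input areas (side^2). First-order model only; ignores fixed overheads. */
uint32_t est_latency_ms(uint32_t side, uint32_t ref_side, uint32_t ref_ms) {
    uint64_t num = (uint64_t)ref_ms * side * side;
    return (uint32_t)(num / ((uint64_t)ref_side * ref_side));
}
```

Calibrated at 3 s for 640x640, it predicts 6.75 s at 960x960 and 750 ms at 320x320. The measured ~6 s and ~600 ms are both somewhat faster, so treat the estimate only as a rough guide.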
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
- YOLO11n-Pose model: Ultralytics
- ESP-DL: Espressif Deep Learning Library
- ESP32-P4-EYE: Espressif Systems