Control your computer's mouse, volume, and more using an AI-powered hand tracking engine. Built with Python, OpenCV, and MediaPipe, GestureX provides a smooth, intuitive, and highly configurable desktop application to map complex hand gestures to native OS commands.
GestureX currently supports a wide array of advanced gestures out-of-the-box:
- Mouse Movement: Smoothly guide the cursor using your index and middle fingers.
- Left Click: "Pinch" your index and middle fingers together while moving.
- Scroll (Up & Down): Move your hand up or down vertically with thumb, index, and middle fingers extended. (Pinching pauses the scroll).
- Drag & Drop: Pinch your thumb and index fingers to grab, move your hand, and release to drop.
- Volume Control: Extend all five fingers and rotate your hand like a physical knob (left for down, right for up).
- Screenshot: Use a rapid sequence: Fist → Open Palm → Fist to capture the screen instantly. Includes native Windows toast notifications!
- Desktop UI Toolkit: A modern, dark-themed GUI (built via CustomTkinter) that allows you to configure thresholds, tweak sensitivities, and launch the engine.
| Feature | Gesture Requirement | Action |
|---|---|---|
| Move Cursor | Index & Middle fingers UP | Move hand within the virtual bounds to push the cursor. |
| Left Click | Index & Middle fingers UP | Bring Index and Middle fingers close together to trigger click. |
| Scroll | Thumb, Index, & Middle UP | Move hand up/down relative to the start position. Join fingers to Lock/Pause. |
| Drag & Drop | Thumb & Index fingers UP | Bring Thumb and Index close together to Start Drag. Release to Drop. |
| Volume Knob | All 5 fingers UP (Open Palm) | Rotate the hand clockwise (Up) or counter-clockwise (Down). |
| Screenshot | Multi-state Sequence | Fist → hold briefly → Open Palm → hold briefly → Fist. |
(Note: The engine uses sophisticated edge-triggering, debounce-timers, and sticky-modes to prevent gesture flickering and ensure accurate commands).
Ensure you are running exactly the supported dependencies. A requirements.txt is provided.
-
Clone the repository
git clone https://github.com/PhilemonTJ/Gesture-based-Computer-Interaction.git cd Gesture-based-Computer-Interaction -
Set up a Virtual Environment (Recommended)
python -m venv venv # On Windows: venv\Scripts\activate
-
Install Dependencies
pip install -r requirements.txt
(Required packages include
opencv-python,mediapipe,numpy,mouse,pyautogui,customtkinter,Pillow, andwin11toast).
GestureX features a settings manager and a centralized desktop app.
python src/app_ui.pyFrom the GUI you can:
- Adjust camera settings and interaction zones.
- Tweak gesture sensitivities (like scroll speed, join thresholds, angles).
- Click "Start Engine" to launch the gesture camera feed.
To just run the raw gesture engine directly without the GUI:
python src/main.pyGesture-based-Computer-Interaction/
├── src/
│ ├── app_ui.py # Desktop GUI launcher
│ ├── main.py # Core gesture detection loop
│ ├── controllers/ # Modular controller logic
│ │ ├── drag_drop_controller.py
│ │ ├── mouse_click_controller.py
│ │ ├── mouse_movement_controller.py
│ │ ├── screenshot_controller.py
│ │ ├── scroll_controller.py
│ │ └── volume_controller.py
│ ├── core/
│ │ ├── camera_manager.py # Webcam capture handler
│ │ ├── config.py # Global constants & variables
│ │ └── preferences.py # Settings save/load JSON manager
│ ├── ui/
│ │ ├── app.py # Main CustomTkinter UI definitions
│ │ ├── engine.py # Background thread runner for Main
│ │ └── settings.py # Configuration window UI
│ └── utils/
│ ├── hand_detector.py # MediaPipe wrapper & math helpers
│ ├── roi_tracker.py # Cropping optimizer for faster FPS
│ └── volume_manager.py # Windows native COM volume bindings
├── requirements.txt
└── README.md
- Hand Tracking: Built on Google's
MediaPipe, detecting 21 3D hand landmarks in real-time. - ROI Optimization: We use a custom Region of Interest (ROI) tracker. Once a hand is found, we crop the camera frame around it dynamically, boosting overall FPS and reducing CPU load substantially.
- One-Euro Filter: The mouse controller pushes the cursor coordinates through a 1€ Filter algorithm to eliminate jitter while preserving low latency during quick movements.
- State Machine: Complex features like Screenshot or persistent Drag-and-Drop are handled using isolated state machines inside their respective
Controllerclasses.