Real-time eye tracking pipeline built with Python, OpenCV, and MediaPipe FaceMesh. Tracks iris position, estimates head pose in 3D, detects fixations and blinks, maps gaze to Areas of Interest, and exports per-frame metrics to CSV for offline analysis.
| Feature | Details |
|---|---|
| Iris gaze estimation | Tracks iris position within eye bounds using MediaPipe's 478-point mesh (landmarks 468–477). Outputs normalized horizontal/vertical gaze ratios. |
| 3D head pose | cv2.solvePnP on 6 facial landmarks → roll, pitch, yaw in degrees. Axes drawn on nose tip in real time. |
| Blink detection | Eye Aspect Ratio (EAR) formula on both eyes; blink flagged when avg EAR < 0.20. |
| Gaze direction estimation | Fuses iris ratios with head-pose yaw/pitch to produce a head-independent gaze direction vector (dir_h, dir_v) in [-1, 1]. Visualised as a live miniature indicator overlay. |
| 3D point cloud export | --export-ply writes two PLY files on exit: (1) sampled face mesh landmarks coloured by time, (2) 3D gaze ray endpoints coloured by horizontal position. Viewable in MeshLab, CloudCompare, or Open3D. |
| Fixation detection | Velocity-based classifier: gaze velocity < 25 px/s for ≥ 100 ms = fixation. Completed fixations logged with duration and position. |
| AOI tracking | Configurable rectangular Areas of Interest with per-AOI dwell time accumulation. |
| Heatmap overlay | Gaussian-blurred JET colormap overlaid on the live frame. |
| Gaze attention classifier | sklearn Random Forest trained on 5 gaze features → predicts on_screen / peripheral / away with >95% CV accuracy. Demonstrated in notebooks/classifier.ipynb. |
| CSV export | Per-frame record saved to data/gaze_<timestamp>.csv on exit. |
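The blink rule in the table can be illustrated with the standard six-point EAR formula. This is a minimal sketch, not the repository's `EyeTracker` code: the landmark ordering, the function names, and the sample coordinates are assumptions made for illustration. Only the 0.20 threshold comes from the table above.

```python
import math

EAR_BLINK_THRESHOLD = 0.20  # from the feature table: blink when average EAR < 0.20


def eye_aspect_ratio(eye):
    """EAR for six (x, y) points ordered p1..p6.

    Assumed ordering: p1 and p4 are the eye corners, p2 and p3 lie on the
    upper lid, p6 and p5 lie on the lower lid directly below them.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def is_blink(left_eye, right_eye, threshold=EAR_BLINK_THRESHOLD):
    """Flag a blink when the mean EAR of both eyes drops below the threshold."""
    avg = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return avg < threshold


# Synthetic eye outlines in pixels (illustrative values only).
open_eye = [(0, 0), (10, -6), (20, -6), (30, 0), (20, 6), (10, 6)]
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]
```

Averaging both eyes before thresholding makes the flag less sensitive to a single noisy eye, at the cost of missing one-eyed winks.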
conda (recommended):
git clone https://github.com/apayne185/cv2-eye-tracking-system.git
cd cv2-eye-tracking-system
conda env create -f environment.yml
conda activate eyetrack
pip / venv:
pip install -r requirements.txt
Requirements: Python 3.11+, opencv-python, numpy, mediapipe, pandas, scipy
Windows note: tested on Windows 11 with Python 3.12. The conda env is recommended — a pytest.ini is included that suppresses a known conflict between the dash pytest plugin and mediapipe's DLL initialisation on Windows.
# Default: webcam 0
python src/main.py
# Specific webcam index
python src/main.py --source 1
# Process a recorded video file
python src/main.py --source path/to/video.mp4
# Custom output directory
python src/main.py --source 0 --output-dir results/
# Export PLY point clouds (face mesh + gaze trajectory)
python src/main.py --source 0 --export-ply
Press q to quit; the session CSV and heatmap are saved automatically.
--- Session Summary ---
Frames recorded: 2500
Blinks detected: 124
Fixation frames: 438 (17.5%)
Fixations: 39 avg=0.20s max=0.65s
AOI dwell (frames):
Center: 2158 (86.3%)
Left: 120 (4.8%)
AOI dwell (seconds):
Center: 78.77s
Left: 4.14s
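The per-AOI frame counts in the summary above could be accumulated along the following lines. This is a hedged sketch: `RectAOI` and `accumulate_dwell` are hypothetical names, not the repository's `AOITracker` API, and the rectangles and gaze points are made up for the example.

```python
from collections import Counter


class RectAOI:
    """Hypothetical rectangular AOI in pixel coordinates."""

    def __init__(self, name, x, y, w, h):
        self.name, self.x, self.y, self.w, self.h = name, x, y, w, h

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def accumulate_dwell(aois, gaze_points):
    """Count, per AOI, the frames whose gaze point falls inside it.

    The first matching AOI wins; frames outside every AOI are not counted.
    """
    frames = Counter()
    for px, py in gaze_points:
        for aoi in aois:
            if aoi.contains(px, py):
                frames[aoi.name] += 1
                break
    return frames


aois = [RectAOI("Left", 0, 0, 200, 480), RectAOI("Center", 200, 0, 240, 480)]
gaze = [(50, 100), (250, 200), (300, 240), (639, 10)]  # one point per frame
dwell = accumulate_dwell(aois, gaze)
```

Because frames outside all AOIs are skipped, the per-AOI percentages need not sum to 100%, which matches the shape of the summary shown above.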
| Column | Description |
|---|---|
| frame | Frame index |
| timestamp | Unix timestamp |
| gaze_x, gaze_y | Iris center in pixel coordinates |
| gaze_ratio_h, gaze_ratio_v | Normalized gaze position within eye (0–1) |
| pitch, yaw, roll | Head Euler angles in degrees |
| left_ear, right_ear | Eye Aspect Ratio per eye |
| is_blink | Boolean |
| is_fixation | Boolean |
| dir_h, dir_v | Estimated gaze direction in [-1, 1] (iris + head pose fused) |
| ray_ox, ray_oy, ray_oz | 3D gaze ray origin (eye midpoint in camera coords, mm) |
| ray_dx, ray_dy, ray_dz | 3D gaze ray unit direction vector in camera coords |
| active_aoi | Name of active Area of Interest, or null |
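As a rough illustration of working with these columns offline, the sketch below summarises a session using only the standard library. The sample rows are synthetic, and the boolean encoding (`True`/`False` text) is an assumption about how the CSV is written; adjust `as_bool` if the real files differ.

```python
import csv
import io

# Tiny synthetic stand-in for a session file; real files live under data/.
SAMPLE = """frame,timestamp,is_blink,is_fixation,active_aoi
0,1700000000.00,False,True,Center
1,1700000000.04,True,False,Center
2,1700000000.08,True,False,
3,1700000000.12,False,True,Left
4,1700000000.16,True,False,Left
"""


def as_bool(text):
    # Assumption: booleans are written as "True"/"False" (or 1/0).
    return text.strip().lower() in {"true", "1"}


def summarise(rows):
    rows = list(rows)
    blink_flags = [as_bool(r["is_blink"]) for r in rows]
    # A blink event is a False -> True transition, so a blink spanning
    # several consecutive frames is counted once.
    blink_events = sum(
        1 for prev, cur in zip([False] + blink_flags, blink_flags) if cur and not prev
    )
    fixation_frames = sum(as_bool(r["is_fixation"]) for r in rows)
    return {
        "frames": len(rows),
        "blink_events": blink_events,
        "fixation_frames": fixation_frames,
    }


summary = summarise(csv.DictReader(io.StringIO(SAMPLE)))
```

To run it on a real session, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.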
cv2-eye-tracking-system/
├── src/
│ ├── main.py # Entry point — argparse, main loop, CSV export
│ ├── eye_tracker.py # EyeTracker class: iris gaze, EAR blink, fixation
│ ├── head_pose.py # HeadPoseEstimator: solvePnP, draw_axes, gaze ray
│ ├── direction.py # GazeDirectionEstimator: 2D direction + 3D gaze ray
│ ├── face_mesh_3d.py # PLY point cloud export: face mesh + gaze trajectory
│ ├── gaze_analysis.py # Heatmap accumulator and renderer
│ ├── AOI.py # AOITracker class with dwell-time accumulation
│ └── old_work/ # Legacy scripts (reference only)
├── notebooks/
│ ├── analysis.ipynb # Offline session analysis — plots, heatmap, stats
│ └── classifier.ipynb # ML training pipeline — RF vs SVM vs MLP, CV, confusion matrix
├── tests/
│ ├── conftest.py # sys.path setup for src/ imports
│ ├── test_direction.py # Direction estimator unit tests
│ ├── test_fixation.py # Fixation state machine unit tests
│ ├── test_face_mesh_3d.py # PLY export unit tests
│ ├── test_gaze_analysis.py # Heatmap accumulator unit tests
│ └── test_gaze_classifier.py # Classifier training, inference, persistence
├── data/ # Session output (CSV, heatmap, summary) — gitignored
├── eye_gaze_heatmap.jpg # Sample heatmap output
├── pytest.ini # Disables dash plugin (Windows mediapipe compatibility)
├── environment.yml # conda env (Python 3.11, eyetrack)
├── requirements.txt
└── README.md
Why iris landmarks over eye center averaging?
The earlier approach averaged the positions of all eye outline landmarks, which tracks face movement but not gaze direction. The iris landmarks (MediaPipe 468–477, enabled via refine_landmarks=True) give the actual pupil/iris position, so moving your eyes while keeping your head still produces a meaningful signal.
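To make the idea concrete, here is a small sketch of turning iris landmarks into normalized gaze ratios. The helper names and the sample coordinates are assumptions; the real `EyeTracker` may choose its eye bounds and landmark subsets differently.

```python
def iris_centre(points):
    """Mean of the (x, y) iris landmark points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def gaze_ratio(iris, lower, upper):
    """Position of the iris centre between two eye bounds along one axis, in 0..1."""
    span = upper - lower
    if span == 0:
        return 0.5  # degenerate eye box: fall back to "centred"
    return min(1.0, max(0.0, (iris - lower) / span))


# Synthetic pixel coordinates for one eye (illustrative values only).
iris_points = [(108, 50), (112, 50), (110, 48), (110, 52)]
centre = iris_centre(iris_points)
ratio_h = gaze_ratio(centre[0], 100, 140)  # eye corners at x = 100 and x = 140
ratio_v = gaze_ratio(centre[1], 44, 56)    # lids at y = 44 and y = 56
```

Because the ratio is measured relative to the eye's own bounds, translating the whole face in the image leaves it unchanged, while moving the eyes alone shifts it.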
Head pose as gaze context
solvePnP maps six 2D facial landmarks to a known 3D face model to recover the rotation matrix. Roll/pitch/yaw complement the iris ratios: a centered iris with a 30° yaw still points off-center in world space. These extrinsics feed directly into the 3D gaze ray computation.
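After `cv2.solvePnP` returns a rotation vector, `cv2.Rodrigues` converts it to a 3x3 rotation matrix, and the Euler angles are read off that matrix. The sketch below covers only that last step in plain Python. The ZYX composition order (R = Rz(roll) · Ry(yaw) · Rx(pitch)) is an assumption; `HeadPoseEstimator` may use a different convention or an OpenCV decomposition helper.

```python
import math


def rot_zyx(pitch_deg, yaw_deg, roll_deg):
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch); used here only to build an input."""
    a, b, g = (math.radians(v) for v in (pitch_deg, yaw_deg, roll_deg))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def euler_from_rotation(R):
    """Recover (pitch, yaw, roll) in degrees from a rotation matrix, assuming ZYX order."""
    cos_yaw = math.hypot(R[0][0], R[1][0])
    yaw = math.atan2(-R[2][0], cos_yaw)
    if cos_yaw > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])
        roll = math.atan2(R[1][0], R[0][0])
    else:
        # Gimbal lock (yaw near +/-90 degrees): pitch and roll cannot be
        # separated, so roll is fixed at 0 and pitch absorbs the rest.
        pitch = math.atan2(-R[1][2], R[1][1])
        roll = 0.0
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))


pitch, yaw, roll = euler_from_rotation(rot_zyx(10.0, 30.0, -5.0))
```

A round trip through `rot_zyx` and `euler_from_rotation` is a cheap sanity check that the chosen convention is applied consistently.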
Fixation vs. saccade
The velocity threshold (25 px/s) follows the I-VT (Identification by Velocity Threshold) algorithm common in psychophysics research. Saccades typically exceed 300 px/s; the threshold is conservative to reduce noise from head micro-movements.
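A velocity-threshold classifier of this kind can be sketched as follows. The thresholds (25 px/s, 100 ms) come from this README; the function name, the sample format, and the choice to report the mean position are assumptions, and the repository's fixation state machine may differ in detail.

```python
import math

VELOCITY_THRESHOLD = 25.0  # px/s, from the text
MIN_DURATION = 0.100       # seconds, from the feature table


def detect_fixations(samples, v_max=VELOCITY_THRESHOLD, min_dur=MIN_DURATION):
    """samples: list of (t_seconds, x_px, y_px) sorted by time.

    Returns completed fixations as dicts with start time, duration and
    mean position.
    """
    fixations = []
    run = []  # consecutive samples whose point-to-point velocity is below v_max

    def close_run():
        if len(run) >= 2 and run[-1][0] - run[0][0] >= min_dur:
            n = len(run)
            fixations.append({
                "start": run[0][0],
                "duration": run[-1][0] - run[0][0],
                "x": sum(s[1] for s in run) / n,
                "y": sum(s[2] for s in run) / n,
            })
        run.clear()

    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        velocity = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt
        if velocity < v_max:
            if not run:
                run.append(prev)
            run.append(cur)
        else:
            close_run()  # a fast movement ends the current candidate fixation
    close_run()
    return fixations


# Synthetic gaze trace: a slow drift, a large jump, then a brief pause.
trace = [
    (0.00, 100.0, 100.0), (0.05, 100.5, 100.0), (0.10, 101.0, 100.0),
    (0.15, 101.0, 100.5), (0.20, 300.0, 200.0), (0.25, 300.0, 200.0),
]
fixations = detect_fixations(trace)
```

In this trace the first four samples form one fixation, while the pause after the jump lasts only 50 ms and is discarded by the minimum-duration rule.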
Gaze direction fusion
Iris ratios alone are relative to the eye socket — they correctly detect eye movement but are blind to head rotation. solvePnP yaw and pitch capture head orientation but ignore where the eyes point within the socket. GazeDirectionEstimator linearly combines both signals: dir_h = iris_deviation * EYE_SCALE + yaw * HEAD_SCALE. The weights are empirically tuned; a calibration step (mapping known gaze targets to measured ratios) would improve absolute accuracy.
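The linear fusion can be sketched in a few lines. The numeric weights below are placeholders, not the tuned values in `direction.py`. Treating iris deviation as the ratio re-centred to [-1, 1], and applying the same form to `dir_v` with pitch, are both assumptions made for illustration.

```python
EYE_SCALE = 1.0          # hypothetical weight on the iris term
HEAD_SCALE = 1.0 / 45.0  # hypothetical: 45 degrees of head rotation saturates the output


def clamp(value, lo=-1.0, hi=1.0):
    return max(lo, min(hi, value))


def fuse_direction(ratio_h, ratio_v, yaw_deg, pitch_deg):
    """Combine iris ratios (0..1) with head yaw/pitch (degrees) into (dir_h, dir_v)."""
    # Assumption: iris deviation is the gaze ratio re-centred to [-1, 1].
    dev_h = (ratio_h - 0.5) * 2.0
    dev_v = (ratio_v - 0.5) * 2.0
    dir_h = clamp(dev_h * EYE_SCALE + yaw_deg * HEAD_SCALE)
    dir_v = clamp(dev_v * EYE_SCALE + pitch_deg * HEAD_SCALE)
    return dir_h, dir_v


centred = fuse_direction(0.5, 0.5, 0.0, 0.0)        # eyes and head both centred
head_only = fuse_direction(0.5, 0.5, 22.5, 0.0)     # centred iris, head turned
saturated = fuse_direction(1.0, 0.5, 45.0, 0.0)     # both terms push the same way
```

Clamping keeps the output inside [-1, 1] when the eye and head terms add up, which is what the overlay indicator needs. The sign conventions for yaw and pitch depend on the head-pose convention and would need checking against real data.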
PLY point clouds and the gaze trajectory
The face mesh export writes MediaPipe's 478 per-landmark 3D coordinates (x, y in pixel space; z at the same relative scale) as a binary PLY file — the format used by depth cameras, LiDAR scanners, and 3D reconstruction pipelines. The gaze trajectory cloud projects each session's 3D gaze rays onto a virtual plane at 500 mm depth, producing a spatial map of where the subject's attention landed. Both files can be opened directly in MeshLab, CloudCompare, or Open3D for inspection.
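Projecting a gaze ray onto the virtual plane is a ray-plane intersection. The sketch below assumes the plane is z = 500 mm in the same camera coordinates as the `ray_o*` and `ray_d*` columns; the function name is hypothetical, and the repository may define the plane or its sign convention differently.

```python
PLANE_DEPTH_MM = 500.0  # virtual plane depth from the text


def intersect_depth_plane(origin, direction, depth=PLANE_DEPTH_MM):
    """Intersect the ray origin + t * direction (t >= 0) with the plane z = depth.

    Returns the (x, y, z) hit point, or None when the ray is parallel to
    the plane or points away from it.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None  # parallel to the plane
    t = (depth - oz) / dz
    if t < 0:
        return None  # plane is behind the ray origin
    return (ox + t * dx, oy + t * dy, depth)


hit = intersect_depth_plane((0.0, 0.0, 0.0), (0.6, 0.0, 0.8))
parallel = intersect_depth_plane((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
behind = intersect_depth_plane((0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
```

Collecting one hit point per frame gives the gaze trajectory cloud; rays that return `None` are simply skipped.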
Open the notebook to analyse any recorded session CSV:
conda activate eyetrack
jupyter lab notebooks/analysis.ipynb
The notebook auto-loads the most recent data/gaze_*.csv. Set CSV_PATH manually to analyse a specific session. Produces: gaze scatter, EAR/blink plot, fixation timeline, head yaw, AOI dwell, and gaze heatmap.
conda activate eyetrack
jupyter lab notebooks/classifier.ipynb
Trains a Random Forest on synthetic gaze data (1800 samples, 3 classes) and demonstrates:
- Feature distribution visualisation
- Train/test split with classification report
- 5-fold cross-validation vs SVM and MLP
- Confusion matrix and feature importances
- Applying the model to a real session CSV (Section 7)
conda activate eyetrack
pip install pytest
pytest tests/ -v
29 tests across direction estimation, face mesh export, fixation detection, and heatmap accumulation.