A computer-vision tool that estimates a skier's posture from video and quantitatively evaluates their form via joint angles.
SkiSense detects a skier in a clip, estimates their pose, and scores key joint angles (knee, hip, ankle, shoulder tilt) against ideal ranges drawn from competitive kihon (technical) skiing. It is a visualization and quantitative-feedback tool, not an automatic judge. The annotated video, the best-scoring frame, and per-joint readouts are written to disk.
- Person detection — YOLOv8x
- Pose estimation — selectable backend: SAM 3D Body (3D MHR-21, default) or YOLO11-Pose (2D COCO-17). See Pose backends
- Joint-angle evaluation — knee / hip / ankle in 3D, shoulder tilt in 2D
- Overall score — 0–100 over the angles that can be measured
- Multi-person tracking — Deep SORT keeps a stable ID across frames
- Auto zoom & centering — keeps the skier's torso centered
- Best-shot extraction — saves the highest-scoring frame
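As an illustration of the joint-angle evaluation and overall score listed above, here is a minimal sketch. It assumes an angle is measured at a middle keypoint (for example the knee, between hip and ankle) and scored against an ideal range. The function names, the linear falloff, and the example ranges are hypothetical and are not taken from SkiSense's actual scoring logic; unmeasurable angles (such as the ankle on a 2D backend) are passed as None and left out of the average.

```python
import math
from typing import Dict, Optional, Sequence

Point = Sequence[float]  # (x, y) for 2D keypoints or (x, y, z) for 3D


def joint_angle(a: Point, b: Point, c: Point) -> Optional[float]:
    """Angle at b, in degrees, between segments b->a and b->c.

    Returns None when either segment has zero length.
    """
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        return None
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
    return math.degrees(math.acos(cos_t))


def range_score(angle: Optional[float], lo: float, hi: float,
                falloff: float = 30.0) -> Optional[float]:
    """100 inside [lo, hi], dropping linearly to 0 at `falloff` degrees outside.

    The falloff shape is an assumption made for this sketch.
    """
    if angle is None:
        return None
    dist = max(lo - angle, 0.0, angle - hi)
    return max(0.0, 100.0 * (1.0 - dist / falloff))


def overall_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the per-joint scores that could be measured (None = N/A)."""
    valid = [s for s in scores.values() if s is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical usage: hip, knee, ankle keypoints in 3D.
knee = joint_angle((0.0, 1.0, 0.0), (0.0, 0.5, 0.1), (0.0, 0.0, 0.0))
scores = {"knee": range_score(knee, 90.0, 160.0), "ankle": None}
print(overall_score(scores))
```

Because the angle is computed from vectors of any dimension, the same helper covers both the 3D joints and a 2D measurement.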
PyTorch must be installed first, matching your CUDA build:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Only needed for the SAM 3D Body backend (see docs/pose_backends.md):
pip install -r requirements-sam3d.txt
Run:
python run.py video.mp4 # process a video (input/video.mp4 by default)
python run.py --fast video.mp4 # skip per-frame detect/track; one pass per frame
python run.py skier.jpg --image # process a single image
Output is written to output/YYYYMMDD_HHMMSS/ (video_pose.mp4, best_shot.jpg, and a copy of the input).
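For readers scripting around the results, the timestamped folder name can be reproduced roughly as follows. This is a sketch under the assumption that the folder name is the local time formatted as YYYYMMDD_HHMMSS; the helper names run_dir_path and make_run_dir are hypothetical and not part of SkiSense.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dir_path(root: Path, now: Optional[datetime] = None) -> Path:
    """Build root/YYYYMMDD_HHMMSS without touching the filesystem."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return root / stamp


def make_run_dir(root: Path = Path("output")) -> Path:
    """Create and return a fresh timestamped run directory."""
    run_dir = run_dir_path(root)
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Since the names sort chronologically, the most recent run is simply the last entry of sorted(Path("output").iterdir()).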
The default sam3d backend uses gated Hugging Face weights: request access to facebook/sam-3d-body-dinov3 and run hf auth login once before the first run. To avoid this (or to run on CPU), use the yolo11 backend instead.
The pose engine is selected in .env:
SKISENSE_POSE_BACKEND=sam3d # default: SAM 3D Body (3D, CUDA required)
SKISENSE_POSE_BACKEND=yolo11 # YOLO11-Pose (2D, runs on CPU/MPS/CUDA)
- SAM 3D Body — 3D MHR keypoints + body mesh; view-invariant joint angles including the ankle. CUDA required; ~1–2 s/frame.
- YOLO11-Pose — 2D COCO-17 keypoints; fast and CPU-capable, but the ankle angle is N/A (no foot landmark).
Full comparison, settings, and trade-offs: docs/pose_backends.md.
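To show how a setting like SKISENSE_POSE_BACKEND might be consumed, here is a small sketch. It assumes the .env values have already been loaded into the process environment (for example by a dotenv loader) and that only the two values shown above are valid. The function resolve_pose_backend is hypothetical; SkiSense's own config.py may handle this differently.

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = ("sam3d", "yolo11")
DEFAULT_BACKEND = "sam3d"  # matches the default stated above


def resolve_pose_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name from SKISENSE_POSE_BACKEND, or the default.

    Raises ValueError on an unknown name so a typo fails early instead of
    silently falling back to another backend.
    """
    env = os.environ if env is None else env
    name = env.get("SKISENSE_POSE_BACKEND", DEFAULT_BACKEND).strip().lower()
    if name not in VALID_BACKENDS:
        raise ValueError(
            f"SKISENSE_POSE_BACKEND={name!r} is not one of {VALID_BACKENDS}"
        )
    return name


if __name__ == "__main__":
    print(resolve_pose_backend())
```

Passing the environment as a mapping keeps the function easy to test without mutating os.environ.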
Each frame runs a three-step pipeline:
- Detection & tracking — YOLOv8x detects persons; Deep SORT assigns persistent track IDs.
- Pose estimation — the selected backend returns keypoints (3D + 2D for SAM 3D Body, 2D for YOLO11-Pose); pose_analyzer evaluates joint angles.
- Rendering — ZoomTracker applies smooth zoom, skeleton/bbox are drawn through the single transform_point_to_zoom() choke point, and the info panel is overlaid.
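The rendering step can be pictured with the sketch below. It is a hypothetical stand-in, not the real ZoomTracker or transform_point_to_zoom(): the actual signatures are not shown in this README, so ZoomState, smooth, and point_to_zoom are assumed names, and exponential smoothing is one plausible way to get a "smooth zoom". The idea it illustrates is that every overlay coordinate passes through one mapping function, so the skeleton and bounding box stay aligned with the zoomed frame.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ZoomState:
    cx: float     # crop center x, in source-frame pixels
    cy: float     # crop center y, in source-frame pixels
    scale: float  # zoom factor; values above 1 zoom in


def smooth(prev: ZoomState, target: ZoomState, alpha: float = 0.2) -> ZoomState:
    """Move a fraction `alpha` of the way from prev to target (exponential smoothing)."""
    def lerp(p: float, t: float) -> float:
        return p + alpha * (t - p)

    return ZoomState(
        lerp(prev.cx, target.cx),
        lerp(prev.cy, target.cy),
        lerp(prev.scale, target.scale),
    )


def point_to_zoom(pt: Tuple[float, float], state: ZoomState,
                  out_size: Tuple[int, int]) -> Tuple[float, float]:
    """Map a source-frame point into the zoomed output frame.

    The crop center lands on the center of the output image.
    """
    x, y = pt
    w, h = out_size
    return ((x - state.cx) * state.scale + w / 2.0,
            (y - state.cy) * state.scale + h / 2.0)


# Hypothetical usage: follow a torso center, then project a keypoint.
state = smooth(ZoomState(0.0, 0.0, 1.0), ZoomState(100.0, 50.0, 2.0), alpha=0.5)
print(point_to_zoom((60.0, 30.0), state, (640, 360)))
```

Routing all drawing through a single mapping means a change to the zoom model only has to be made in one place.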
Key modules: config.py, pose_topology.py, backends/, pose_analyzer.py, zoom_tracker.py, main.py, image_processor.py.
- Project background, design rationale, scoring logic, and lessons learned (Japanese): README_ja.md / docs/project_details_ja.md
MIT License. See LICENSE.

