
GCN Architecture Initialization

The pose prediction system is driven by a custom Graph Convolutional Network (GCN) model named PoseGCNFeat, which integrates both attention mechanisms and graph-based feature extraction. The architecture is initialized as follows:

PoseGCNFeat(
  (conv1): GATConv(8, 64, heads=1)
  (conv2): GATConv(64, 64, heads=1)
  (attn1): FeatureAttention(
    (attn): Sequential(
      (0): Linear(in_features=64, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=1, bias=True)
      (3): Sigmoid()
    )
  )
  (conv3): GCNConv(64, 2)
  (dropout): Dropout(p=0.3, inplace=False)
)

The model is loaded and prepared for evaluation using the following code:

model = PoseGCNFeat()
model.load_state_dict(torch.load(BEST_GCN_MODEL_WEIGHTS))
model.eval()

Node Feature Description

This is a GCN-based model for keypoint completion. The network operates per joint (node) with the following structure:

Inputs per node: [x, y, dx, dy, class_id, joint_id, track_id, visibility] → 8 features

Outputs per node: [x, y] → 2D coordinates only

This architecture combines feature-level attention with graph convolutions to improve prediction of partially observed or occluded keypoints during physical activities such as push-ups, sit-ups, and squats.
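As a concrete illustration of the per-node layout, the stdlib-only sketch below builds the 8-feature input vector for one joint and masks it as occluded. The masking convention (zeroing position, velocity, and visibility while keeping the id fields) is an assumption for illustration, not taken from the report, as are the example coordinate values.

```python
# Per-node input layout: [x, y, dx, dy, class_id, joint_id, track_id, visibility]
def make_node_features(x, y, dx, dy, class_id, joint_id, track_id, visible):
    """Pack one joint's attributes into the 8-feature vector the GCN consumes."""
    return [float(x), float(y), float(dx), float(dy),
            float(class_id), float(joint_id), float(track_id),
            1.0 if visible else 0.0]

def mask_occluded(features):
    """Assumed convention: zero position, velocity, and visibility for an
    occluded joint, keeping class/joint/track ids so graph structure survives."""
    masked = list(features)
    masked[0:4] = [0.0, 0.0, 0.0, 0.0]
    masked[7] = 0.0
    return masked

# Hypothetical left elbow (joint_id 7) of track 1 in the push-up class (class_id 0).
elbow = make_node_features(412.5, 310.2, -1.3, 0.8,
                           class_id=0, joint_id=7, track_id=1, visible=True)
hidden = mask_occluded(elbow)
```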


Push Up Pose Estimation Report

This report presents the summarized evaluation of pose estimation accuracy for the Push Up class using a pipeline that combines YOLOv8 pose estimation, DeepSORT tracking, and a Graph Convolutional Network (GCN) for keypoint prediction.

Evaluation Scope

  • Frames analyzed: 15 to 31
  • Views: Bottom, Left, Right, and Top
  • Model components:
    • YOLOv8: Keypoint detection
    • DeepSORT: Person tracking
    • GCN: Keypoint refinement
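The three components compose into a simple per-frame loop. Below is a minimal stdlib sketch of that control flow; `detect`, `track`, and `refine` are hypothetical stand-ins for YOLOv8, DeepSORT, and the GCN, not the project's actual interfaces.

```python
def process_frame(frame, detect, track, refine):
    """One pipeline step: detection -> tracking -> keypoint refinement.
    All three callables are illustrative stand-ins for the real components."""
    detections = detect(frame)           # e.g. [(bbox, keypoints), ...]
    tracked = track(frame, detections)   # assigns a stable track_id per person
    return [(tid, refine(kps)) for tid, kps in tracked]

# Stub components for illustration only:
detect = lambda frame: [((0, 0, 10, 10), [(1.0, 2.0)])]
track = lambda frame, dets: [(1, kps) for _, kps in dets]
refine = lambda kps: kps  # the GCN would fill in occluded joints here
```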

Visual Examples

Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.

  • Frame 15 (Bottom view):
    Frame 15 - Bottom

  • Frame 9 (Top view):
    Frame 9 - Top

  • Frame 10 (Left view):
    Frame 10 - Left

  • Frame 11 (Right view):
    Frame 11 - Right

Metrics Used

| Metric | Description |
|---|---|
| OKS (Object Keypoint Similarity) | Measures similarity between predicted and ground-truth keypoints; ranges from 0 to 1. |
| MPJPE (Mean Per Joint Position Error) | Average Euclidean distance error per joint, in pixels. |
| PCK@X (Percentage of Correct Keypoints) | Fraction of joints predicted within an X-pixel radius of the ground truth. |
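These metrics can be computed from matched predicted/ground-truth joints as in the stdlib sketch below. The OKS here uses a single illustrative per-joint constant `k` rather than the per-keypoint COCO sigmas presumably used in the actual evaluation.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error in pixels."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

def pck(pred, gt, radius):
    """Fraction of joints within `radius` pixels of the ground truth."""
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= radius)
    return hits / len(pred)

def oks(pred, gt, area, k=0.05):
    """Simplified OKS: exp(-d^2 / (2 * area * k^2)) averaged over joints,
    where `area` is the object's segment area and `k` a per-joint constant."""
    total = 0.0
    for p, g in zip(pred, gt):
        d2 = math.dist(p, g) ** 2
        total += math.exp(-d2 / (2 * area * k ** 2))
    return total / len(pred)
```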

Summary Statistics

| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 9.278 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 7.575 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 7.696 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 8.402 px | 1.000 | 1.000 | 1.000 |

Conclusion

The pose estimation system delivers highly accurate and consistent keypoint predictions across all views for the Push Up class. All views achieved perfect OKS and PCK scores, and MPJPE values remained well within acceptable limits, demonstrating the robustness of the combined GCN + DeepSORT + YOLOv8 architecture.


Situp Pose Estimation Report

This section presents the summarized evaluation of pose estimation accuracy for the Situp class using the same pipeline of YOLOv8, DeepSORT, and GCN.

Evaluation Scope

  • Frames analyzed: 15 to 31
  • Views: Bottom, Left, Right, and Top

Visual Examples

Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.

  • Frame 28 (Bottom view):
    Frame 28 - Bottom

  • Frame 24 (Top view):
    Frame 24 - Top

  • Frame 24 (Left view):
    Frame 24 - Left

  • Frame 18 (Right view):
    Frame 18 - Right

Summary Statistics

| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 5.253 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 5.083 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 4.913 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 5.000 px | 1.000 | 1.000 | 1.000 |

Conclusion

The system maintained exceptional prediction performance for the Situp class across all viewpoints. Every view achieved perfect OKS and PCK scores, while MPJPE remained lower than in the Push Up class. This indicates both high precision and stability of the model when estimating poses during sit-up actions.


Squats Pose Estimation Report

This section summarizes the performance of the pose estimation system for the Squats class using the same pipeline: YOLOv8, DeepSORT, and GCN.

Evaluation Scope

  • Frames analyzed: 15 to 31
  • Views: Bottom, Left, Right, and Top

Visual Examples

Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.

  • Frame 7 (Bottom view):
    Frame 7 - Bottom

  • Frame 7 (Top view):
    Frame 7 - Top

  • Frame 15 (Left view):
    Frame 15 - Left

  • Frame 7 (Right view):
    Frame 7 - Right

Summary Statistics

| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 6.674 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 8.195 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 8.514 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 10.445 px | 1.000 | 1.000 | 1.000 |

Conclusion

Across all four viewpoints, the system maintained high reliability for Squats action recognition. Perfect OKS and PCK values confirm correct joint predictions, while the MPJPE values—though slightly higher in the Top view—remain within a strong performance range. The results show robust pose detection under squatting motion, affirming the model's adaptability and precision.


Deployment Instructions

This section provides the deployment procedure for both the CMS backend and the frontend web application.

Running Locally

To run the CMS locally with FastAPI:

uvicorn main:app --reload --host 0.0.0.0 --port 8000

Google Cloud Configuration

Ensure gcloud CLI is installed before proceeding.

Initialize GCP Project

gcloud config set project out-of-view-3d-pose-recovery
gcloud config set run/region asia-southeast1

GCS Service Account Setup

Note that service account IDs must be all lowercase:

gcloud iam service-accounts create senpaidev --display-name="Local Dev (Senpai) GCS Access"

gcloud projects add-iam-policy-binding out-of-view-3d-pose-recovery \
  --member="serviceAccount:senpaidev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

gcloud iam service-accounts keys create ./gcs-key.json \
  --iam-account=senpaidev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com

export GOOGLE_APPLICATION_CREDENTIALS=./gcs-key.json

(On Windows, use set GOOGLE_APPLICATION_CREDENTIALS=gcs-key.json instead of export.)

Docker and GCP Deployment

Build and Test Locally

docker build -t gcr.io/out-of-view-3d-pose-recovery/occl3d-api .
docker run -p 8080:8080 gcr.io/out-of-view-3d-pose-recovery/occl3d-api

Authenticate and Push Image

gcloud auth configure-docker
docker push gcr.io/out-of-view-3d-pose-recovery/occl3d-api

Deploy to Cloud Run

gcloud run deploy occl3d-api \
  --image gcr.io/out-of-view-3d-pose-recovery/occl3d-api \
  --platform managed \
  --region asia-southeast1 \
  --service-account senpaidev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --memory 2Gi \
  --set-env-vars ENV=production

Frontend Deployment (React)

In the /web folder:

npm install
npm run dev

This starts the local development server; for a production build, run npm run build (assuming a standard React toolchain).

Web Frontend UI Output

Below is an example of the frontend UI during video upload and processing. The system supports concurrent uploads, queuing, cancellation, and progress monitoring.

Web Upload Interface

Demo Video Output

You can view a sample result of the end-to-end system below:

Watch Demo
