The pose prediction system is driven by a custom Graph Convolutional Network (GCN) model named PoseGCNFeat, which integrates both attention mechanisms and graph-based feature extraction. The architecture is initialized as follows:
```text
PoseGCNFeat(
  (conv1): GATConv(8, 64, heads=1)
  (conv2): GATConv(64, 64, heads=1)
  (attn1): FeatureAttention(
    (attn): Sequential(
      (0): Linear(in_features=64, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=1, bias=True)
      (3): Sigmoid()
    )
  )
  (conv3): GCNConv(64, 2)
  (dropout): Dropout(p=0.3, inplace=False)
)
```

The model is loaded and prepared for evaluation using the following code:
```python
model = PoseGCNFeat()
model.load_state_dict(torch.load(BEST_GCN_MODEL_WEIGHTS))
model.eval()
```

This is a GCN-based model for keypoint completion. The network operates per joint (node) with the following structure:
- Inputs per node: `[x, y, dx, dy, class_id, joint_id, track_id, visibility]` → 8 features
- Outputs per node: `[x, y]` → 2D coordinates only
This architecture combines spatial attention with graph convolutions to enhance the prediction accuracy of partially observed or occluded keypoints during physical activities such as push-ups, sit-ups, and squats.
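The `FeatureAttention` block can be reconstructed directly from the printed layers above: a `Linear(64, 64)` → `Tanh` → `Linear(64, 1)` → `Sigmoid` stack that produces one gating scalar in [0, 1] per node. The sketch below (plain PyTorch; the `forward` logic and how the gate is applied are our assumptions, since only the layer stack appears in the printout) shows the idea:

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Per-node scalar gate, reconstructed from the printed architecture:
    Linear(64 -> 64) -> Tanh -> Linear(64 -> 1) -> Sigmoid.
    How the gate is applied (multiplicative, broadcast over features)
    is an assumption."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(dim, dim),
            nn.Tanh(),
            nn.Linear(dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, 64); gate: (num_nodes, 1), broadcast over features
        return x * self.attn(x)

x = torch.randn(17, 64)        # e.g. 17 COCO joints, 64-dim node features
out = FeatureAttention()(x)
print(out.shape)               # torch.Size([17, 64])
```

Because the gate lies in (0, 1), the module can only attenuate node features, letting the network down-weight unreliable (e.g. occluded) joints before the final `GCNConv` regresses coordinates.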
This report presents the summarized evaluation of pose estimation accuracy for the Push Up class using a pipeline that combines YOLOv8 pose estimation, DeepSORT tracking, and a Graph Convolutional Network (GCN) for keypoint prediction.
- Frames analyzed: 15 to 31
- Views: Bottom, Left, Right, and Top
- Model components:
- YOLOv8: Keypoint detection
- DeepSORT: Person tracking
- GCN: Keypoint refinement
Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.
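The masked inputs shown in the samples can be produced by zeroing a subset of joints and clearing their visibility flag before the graph is passed to the GCN. A minimal sketch (NumPy; the 8-feature layout follows the node description above, but the function name and the exact masking convention are assumptions):

```python
import numpy as np

# Per-node features: [x, y, dx, dy, class_id, joint_id, track_id, visibility]
def mask_keypoints(nodes: np.ndarray, occluded: list[int]) -> np.ndarray:
    """Zero the coordinates/velocities of occluded joints and mark them invisible."""
    masked = nodes.copy()
    masked[occluded, 0:4] = 0.0   # x, y, dx, dy
    masked[occluded, 7] = 0.0     # visibility flag
    return masked

nodes = np.random.rand(17, 8)     # 17 joints, 8 features each
nodes[:, 7] = 1.0                 # all joints visible initially
masked = mask_keypoints(nodes, occluded=[5, 6, 11])
print(masked[5])                  # coords zeroed, visibility 0
```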
| Metric | Description |
|---|---|
| OKS (Object Keypoint Similarity) | Measures similarity between predicted and ground truth keypoints. Ranges from 0 to 1. |
| MPJPE (Mean Per Joint Position Error) | Average Euclidean distance error per joint, in pixels. |
| PCK@X (Percentage of Correct Keypoints) | Fraction of joints correctly predicted within X-pixel radius. |
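For reference, MPJPE and PCK can be computed directly from predicted and ground-truth joint arrays. The helpers below sketch how the per-view table values are typically derived (function names are ours, not from the pipeline code):

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance per joint, in pixels. Arrays: (num_joints, 2)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred: np.ndarray, gt: np.ndarray, radius: float) -> float:
    """Fraction of joints whose error falls within `radius` pixels."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= radius).mean())

gt = np.array([[100.0, 200.0], [150.0, 250.0]])
pred = gt + np.array([[3.0, 4.0], [0.0, 0.0]])   # 5 px error on joint 0
print(mpjpe(pred, gt))    # 2.5
print(pck(pred, gt, 50))  # 1.0
```

OKS additionally normalizes each joint's error by object scale and a per-joint falloff constant, which is why it is bounded in [0, 1] regardless of image resolution.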
| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 9.278 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 7.575 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 7.696 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 8.402 px | 1.000 | 1.000 | 1.000 |
The pose estimation system delivers highly accurate and consistent keypoint predictions across all views for the Push Up class. Every view achieved perfect OKS and PCK scores, and MPJPE stayed below 10 px, demonstrating the robustness of the combined YOLOv8 + DeepSORT + GCN pipeline.
This section presents the summarized evaluation of pose estimation accuracy for the Situp class using the same pipeline of YOLOv8, DeepSORT, and GCN.
- Frames analyzed: 15 to 31
- Views: Bottom, Left, Right, and Top
Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.
| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 5.253 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 5.083 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 4.913 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 5.000 px | 1.000 | 1.000 | 1.000 |
The system maintained exceptional prediction performance for the Situp class across all viewpoints. Every view achieved perfect OKS and PCK scores, while MPJPE remained lower than in the Push Up class. This indicates both high precision and stability of the model when estimating poses during sit-up actions.
This section summarizes the performance of the pose estimation system for the Squats class using the same pipeline: YOLOv8, DeepSORT, and GCN.
- Frames analyzed: 15 to 31
- Views: Bottom, Left, Right, and Top
Below are visual samples of the predictions and masked inputs for different frames and viewpoints. They illustrate the ability of the model to infer missing keypoints and predict accurately under occlusion.
| View | OKS (mean) | MPJPE (mean) | PCK@50 | PCK@100 | PCK@150 |
|---|---|---|---|---|---|
| Bottom | 1.000 | 6.674 px | 1.000 | 1.000 | 1.000 |
| Left | 1.000 | 8.195 px | 1.000 | 1.000 | 1.000 |
| Right | 1.000 | 8.514 px | 1.000 | 1.000 | 1.000 |
| Top | 1.000 | 10.445 px | 1.000 | 1.000 | 1.000 |
Across all four viewpoints, the system maintained high reliability for the Squats class. Perfect OKS and PCK values confirm correct joint predictions, and while MPJPE is slightly higher in the Top view (10.445 px), it remains within a strong performance range. The results show robust pose estimation under squatting motion, affirming the model's adaptability and precision.
This section provides the deployment procedure for both the CMS backend and the frontend web application.
To run the CMS locally with FastAPI:

```bash
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Ensure the gcloud CLI is installed before proceeding.
Configure the project and region, create a service account with GCS access, and export its key:

```bash
gcloud config set project out-of-view-3d-pose-recovery
gcloud config set run/region asia-southeast1

gcloud iam service-accounts create senpaiDev --display-name="Local Dev (Senpai) GCS Access"

gcloud projects add-iam-policy-binding out-of-view-3d-pose-recovery \
  --member="serviceAccount:senpaiDev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

gcloud iam service-accounts keys create ./gcs-key.json \
  --iam-account=senpaiDev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com

set GOOGLE_APPLICATION_CREDENTIALS=gcs-key.json
```

Build the Docker image:

```bash
docker build -t gcr.io/out-of-view-3d-pose-recovery/occl3d-api .
```
Run the container locally to verify it:

```bash
docker run -p 8080:8080 gcr.io/out-of-view-3d-pose-recovery/occl3d-api
```

Authenticate Docker with the container registry and push the image:

```bash
gcloud auth configure-docker
docker push gcr.io/out-of-view-3d-pose-recovery/occl3d-api
```

Deploy to Cloud Run:

```bash
gcloud run deploy occl3d-api \
  --image gcr.io/out-of-view-3d-pose-recovery/occl3d-api \
  --platform managed \
  --region asia-southeast1 \
  --service-account senpaiDev@out-of-view-3d-pose-recovery.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --memory 2Gi \
  --set-env-vars ENV=production
```

In the /web folder:
```bash
npm install
npm run dev
```

Below is an example of the frontend UI during video upload and processing. The system supports concurrent uploads, queuing, cancellation, and progress monitoring.
You can view a sample result of the end-to-end system below: