व्यायाम · Vyāyāma

An on-device AI exercise coach — private, offline, real-time.

Your phone watches your form, counts every rep, scores it 0–100, and talks you through the set — while the camera feed never leaves the device.

Features · Why it's different · How it works · Architecture · Tested · Build and run · Team

Qualcomm's reference app renders 17 pose keypoints on the NPU — and stops there. Vyāyāma treats those keypoints as the starting line: it recognises the exercise on its own, counts reps that survive a real workout, scores every rep, and speaks the count and a correction out loud — all on the Snapdragon Hexagon NPU, with no network, no backend, no cloud.

At a glance


Platform	Android · Java 17 · built on the Qualcomm QIDK base (package `com.qc.posedetectionYoloNAS`)
Inference	YOLO-NAS person detector → HRNet 17-keypoint pose · INT8 `.dlc` on the Snapdragon Hexagon NPU via SNPE
Runtime switch	NPU / GPU / CPU, switchable live from the in-app menu
Exercises	7 — auto-recognised, or manually pinned so they can't be misread
Coaching	rep count · 0–100 form score · live colour-coded cue · spoken voice on every rep
Personalisation	per-user range calibration · offline profiles, personal bests, streaks
Storage	local `SharedPreferences` behind a write-back RAM buffer — zero per-rep disk writes
Network	none. The manifest declares no `INTERNET` permission
Tested	123 / 123 assertions, pure-Java harness, no device required

Features


Sense	YOLO-NAS → HRNet, 17 keypoints, INT8 on SNPE / Hexagon NPU, with a one-tap GPU/CPU fallback.
Recognise	Auto-detects 7 exercises — squat, push-up, bicep curl, jumping jack, shoulder press, sit-up, plank.
Count and score	Real-time rep counting, a 0–100 form score every rep, and a live, colour-coded coaching cue.
Voice coach	Speaks every rep out loud — the count and a per-rep cue ("three… go a little deeper… ten, great work") in a soft on-device voice. Built eyes-off, for when you're across the room and never looking at the screen.
Manual mode	Pin one exercise so it can never be misread — ideal for a noisy demo floor or an unusual camera angle.
Offline profiles	Personal bests, lifetime totals, daily streaks, a PB reward banner, and daily reminders via local notifications.
Coach Vision	A live overlay of the exact signals the engine is sensing — full transparency, nothing hidden.
Private by construction	No `INTERNET` permission. Camera frames are processed and discarded; nothing ever leaves the device.

Why it's different

Pose estimation is the easy 20%. The hard 80% is turning a noisy stream of keypoints into something that counts correctly for a real person, in a real room, at a real camera angle — and says something useful while doing it.

	Qualcomm reference app	Vyāyāma
17 pose keypoints on the NPU	✓	✓
Recognises which exercise you're doing	—	✓ auto, 7 moves
Counts reps	—	✓ robust state machine
Scores form 0–100	—	✓ every rep
Speaks coaching out loud	—	✓ every rep, offline TTS
Adapts to your body and camera angle	—	✓ per-user calibration
Profiles, personal bests, streaks	—	✓ fully offline
Runs with no network	✓	✓ (no `INTERNET` permission)

How it works

The camera streams YUV frames at 30 fps. Every frame passes through five stages — all on-device, all allocation-free:

1. Detect and pose: YOLO-NAS locates the person, then HRNet regresses 17 keypoints. Both run as INT8 on the Hexagon NPU.
2. Stabilise: a One-Euro filter per keypoint smooths jitter while keeping lag low during fast motion; teleport rejection drops physically impossible jumps; a brief gap-hold rides out short occlusions instead of flickering to zero.
3. Extract features: 13 biomechanical signals (joint angles, limb ratios, a viewpoint-stable hip-drop, vertical travel and cadence) are computed into pre-allocated buffers.
4. Recognise: a sticky, self-correcting classifier locks onto the current exercise from those signals, or honours a manually pinned one.
5. Count, score, coach: a two-threshold rep state machine with adaptive range calibration counts the rep; a form rule set scores it 0–100 and picks a cue; the voice layer speaks the count and the correction.
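
The Stabilise stage can be illustrated with a minimal One-Euro filter for a single scalar, such as one keypoint coordinate. This is a sketch of the published algorithm, not the code in `KeypointFilter.java`: the class name `OneEuroFilter`, its constructor parameters, and any tuning values are assumptions made for illustration.

```java
/**
 * Minimal One-Euro filter for one scalar signal (for example a keypoint's x).
 * Hypothetical sketch: names and parameters are not taken from the app.
 */
final class OneEuroFilter {
    private final double minCutoff; // Hz, cutoff used when the signal is still
    private final double beta;      // how quickly the cutoff rises with speed
    private final double dCutoff;   // Hz, cutoff for the derivative estimate
    private boolean initialised = false;
    private double prevValue;
    private double prevDerivative;
    private double prevTime;

    OneEuroFilter(double minCutoff, double beta, double dCutoff) {
        this.minCutoff = minCutoff;
        this.beta = beta;
        this.dCutoff = dCutoff;
    }

    /** Smoothing factor of a first-order low-pass at the given cutoff. */
    private static double alpha(double cutoffHz, double dt) {
        double tau = 1.0 / (2.0 * Math.PI * cutoffHz);
        return 1.0 / (1.0 + tau / dt);
    }

    /** Feeds one sample and returns the smoothed value. */
    double filter(double value, double timeSeconds) {
        if (!initialised) {
            initialised = true;
            prevValue = value;
            prevDerivative = 0.0;
            prevTime = timeSeconds;
            return value;
        }
        double dt = timeSeconds - prevTime;
        if (dt <= 0.0) {
            return prevValue; // ignore out-of-order or duplicate timestamps
        }
        // Low-pass the speed estimate, then use it to pick the cutoff:
        // slow motion gets heavy smoothing, fast motion gets low lag.
        double rawDerivative = (value - prevValue) / dt;
        double aD = alpha(dCutoff, dt);
        double derivative = aD * rawDerivative + (1.0 - aD) * prevDerivative;
        double cutoff = minCutoff + beta * Math.abs(derivative);
        double a = alpha(cutoff, dt);
        double smoothed = a * value + (1.0 - a) * prevValue;

        prevValue = smoothed;
        prevDerivative = derivative;
        prevTime = timeSeconds;
        return smoothed;
    }
}
```

One instance per coordinate holds all of its state in a few primitive fields, so filtering a frame allocates nothing, which fits the allocation-free goal stated above.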

The rep state machine — why counting actually holds up

A naive "angle crosses a line" counter falls apart on real reps. This one is built to survive them:

Two thresholds (enter-top ≈ 0.15, enter-bottom ≈ 0.85 of the normalised range) with hysteresis, so a trembling joint at the turnaround can't double-count.
Adaptive per-user calibration — it learns your actual top and bottom from the first rep, so a limited range of motion or an unusual camera distance still counts correctly.
Peak/valley completion — reps where you don't quite lock out at the top still register, instead of deadlocking the way a strict threshold would.
Partial- and too-fast-rejection — half-reps and twitches below a minimum duration are ignored.
Multi-phase advance per frame — fast reps at a low frame rate are never silently dropped.
NaN-freeze — a missing joint freezes the rep state rather than corrupting the count.
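
A minimal sketch of the two-threshold idea, assuming the joint signal has already been normalised so that 0 is the top of the movement and 1 is the bottom. The class `RepCounter`, the `minAscentSeconds` parameter, and the choice to measure only the ascent are illustrative assumptions; the real engine also handles calibration, peak/valley completion, and multi-phase advance, which are left out here.

```java
/**
 * Hysteresis rep counter over a normalised progress signal
 * (0 = top of the rep, 1 = bottom). Hypothetical sketch only.
 */
final class RepCounter {
    enum Phase { TOP, BOTTOM }

    private static final double ENTER_TOP = 0.15;    // from the text above
    private static final double ENTER_BOTTOM = 0.85; // from the text above

    private final double minAscentSeconds; // assumed too-fast rejection rule
    private Phase phase = Phase.TOP;
    private double bottomTime = 0.0;
    private int reps = 0;

    RepCounter(double minAscentSeconds) {
        this.minAscentSeconds = minAscentSeconds;
    }

    int reps() {
        return reps;
    }

    /** Feeds one sample; returns true if this sample completed a rep. */
    boolean update(double progress, double timeSeconds) {
        if (Double.isNaN(progress)) {
            return false; // NaN-freeze: a missing joint leaves the state untouched
        }
        if (phase == Phase.TOP && progress >= ENTER_BOTTOM) {
            phase = Phase.BOTTOM;
            bottomTime = timeSeconds;
        } else if (phase == Phase.BOTTOM && progress <= ENTER_TOP) {
            phase = Phase.TOP;
            // Reject twitches: the way back up must take a minimum time.
            if (timeSeconds - bottomTime >= minAscentSeconds) {
                reps++;
                return true;
            }
        }
        return false;
    }
}
```

Because the state only changes at 0.85 on the way down and 0.15 on the way up, a signal that wobbles between 0.80 and 0.90 at the bottom stays in one phase and cannot produce a second count.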

Recognition that doesn't flicker

A sticky lock holds the current exercise once confident, instead of relabelling every frame.
It is fed the smoothed signal, not raw keypoints, so noise doesn't trigger spurious switches.
Positive-evidence gates keep distinct moves distinct — a sit-up's trunk fold can never be mistaken for a bicep curl or a shoulder press.
A wrong first guess self-corrects in about 0.6 s; a manually pinned exercise overrides recognition entirely.
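
One simple way to get this kind of sticky behaviour is to switch labels only after a challenger has won several consecutive frames. The sketch below assumes that mechanism; the source does not say how the lock is implemented, and the class `StickyLabel` and its `framesToSwitch` parameter are hypothetical. At 30 fps, a value of 18 frames would correspond to roughly the 0.6 s correction time mentioned above.

```java
/**
 * Sticky exercise label: a new label must win several frames in a row
 * before it replaces the locked one. Hypothetical sketch only.
 */
final class StickyLabel {
    private final int framesToSwitch;
    private String locked = null;    // currently reported exercise
    private String candidate = null; // challenger being counted
    private int candidateStreak = 0;
    private String pinned = null;    // manual mode override

    StickyLabel(int framesToSwitch) {
        this.framesToSwitch = framesToSwitch;
    }

    /** Pins one exercise; pass null to return to automatic recognition. */
    void pin(String exercise) {
        pinned = exercise;
    }

    /** Feeds the per-frame guess and returns the label to display. */
    String update(String frameLabel) {
        if (pinned != null) {
            return pinned; // manual mode overrides recognition entirely
        }
        if (frameLabel == null) {
            return locked; // no evidence this frame, keep the lock
        }
        if (frameLabel.equals(locked)) {
            candidate = null; // agreement resets any challenger
            candidateStreak = 0;
            return locked;
        }
        if (frameLabel.equals(candidate)) {
            candidateStreak++;
        } else {
            candidate = frameLabel;
            candidateStreak = 1;
        }
        if (candidateStreak >= framesToSwitch) {
            locked = candidate;
            candidate = null;
            candidateStreak = 0;
        }
        return locked;
    }
}
```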

A voice built for eyes-off training

Speaks the count and a short correction on every rep, so you never have to look at the screen.
Rotates its phrasing so it never sounds robotic; praises good streaks and calls out every tenth rep.
A faulty rep triggers a targeted cue — depth, back sag, swing, range, tempo, posture, lockout, or symmetry.
Fully deterministic (no random number generator, no clock input) and synthesised by the device's offline TextToSpeech, with a soft default voice, a toggle at the bottom left of the screen, and a settings dialog.
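
Phrasing can vary without randomness if the rep count itself selects the phrase. The sketch below assumes that approach; the class `PhraseRotator`, the praise phrases, and the priority order (fault cue first, then milestone, then praise) are illustrative guesses rather than the contents of `VoiceCoach`.

```java
/**
 * Deterministic phrase selection: the same rep count and fault always
 * produce the same line. Hypothetical sketch, phrases are made up.
 */
final class PhraseRotator {
    private static final String[] PRAISE = {"nice", "good rep", "smooth"};

    /** faultCue is null for a clean rep, otherwise a short correction. */
    static String line(int repCount, String faultCue) {
        String count = Integer.toString(repCount);
        if (faultCue != null) {
            return count + ", " + faultCue; // a correction takes priority
        }
        if (repCount % 10 == 0) {
            return count + ", great work"; // every tenth rep is called out
        }
        // Rotate through the praise list using the rep count as the index.
        return count + ", " + PRAISE[repCount % PRAISE.length];
    }
}
```

Because the output depends only on its arguments, a test harness can assert the exact sentence for any rep without mocking a clock or seeding a generator.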

Engineering highlights

The decisions that separate a demo that works on stage from one that works in a living room.

Camera-angle robustness — a viewpoint-stable hip-drop signal counts foreshortened, front-on squats that a knee angle alone would miss.
Real-time on a phone — zero heap allocation per camera frame (pre-allocated ring buffers), so the garbage collector never stutters mid-rep; INT8 on the NPU keeps it fast and battery-light.
Storage treated like memory — profile stats load into a write-back RAM buffer on open and flush to flash once, on close. Zero per-rep disk writes means less flash wear and lighter battery.
Proven, not hand-waved — the entire engine (VyayamaCoach + VoiceCoach) is pure Java with zero Android imports, validated by a 123-assertion offline harness that runs in milliseconds, with no device, NPU, or camera.
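
The write-back storage idea can be sketched in plain Java by hiding the persistent store behind a small interface. Everything named here is an assumption for illustration: `StatsBacking` stands in for whatever wraps `SharedPreferences` in the app, and `WriteBackStats` is not the real `ProfileStore`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the persistent store (e.g. SharedPreferences). */
interface StatsBacking {
    Map<String, Integer> load();

    void save(Map<String, Integer> stats);
}

/**
 * Write-back buffer: load once on open, mutate in RAM, flush once on close.
 * Hypothetical sketch only.
 */
final class WriteBackStats {
    private final StatsBacking backing;
    private final Map<String, Integer> buffer;
    private boolean dirty = false;

    WriteBackStats(StatsBacking backing) {
        this.backing = backing;
        this.buffer = new HashMap<>(backing.load()); // single read on open
    }

    /** Records reps in memory only; nothing touches the store here. */
    void addReps(String exercise, int reps) {
        buffer.merge(exercise, reps, Integer::sum);
        dirty = true;
    }

    int total(String exercise) {
        return buffer.getOrDefault(exercise, 0);
    }

    /** Flushes to the store once, and only if something changed. */
    void close() {
        if (dirty) {
            backing.save(new HashMap<>(buffer));
            dirty = false;
        }
    }
}
```

The trade-off of this pattern is that anything still in the buffer is lost if the process dies before `close()` runs, so a real app would also want to flush at a lifecycle point such as when the session screen is paused.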

Architecture

 Camera           ┌────────────  100% ON-DEVICE · NO INTERNET  ────────────┐           Outputs
 YUV · 30 fps  ─▶ │  YOLO-NAS → HRNet   →  One-Euro filter  →  VyayamaCoach  │ ─▶  Voice     · on-device TTS
                  │  17 keypoints, INT8    + 13 biomech feats   recognise·rep│     HUD       · reps, form, cue
                  │  on the Hexagon NPU                         FSM·form 0–100│     Profiles  · PB, streak
                  └─────────────────────────────────────────────────────────┘
                       ↻  per-user calibration + offline profiles personalise every session

The brain — VyayamaCoach (recognition, reps, form) and VoiceCoach (cadence) — is pure Java with zero Android imports, so it compiles and runs under a javac harness with no device, no NPU, and no camera. That is exactly what makes it exhaustively unit-testable.

Exercises

Exercise	Primary signal	Example spoken cue
Squat	knee flexion + viewpoint-stable hip-drop	"go a little deeper"
Push-up	elbow flexion + torso line	"keep your back flat"
Bicep curl	elbow flexion, upper arm held still	"don't swing — control it"
Jumping jack	arm/leg open-close cadence	"full range, all the way up"
Shoulder press	elbow extension overhead, wrists above shoulders	"lock out at the top"
Sit-up	hip and trunk flexion	"all the way up"
Plank	isometric hold, torso line (timer, not reps)	"hold — hips level"
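
Several of the primary signals in the table are joint angles, such as knee or elbow flexion. A common way to compute one from three 2D keypoints is the angle between the two limb vectors that meet at the joint. The helper below is a sketch under that assumption; the class `JointAngle` and its signature are hypothetical and not taken from `VyayamaCoach`.

```java
/**
 * Angle at joint B formed by keypoints A-B-C, in degrees.
 * For a knee: A = hip, B = knee, C = ankle. Hypothetical helper.
 */
final class JointAngle {
    static double degrees(double ax, double ay,
                          double bx, double by,
                          double cx, double cy) {
        // Vectors from the joint out to each neighbouring keypoint.
        double v1x = ax - bx, v1y = ay - by;
        double v2x = cx - bx, v2y = cy - by;
        double n1 = Math.hypot(v1x, v1y);
        double n2 = Math.hypot(v2x, v2y);
        if (n1 == 0.0 || n2 == 0.0) {
            return Double.NaN; // coincident keypoints: angle is undefined
        }
        double cos = (v1x * v2x + v1y * v2y) / (n1 * n2);
        cos = Math.max(-1.0, Math.min(1.0, cos)); // guard against rounding
        return Math.toDegrees(Math.acos(cos));
    }
}
```

A straight limb gives a value near 180 and a deep bend gives a smaller value. Returning NaN for a degenerate input pairs naturally with a counter that freezes on NaN instead of guessing.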

Tech stack

Layer	Choice
Inference	Qualcomm SNPE · Hexagon NPU · INT8 `.dlc` (GPU/CPU fallback)
Models	YOLO-NAS (person detection) · HRNet (17-keypoint pose)
App	Android · Java 17 · Camera2/CameraX YUV pipeline
Smoothing	One-Euro filter, teleport + dropout rejection
Voice	Android TextToSpeech, offline, deterministic cadence layer
Storage	`SharedPreferences` behind a write-back buffer
Base	Qualcomm QIDK · `VisionSolution4-PoseEstimation`
Verification	pure-Java `javac` harness, 123 assertions

Tested

The rep and voice engine is pure Java, so the full suite runs in seconds — no device, no emulator:

cd android-device-app/tools/coach_harness
javac -d out \
  ../../app/src/main/java/com/qc/posedetectionYoloNAS/VyayamaCoach.java \
  ../../app/src/main/java/com/qc/posedetectionYoloNAS/KeypointFilter.java \
  ../../app/src/main/java/com/qc/posedetectionYoloNAS/VoiceCoach.java \
  CoachHarness.java
java -cp out com.qc.posedetectionYoloNAS.CoachHarness
# →  PASSED 123 / 123

Coverage: angle math; the rep FSM under jitter, noise, 2× scale, translation, and joint dropout; partial- and too-fast-rejection; all 7 exercises; adaptive ROM; manual mode; the sit-up-vs-shoulder-press fix; and the voice coach (count plus per-rep comment, milestones, praise, and silence when off).

Device-free Kotlin reference suite:

cd android && ./gradlew :app:testDebugUnitTest

Build and run

Prerequisites — Android Studio (recent), a Snapdragon device, and the device-matched SNPE .dlc models plus Hexagon runtime libraries (see docs/device-runbook.md).

Open android-device-app/ in Android Studio and let Gradle sync.
Drop the .dlc models and Hexagon runtime libs into the locations described in docs/.
Build :app and install on the device.
Pose runs on the Hexagon NPU by default — switch to GPU / CPU live from the in-app menu. The voice coach uses the device's built-in TextToSpeech: offline, with no download on most devices.

No .dlc? The pure-Java engine and its full test suite run on any machine with a JDK — see Tested.

Project structure

android-device-app/        the shipped app  (Qualcomm QIDK · package com.qc.posedetectionYoloNAS)
  app/src/main/java/…/        VyayamaCoach · VoiceCoach · VoicePlayer · VoicePrefs · VoiceSettingsDialog
                              ModePickerDialog · CameraFragment · FragmentRender · ProfileStore · Reminder*
  app/src/main/res/           Volt theme · drawables · layouts
  tools/coach_harness/        CoachHarness.java — the 123-assertion pure-Java test suite
android/                   device-free Kotlin reference (intelligence layer, unit tests, mocks)
ml/                        optional learned classifier + Python⇄Kotlin feature-parity contract
docs/                      bible.md (design source of truth), runbooks, pitch deck, report
tools/                     bible PDF builder · threshold tuner

Privacy by construction

The manifest declares no INTERNET permission, so no code path can reach the network — no backend, no analytics, no cloud. Camera frames are processed on the NPU and discarded; only your counts, personal bests, and streak live locally in SharedPreferences, and the coaching voice is synthesised on-device.

Roadmap

Real per-joint confidence sourced from the HRNet heatmaps (currently visibility + motion-consistency).
A YOLO11-pose model option alongside the YOLO-NAS → HRNet pipeline.
More movements, and richer per-exercise form rubrics.

Team

Team Vyāyāma: Rayyan Shaikh (lead) · Ashitha Patil · Vaibhav Rathod
R.V. College of Engineering (RVCE), Bengaluru
Hack4SoC 3.0 · On-Device / Edge AI (Qualcomm)

Vyāyāma (व्यायाम) — Sanskrit for "exercise." · Your phone already has the hardware; we just turned it into a coach.

Form guidance, not medical advice.
