This project implements a behavioral biometric authentication system that identifies and verifies users based on their unique walking patterns (gait). Unlike traditional authentication methods (passwords, fingerprints), this system uses continuous sensor data from smartphone accelerometers and gyroscopes to authenticate users passively during normal walking.
๐ฏ Achievement: Successfully reduced Equal Error Rate (EER) to 3.43% with 92.68% accuracy using multi-sensor fusion and optimized feature selection.
- ๐ User Authentication: Classify 10 different users with high accuracy
- ๐ Cross-Day Robustness: Ensure models trained on one day work effectively on different days
- ๐ Real-World Feasibility: Handle varying conditions, walking speeds, and temporal variations
- โก Optimization: Achieve minimal Equal Error Rate (EER) through systematic optimization
โ
Multi-Sensor Fusion: Combines accelerometer and gyroscope data
โ
Rich Feature Extraction: 112 time and frequency domain features
โ
Deep Learning Classification: Multi-Layer Perceptron (MLP) with optimized architecture
โ
Gait-Specific Analysis: Filters walking segments from non-walking activities
โ
Cross-Day Validation: Rigorous testing across different days
โ
Systematic Optimization: Feature selection, architecture tuning, and data augmentation
- ๐ฅ Users: 10 participants
- ๐ Days: 2 collection sessions per user
- ๐ฑ Sensors: 6-axis IMU (3-axis accelerometer + 3-axis gyroscope)
- ๐ฒ Device: Smartphone (waist-mounted)
- โ๏ธ Sampling Rate: Variable (resampled to 30 Hz)
- ๐ถ Activity: Natural walking patterns
- ๐ Total Files: 20 CSV files (10 users ร 2 days)
- ๐ Segments per User/Day: ~71 segments (5-second windows)
- ๐ฏ Total Training Samples: ~1,420 segments
- ๐ข Features per Segment: 112 (combined variant)
project_root/
โโโ ๐ data_raw/ # Raw CSV sensor data (10 Users ร 2 Days)
โ โโโ U1NW_FD.csv # User 1, Full Day
โ โโโ U1NW_MD.csv # User 1, Morning Data
โ โโโ ...
โโโ ๐ data_segmented/ # Segmented .mat files (5s windows, 0% overlap)
โ โโโ User{X}_Day{Y}_segments.mat
โโโ ๐ data_segmented_50overlap/ # Segmented .mat files (5s windows, 50% overlap)
โโโ ๐ features/ # Extracted features organized by sensor type
โ โโโ accel/ # Accelerometer features only (45 features)
โ โโโ gyro/ # Gyroscope features only (45 features)
โ โโโ combined/ # Multi-modal features (112 features)
โโโ ๐ templates/ # Pre-processed, balanced datasets for training
โ โโโ accel/AllTemplates_accel.mat
โ โโโ gyro/AllTemplates_gyro.mat
โ โโโ combined/
โ โโโ AllTemplates_combined.mat
โ โโโ AllTemplates_combined_gait.mat # Gait-filtered data
โโโ ๐ models/ # Trained Neural Network models
โ โโโ combined/baseline_mlp.mat
โโโ ๐ results/ # Evaluation metrics and analysis
โ โโโ combined/ # Scenario-wise metrics (CSV)
โ โโโ optimization/ # Optimization experiment results
โ โโโ plots/ # Generated visualizations (PNG)
โ โโโ variants_comparison.csv # Cross-variant performance
โโโ ๐ scripts/ # MATLAB Source Code
โ โโโ Script1_FeatureExtraction.m # Stage 1-4: Preprocessing & Features
โ โโโ Script2_TemplateGeneration.m # Stage 5: Template Generation
โ โโโ Script3_ClassificationEvaluation.m # Stage 6-9: Training & Evaluation
โ โโโ Plots.m # Visualization generation
โโโ ๐ PROJECT_STRUCTURE.md # Detailed structure documentation
- MATLAB (R2021a or later recommended)
- Deep Learning Toolbox (for
patternnet,train) - Signal Processing Toolbox (for
pwelch,bandpower) - Statistics and Machine Learning Toolbox (for
cvpartition,fitcecoc)
Run the scripts in the following order to reproduce the full pipeline:
Script: scripts/Script1_FeatureExtraction.m
- Input:
data_raw/*.csv - Process:
- Loads raw sensor data.
- Resamples signals to a fixed 30 Hz.
- Segments data into 5-second windows (150 samples).
- Extracts Time-Domain Features (Mean, Std, RMS, Skewness, Kurtosis, etc.).
- Extracts Frequency-Domain Features (Bandpower, Energy, Spectral Entropy, etc.).
- Output:
data_segmented/,features/
Script: scripts/Script2_TemplateGeneration.m
- Input:
features/ - Process:
- Aggregates features from all users.
- Performs Gait Filtering (removes non-walking segments based on spectral energy).
- Checks for class imbalance and applies downsampling if necessary.
- Generates unified datasets for Accel, Gyro, and Combined variants.
- Output:
templates/
Script: scripts/Script3_ClassificationEvaluation.m
- Input:
templates/ - Process:
- Training: Trains a Multi-Class MLP (Feed-Forward Network).
- Evaluation: Calculates FAR, FRR, EER, and Accuracy for 3 scenarios.
- Optimization: Experiments with feature selection (Fisher Score), network architecture, and split ratios.
- Comparison: Compares Accel-only vs. Gyro-only vs. Combined performance.
- Output:
models/,results/
- Resampling: Linear interpolation to 30 Hz to handle irregular sensor sampling.
- Segmentation: Fixed-width sliding windows (5 seconds).
- Overlap: Supports 0% (baseline) and 50% (optimized) overlap.
The system extracts 70+ features per segment to capture unique gait characteristics:
| Domain | Features Extracted |
|---|---|
| Time Domain | Mean, Std Dev, Min, Max, RMS, Variance, Range, 75th Percentile, Skewness, Kurtosis, IQR, Vector Magnitude stats. |
| Frequency Domain | Bandpower (0.5-3Hz), Total Energy, Dominant Frequency, Spectral Centroid, Spectral Entropy, High-band Power (3-10Hz), Dominant Amplitude. |
- Classifier: Pattern Recognition Network (
patternnet) - Architecture: Multi-Layer Perceptron (MLP)
- Input Layer: ~112 features (Combined variant)
- Hidden Layers: [128, 64] neurons (Baseline)
- Output Layer: 10 neurons (Softmax for 10 users)
- Training Algo: Scaled Conjugate Gradient (
trainscg)
The system was tested on the challenging Cross-Day scenario (Train on Day 1, Test on Day 2) to evaluate real-world robustness.
| Variant | EER (%) | Accuracy (%) | FAR | FRR |
|---|---|---|---|---|
| Combined (Accel+Gyro) | 4.45% | 88.45% | 0.0136 | 0.1204 |
| Accelerometer Only | 8.85% | 79.72% | 0.0227 | 0.2030 |
| Gyroscope Only | 19.49% | 60.85% | 0.0442 | 0.3961 |
Observation: The Combined variant significantly outperforms single-sensor approaches, reducing the Equal Error Rate (EER) by nearly half compared to Accelerometer-only.
Several optimization levers were tested to improve performance beyond the baseline.
Reducing the feature set to the most discriminative features improved generalization.
| Features Kept | EER (%) | Accuracy (%) |
|---|---|---|
| Top-30 | 4.47% | 84.93% |
| Top-52 | 3.90% | 90.42% |
| Top-65 (Best) | 3.43% | 92.68% |
| All (112) | 4.45% | 88.45% |
Simpler architectures proved robust, but the baseline [128, 64] remained competitive.
- [128] (Single Layer): EER 4.20%, Acc 89.72%
- [128, 64] (Baseline): EER 3.90%, Acc 88.03%
The final optimized system utilizes the Combined sensor data with Top-65 ranked features and a [128, 64] MLP architecture.
- Equal Error Rate (EER):
3.43% - Classification Accuracy:
92.68% - False Acceptance Rate (FAR):
0.0093 - False Rejection Rate (FRR):
0.0817
This performance demonstrates that the system is highly effective at authenticating users even on different days, validating the stability of the extracted gait features.
The system is evaluated under three distinct scenarios to test robustness:
| Scenario | Description | Purpose |
|---|---|---|
| Scenario 1: Same-Day | Train on 70% of Day 1, Test on 30% of Day 1. | Baseline performance check. |
| Scenario 2: Cross-Day | Train on Day 1, Test on Day 2. | Real-world simulation. Tests temporal stability of gait. |
| Scenario 3: Combined | Train/Test on mixed data from both days. | Tests overall data variance handling. |
- FAR (False Acceptance Rate): Likelihood of an impostor being accepted.
- FRR (False Rejection Rate): Likelihood of a genuine user being rejected.
- EER (Equal Error Rate): The point where FAR = FRR (Lower is better).
- Accuracy: Overall correct classification percentage.
- Best Variant: Combined (Accelerometer + Gyroscope) yielded the lowest baseline EER of 4.45%.
- Optimization Success: Feature selection (Top-65) further reduced EER to 3.43%.
- Robustness: The system maintained >92% accuracy even when tested on data from a different day (Cross-Day scenario).
The project generates 8 comprehensive visualizations to document system performance. All plots are saved in results/plots/ directory.
๐ Plot 1 (EER Curve): Optimal threshold is ~0.45, achieving EER of 4.45%
๐ Plot 2 (Overlap): 50% overlap reduces EER compared to 0% overlap
๐ Plot 3 (Variants): Combined sensors reduce EER by 50% vs. accelerometer-only
๐ Plot 4 (Features): Top-65 features achieve lowest EER (3.43%), better than using all 112
๐ Plot 5 (Architecture): Single hidden layer [128] performs comparably to [128,64]
๐ Plot 6 (Scenarios): Cross-Day is hardest (EER 4.45%), Same-Day easiest (EER 3.05%)
๐ Plot 7 (Raw Signal): Shows typical gait periodicity in acceleration patterns
๐ Plot 8 (Gait Filtering): Gait-only segments improve authentication accuracy
- Clone/Download the project
- Open MATLAB and navigate to project root
- Run scripts in order:
% Step 1: Feature Extraction (takes ~5-10 minutes)
cd scripts
run('Script1_FeatureExtraction.m')
% Step 2: Template Generation (takes ~1 minute)
run('Script2_TemplateGeneration.m')
% Step 3: Classification & Evaluation (takes ~10-15 minutes)
run('Script3_ClassificationEvaluation.m')
% Step 4: Generate Visualizations (takes ~1 minute)
run('Plots.m')โ
data_segmented/ - 20 .mat files (segmented data)
โ
features/ - 60 .mat files (time + freq features for 3 variants)
โ
templates/ - 4 .mat files (training-ready datasets)
โ
models/ - 1 .mat file (trained baseline MLP)
โ
results/ - 10+ CSV files + 8 PNG plots
- Multi-Modal Fusion: Combining accelerometer and gyroscope reduced EER by 50%
- Feature Selection: Less is more - 65 features outperformed 112 features
- Gait Filtering: Removing non-walking segments improved cross-day robustness
- Simple Architecture: [128, 64] MLP proved sufficient (no need for deep networks)
- Proper Evaluation: Cross-day testing revealed true real-world performance
- Irregular Sampling: Raw sensor data required careful resampling
- Class Imbalance: Minor variations required downsampling
- Temporal Drift: Cross-day performance naturally lower than same-day
- Feature Redundancy: Many extracted features were highly correlated
- Overfitting Risk: Larger networks performed worse due to limited training data
- More Users: Expand dataset to 20-50 users for better generalization
- Longer Duration: Collect data over 1-2 weeks to test long-term stability
- Real-Time Processing: Implement sliding window inference
- Mobile Deployment: Port model to TensorFlow Lite for smartphone apps
- Additional Activities: Test robustness with running, stairs, sitting
- Transfer Learning: Pre-train on large public gait datasets
We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or suggesting enhancements, your help is appreciated.
- ๐ Report Bugs: Open an issue with detailed information
- โจ Suggest Features: Share your ideas for improvements
- ๐ป Submit Code: Fork, create a branch, and submit a pull request
- ๐ Improve Docs: Help make documentation clearer and more comprehensive
Please read our CONTRIBUTING.md for detailed guidelines on:
- Code style and standards
- Development workflow
- Testing requirements
- Pull request process
- Community guidelines
This is a research project. Feel free to use the code and methodology for educational purposes.
Project Developer
Date: November 2025
For questions about the methodology or implementation details, please refer to the comprehensive code comments in the MATLAB scripts.
Note: This project demonstrates the feasibility of behavioral biometric authentication using smartphone sensors and achieves state-of-the-art performance (EER < 5%) suitable for practical deployment.
For detailed performance graphs and visual analysis, check the results/plots/ directory.







