stripathy1999/Keystroke-Anomaly-Detection-Enhanced


Enhanced Anomaly Detection in Keystroke Dynamics Authentication

This repository contains the course project for CS266 (Topics in Information Security), titled "Enhanced Anomaly Detection in Keystroke Dynamics Authentication". It was submitted to meet the course requirements by the following team members:

  • Rashmi Sonth
  • Sakshi Sanskruti Tripathy

The project compares methodologies for anomaly detection in keystroke dynamics authentication, using both machine learning and deep learning models.


File Structure

1. Dataset

  • demographics.csv:

    • Contains demographic information of participants.
    • Used to supplement analysis with demographic-based features.
  • free-text.csv:

    • Contains raw keystroke dynamics data.
    • Includes timing features such as DU.key1.key1, DD.key1.key2, etc.
    • This dataset is critical for building and testing machine learning models.
  • Dataset Source:

    • The dataset used in this project can be downloaded from Zenodo.
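As a rough illustration of what the timing columns in free-text.csv represent, the sketch below derives the two named features from raw key events. It assumes the common keystroke-dynamics convention that D means key-down and U means key-up, so DU.key1.key1 is the hold time of the first key and DD.key1.key2 is the delay between the two key presses. That reading, the function name, and the millisecond timestamps are assumptions for illustration; check the dataset documentation on Zenodo for the authoritative definitions.

```python
def timing_features(press1, release1, press2):
    """Derive two timing features from the key events of a digraph.

    Assumed convention (not confirmed by this README):
      DU.key1.key1 = key1 release time minus key1 press time (hold time)
      DD.key1.key2 = key2 press time minus key1 press time
    All timestamps must share one unit, e.g. milliseconds.
    """
    return {
        "DU.key1.key1": release1 - press1,
        "DD.key1.key2": press2 - press1,
    }


# Hypothetical timestamps in milliseconds.
features = timing_features(press1=0, release1=80, press2=150)
print(features)
```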

2. Notebook

  • models.ipynb:
    • The main Jupyter Notebook for training and evaluating models.
    • Implements machine learning and deep learning approaches (e.g., LSTM, CNN).
    • Important: Ensure the dataset paths (free-text.csv and demographics.csv) are correctly updated in this notebook based on your local environment.
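The notebook's LSTM and CNN models are not reproduced here. To show the general shape of the anomaly-detection task, the sketch below uses a much simpler classical baseline, a scaled Manhattan distance detector, which is a different technique from the notebook's models. The function names and the toy feature vectors are hypothetical; each vector stands for one typing sample's timing features.

```python
from statistics import mean


def fit_profile(samples):
    """Build a user profile: per-feature mean and mean absolute deviation."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Fall back to a tiny value so a constant feature never divides by zero.
    deviations = [
        mean(abs(v - m) for v in col) or 1e-9
        for col, m in zip(columns, means)
    ]
    return means, deviations


def anomaly_score(sample, profile):
    """Scaled Manhattan distance to the profile; larger means more anomalous."""
    means, deviations = profile
    return sum(abs(v - m) / d for v, m, d in zip(sample, means, deviations))


# Toy enrollment data for one user (hypothetical values, in seconds).
genuine = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.19]]
profile = fit_profile(genuine)
print(anomaly_score([0.11, 0.20], profile))  # close to the profile
print(anomaly_score([0.30, 0.05], profile))  # far from the profile
```

In practice a threshold on the score separates accepted from rejected samples; choosing it trades false accepts against false rejects.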

3. Output

  • output.png:
    • Helps to evaluate the effectiveness of implemented techniques.

Instructions

  1. Update Dataset Paths:

    • The datasets are located in the dataset folder.
    • Ensure you update the dataset paths in the models.ipynb file wherever necessary:
      data = pd.read_csv('dataset/free-text.csv')
      demographics = pd.read_csv('dataset/demographics.csv')
  2. Run the Notebook:

    • Open models.ipynb in Jupyter Notebook or any compatible environment.
    • Execute the cells step-by-step to train and evaluate the models.
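Before running the notebook, it can help to confirm that both dataset files are where the code expects them. The following is a minimal sketch, assuming the dataset folder layout described above; the helper name resolve_dataset_paths and the dictionary keys are hypothetical and not part of the repository.

```python
from pathlib import Path

# File names come from this README; the dict keys are illustrative.
DATASET_FILES = {
    "free_text": "free-text.csv",
    "demographics": "demographics.csv",
}


def resolve_dataset_paths(dataset_dir="dataset"):
    """Return the paths of both CSV files, or fail early with a clear message."""
    base = Path(dataset_dir)
    paths = {key: base / name for key, name in DATASET_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing dataset files: " + ", ".join(missing))
    return paths
```

The returned paths can be passed straight to pd.read_csv, for example pd.read_csv(paths["free_text"]).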

Requirements

  • Python 3.7+

  • Libraries:

    • tensorflow
    • numpy
    • pandas
    • matplotlib
    • seaborn
    • scikit-learn (imported as sklearn)
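A quick way to see which of the listed libraries are missing from the current environment is to probe for them without importing them. This is a sketch only; the mapping of import names to pip package names below covers the libraries listed above, and the function name is hypothetical.

```python
from importlib.util import find_spec

# Import name -> name to install with pip (they differ for scikit-learn).
REQUIRED = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def missing_packages(required=REQUIRED):
    """Return pip names of the packages whose modules cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```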

    Quick Local Run

    1. Install dependencies:
    python -m pip install -r requirements.txt
    2. Run the main script (uses dataset/free-text.csv and dataset/demographics.csv):
    python models.py

    Note: the script will automatically fall back to the local dataset/ folder when not running in Google Colab.
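The fallback described in the note can be sketched as follows. This is not the script's actual code: the check for the google.colab module is a common heuristic for detecting a Colab runtime, and the Colab-side path /content/dataset is a hypothetical placeholder.

```python
import sys
from pathlib import Path


def running_in_colab():
    """Heuristic: the Colab runtime preloads the google.colab module."""
    return "google.colab" in sys.modules


def dataset_dir(colab_dir="/content/dataset", local_dir="dataset"):
    """Pick the dataset folder: Colab path in Colab, local folder otherwise.

    colab_dir is a hypothetical placeholder, not a path from the repository.
    """
    return Path(colab_dir) if running_in_colab() else Path(local_dir)


print(dataset_dir() / "free-text.csv")
```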


    Full Setup & Run (Windows)

    Option A: Recommended (conda; fastest and avoids build issues)

    1. Create and activate an environment with conda:
    conda create -n keystroke python=3.11 -y
    conda activate keystroke
    2. Install binary packages from conda-forge:
    conda install -c conda-forge pandas numpy scipy scikit-learn matplotlib seaborn -y
    3. Run the script:
    python models.py

    Option B: Virtualenv (works but may require build toolchain)

    1. Create the venv (PowerShell):
    python -m venv .venv
    2. Use the venv's Python directly (no activation required):
    .venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
    .venv\Scripts\python.exe -m pip install -r requirements.txt
    .venv\Scripts\python.exe models.py

    Note: a standard Windows venv puts python.exe in Scripts. Venvs created on Linux, macOS, or WSL use bin instead, so check .venv for either Scripts or bin.
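Because the interpreter's location inside a venv differs between platforms, a small helper can locate it instead of hard-coding the path. The sketch below is illustrative; the function name is hypothetical, and it simply checks the usual candidate locations in order.

```python
from pathlib import Path


def venv_python(venv_dir=".venv"):
    """Return the venv's Python executable, checking Scripts before bin."""
    venv = Path(venv_dir)
    candidates = [
        venv / "Scripts" / "python.exe",  # standard Windows layout
        venv / "bin" / "python.exe",      # some non-standard Windows builds
        venv / "bin" / "python",          # Linux, macOS, WSL
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No Python executable found under {venv}")
```

The result can be handed to subprocess.run, for example subprocess.run([str(venv_python()), "models.py"], check=True).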

    Troubleshooting

    • SSL / certificate verification failures during pip install: pip may fail with ssl.SSLCertVerificationError when it tries to download build backends (cmake, ninja). Common fixes:

      • Use the conda approach above (conda provides prebuilt binaries). This is recommended on Windows.
      • Ensure the system time is correct and any corporate proxy is configured. If your network intercepts TLS, install the organization's CA into the OS certificate store.
      • As a last resort, pass --trusted-host to pip for the package index hosts (for example pypi.org and files.pythonhosted.org). Setting the environment variable PIP_DISABLE_PIP_VERSION_CHECK=1 additionally skips pip's own version-check request. Trusting hosts this way bypasses certificate validation for them, so it is not recommended for long-term use.
    • Build errors for numpy/pandas: Building numpy and pandas from source on Windows often requires a C/C++ toolchain (MSVC) and CMake. Use conda to avoid this.
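When diagnosing certificate errors, it can be useful to see which CA locations and proxy settings the Python interpreter itself picks up. The sketch below gathers that information with the standard library only; the function name is hypothetical. It reports the interpreter's OpenSSL defaults, which are not necessarily the certificates pip uses, since pip can rely on its own CA bundle.

```python
import os
import ssl


def tls_diagnostics():
    """Collect the interpreter's default CA locations and proxy variables."""
    paths = ssl.get_default_verify_paths()
    proxy_vars = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,  # None if no default CA file exists
        "capath": paths.capath,  # None if no default CA directory exists
        "proxies": {
            # Proxy variables may be set in upper or lower case.
            name: os.environ.get(name) or os.environ.get(name.lower())
            for name in proxy_vars
        },
    }


if __name__ == "__main__":
    for key, value in tls_diagnostics().items():
        print(f"{key}: {value}")
```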

About

This project explores anomaly detection in keystroke dynamics authentication using machine learning, deep learning, and adversarial (GAN-based) models. It evaluates how well different approaches distinguish genuine users from sophisticated spoofing attacks.
