Project Overview
This project focuses on analyzing and clustering player data to group similar players based on their attributes. The clustering results can be used to identify player roles, scout talent, or form balanced teams. The dataset includes player characteristics such as height, weight, ratings, and performance metrics.
Objectives
Understand Player Attributes: Analyse features such as finishing, crossing, and overall rating.
Cluster Players: Group players into meaningful clusters using K-Means and Gaussian Mixture Model clustering techniques.
Interpret Clusters: Define player roles based on cluster characteristics.
Generate Insights: Provide actionable recommendations for scouting, player improvement, and team formation.
Project Workflow
- Data Preprocessing
Cleaning: Handle missing values by filling them with the mean or other appropriate values.
Standardisation: Normalise numerical features to ensure uniform scaling for clustering.
Feature Selection: Select relevant features (e.g., finishing, crossing, strength).
- Clustering Analysis
Algorithm Used: K-Means clustering to group players based on their attributes.
Steps:
Identify the optimal number of clusters using the Elbow Method.
Fit the K-Means algorithm and assign cluster labels.
Analyze cluster centroids to interpret group characteristics.
- Visualisation
Scatter Plots: Visualise player clusters using selected feature pairs (e.g., finishing vs. crossing).
Cluster Trends: Observe how clusters vary in terms of specific attributes.
- Insights
Provide actionable insights based on cluster definitions, such as:
The Jupyter notebook containing the analysis, clustering code, and visualizations.
clustered_player_roles.csv: The dataset with assigned cluster labels for each player.
Requirements:
Required libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
Setup:
pip install pandas numpy matplotlib seaborn scikit-learn
Run the Code:
Open the Jupyter Notebook clustering_analysis.ipynb.
Execute the cells step-by-step to preprocess the data, apply clustering, and visualise results.