This project implements unsupervised machine learning using K-Means clustering to analyze complex network structures comprising 600+ interconnected nodes with spatial coordinates and simulated energy levels. The system demonstrates how computational analysis outperforms visual interpretation in dense, overlapping network topologies.
This assignment was completed as part of CMPG 313 - Artificial Intelligence course, exploring real-world applications such as Wireless Sensor Networks (WSN), Internet of Things (IoT) deployments, and distributed computing systems.
- Apply K-Means clustering using scikit-learn's KMeans function
- Analyze dense network topology with 600+ nodes
- Interpret clustering results using data-driven reasoning
- Visualize data using 2D network graphs and 3D surface plots
- Extract insights from node energy distributions
- Understand algorithm limitations and parameter sensitivity
| Technology | Purpose |
|---|---|
| Python 3.x | Primary programming language |
| Scikit-learn | K-Means clustering implementation |
| NetworkX | Graph theory operations and network topology generation |
| NumPy/Pandas | Data manipulation and statistical analysis |
| Matplotlib | 2D and 3D visualization with surface interpolation |
| SciPy | Spatial interpolation for energy surface mapping |
-
Stochastic Block Model Generation
- Community-based graph structure with 7 distinct blocks
- Intra-community connection probabilities: 0.27-0.32
- Inter-community connectivity: 0.05 (simulating realistic network density)
-
Spatial Compression Algorithm
- Spring-layout positioning (k=0.11, 300 iterations)
- Community-specific spatial compression (scale factor: 0.18)
- Stochastic jitter (±0.025) to prevent perfect separation
-
Energy Distribution Modeling
- Random energy values: 10-100 units (simulating battery levels)
- Decoupled from spatial positioning to test clustering efficacy
-
K-Means Clustering Engine
- Configurable cluster count (k=3, 7)
- 10 initializations for stable convergence
- Random state 42 for reproducibility
| Metric | Value |
|---|---|
| Number of Clusters | 7 |
| Total Nodes | 600 |
| Cluster Size Range | 75-95 nodes |
| Energy Range | 10-100 units |
| Centroids | 7 spatial centers |
Cluster Breakdown:
- Clusters formed based on spatial proximity
- Energy distribution varies significantly across clusters
- Critical low-energy nodes (<20 units) distributed across all clusters
| Metric | Value |
|---|---|
| Number of Clusters | 3 |
| Total Nodes | 600 |
| Cluster Size | ~200 nodes each |
| Centroids | 3 spatial centers |
Key Observations:
- Clusters merged to form larger spatial groupings
- Average energy values more balanced across clusters
- Centroids repositioned to reflect broader spatial regions
| Aspect | k=7 | k=3 |
|---|---|---|
| Granularity | High | Low |
| Cluster Definition | Specific | General |
| Energy Distribution | Variable per cluster | More balanced |
| Centroid Count | 7 | 3 |
| Interpretation | Detailed | High-level |
K-Means clusters solely based on spatial proximity, ignoring energy values. Low-energy nodes are distributed randomly across all clusters.
Changing k from 7 to 3 significantly alters:
- Cluster boundaries
- Centroids positions
- Group membership
Visual interpretation alone is insufficient for dense, overlapping networks. Data-driven analysis is essential.
Low-energy nodes (<20 units) can be identified across clusters for proactive maintenance.
- Python 3.8 or higher
- pip package manager
git clone https://github.com/KSDRILL/kmeans-clustering-network-analysis.git cd kmeans-clustering-network-analysis
python -m venv venv
venv\Scripts\activate
source venv/bin/activate
pip install numpy pandas matplotlib networkx scikit-learn scipy