As a local living near Melbourne CBD, Maverick relies on active travel, such as walking, and public transport, such as trams, to get around. One morning in January, Maverick set out for his workplace, expecting to arrive before 9:00 AM, but a sudden heatwave meant the tram he usually catches could not keep to its timetable, which delayed his trip. A replacement bus was provided for the emergency, but it could only take a limited number of passengers, which delayed him further. Sudden extreme weather like this makes travel conditions less reliable and harder to predict.
Maverick wants access to a system that can predict how climate conditions affect urban pedestrian movement over time, so that he can plan his trip better: leaving earlier in anticipation of sudden extreme weather during a particular timeframe, or choosing a different mode of transport, such as an Uber. This would help him make informed decisions when travelling during extreme weather events.
At the end of this use case, you will:
- Learn how to source and combine multiple public datasets.
- Understand how to clean and align time-series data at an hourly level for modelling.
- Explore how climate variables, such as temperature, humidity, pressure, and wind, relate to pedestrian counts.
- Apply feature engineering techniques to create meaningful predictors from weather and mobility time-series data.
- Build a deep learning forecasting model to predict pedestrian demand.
- Perform model optimisation like hyperparameter tuning to improve forecasting performance.
- Evaluate model performance and interpret results for climate adaptation planning.
Urban systems are often affected by changing climate conditions, but these effects are not always easy to capture with simple forecasting methods. One clear example is pedestrian movement, where changing weather conditions can affect how many people move through the city over time.
This use case focuses on predicting pedestrian activity in the City of Melbourne using hourly climate observations, which keeps the project closely aligned with the goal of modelling how climate factors influence an urban system.
In this use case, pedestrian counts are aggregated into hourly city-level totals and merged with hourly microclimate observations for Melbourne. A deep learning model can then be trained to predict pedestrian demand based on time, recent demand history, and recent climate conditions.
The datasets used in this project are the "Pedestrian Counting System (counts per hour)" dataset, the "Pedestrian Counting System - Sensor Locations" dataset for supporting location metadata, and the "Microclimate sensors data" dataset, all from the City of Melbourne website.
1. Importing The Libraries
This section shows the libraries used in this use case, each supporting a specific part of the pipeline: data handling, time-series analysis, visualisation, feature engineering, optimisation, and deep learning [1] [2] [3] [4] [5]. They are imported at the beginning to keep the workflow organised [6]. A random seed was set with a student ID to allow reproducibility of the outputs in this notebook [7].
The deep learning part of the pipeline relies on the following libraries. TensorFlow and Keras are used for building and training the recurrent neural network models [3]. The os and random libraries are imported to support reproducibility across system-level random operations [55].
# Libraries for this project.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import StandardScaler
import os
import random

# Deep Learning
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
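The reproducibility setup described in this section can be sketched as follows. This is a minimal sketch rather than the notebook's own seeding cell: the `SEED` value and the `set_global_seed` helper are hypothetical names introduced for illustration, and TensorFlow is seeded only if it is installed.

```python
import os
import random

import numpy as np

SEED = 42  # Hypothetical value; the notebook uses a student ID instead.


def set_global_seed(seed):
    """Seed the Python, NumPy and (if installed) TensorFlow random generators."""
    # PYTHONHASHSEED is read at interpreter start-up, so setting it here
    # only affects Python processes launched after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf

        tf.random.set_seed(seed)
    except ImportError:
        # TensorFlow is optional for this sketch.
        pass


# Seeding twice with the same value should reproduce the same draws.
set_global_seed(SEED)
first_draw = (random.random(), np.random.rand())

set_global_seed(SEED)
second_draw = (random.random(), np.random.rand())

print(first_draw == second_draw)
```

Calling the helper once, before any data splitting or model construction, keeps every later random operation tied to the same seed.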
2. Importing The Datasets
This section imports multiple public datasets from the City of Melbourne before any cleaning, merging, or modelling takes place. The datasets are accessed through the City of Melbourne Open Data API v2.1, so the notebook can be run directly after download [8].
A shared BASE_URL and a dictionary of dataset identifiers make it easier to add or remove datasets [8] [9] [10] [11]. The get_csv_url() function standardises the access method so that the same logic is reused across all three datasets [8]. The ROW_LIMIT parameter allows experimentation on smaller samples before scaling to the full datasets [8]. Lastly, no API key is exposed in the code [8].
# Store the API base path (Open Data API v2.1).
BASE_URL = (
    "https://data.melbourne.vic.gov.au"
    "/api/explore/v2.1/catalog/datasets"
)

# Store the dataset identifiers.
DATASETS = {
    "sensor_locations":
        "pedestrian-counting-system-sensor-locations",
    "pedestrian_counts":
        "pedestrian-counting-system-monthly-counts-per-hour",
    "microclimate":
        "microclimate-sensors-data",
}

# Set the number of rows to retrieve for experiments.
# Use None when the full datasets are needed.
# Use an integer for experimental purposes.
ROW_LIMIT = None


def get_csv_url(dataset_id, row_limit=None):
    """Return the CSV export URL for a dataset, optionally limited to row_limit rows."""
    # Build the base CSV export URL.
    url = (
        f"{BASE_URL}/{dataset_id}/exports/csv"
        "?delimiter=,&with_bom=true"
    )

    # Add a row limit if one is provided.
    if row_limit is not None:
        url += f"&limit={row_limit}"

    return url
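To make the URL construction easier to inspect without downloading anything, the sketch below builds the same kind of export URL with `urllib.parse.urlencode` and adds a basic check on the row limit. The `build_csv_url` name and the validation rule are assumptions made for illustration; they are not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = (
    "https://data.melbourne.vic.gov.au"
    "/api/explore/v2.1/catalog/datasets"
)


def build_csv_url(dataset_id, row_limit=None):
    """Hypothetical variant of get_csv_url() that validates row_limit."""
    params = {"delimiter": ",", "with_bom": "true"}
    if row_limit is not None:
        if not isinstance(row_limit, int) or row_limit <= 0:
            raise ValueError("row_limit must be a positive integer or None")
        params["limit"] = row_limit
    # safe="," keeps the comma delimiter readable instead of encoding it as %2C.
    query = urlencode(params, safe=",")
    return f"{BASE_URL}/{dataset_id}/exports/csv?{query}"


# Build (but do not request) a URL for a 1,000-row sample.
sample_url = build_csv_url("microclimate-sensors-data", row_limit=1000)
print(sample_url)
```

Printing the URL before passing it to `pd.read_csv` is a cheap way to confirm which dataset and row limit a cell is about to request.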
2.1 Pedestrian Counting System - Sensor Locations
This dataset is included to provide context about the physical pedestrian counting network, even though it is not used in the later modelling steps. The sensor metadata helps explain where the mobility data originates and what the coverage of the system looks like [9].
- location_id: unique identifier for each pedestrian counting location.
- sensor_description: human-readable description of the site, like street names.
- sensor_name: short internal sensor code.
- installation_date: date the sensor was installed.
- note: extra comments or metadata about the sensor.
- location_type: type of location.
- status: operational status of the sensor.
- direction_1 and direction_2: the two movement directions captured by the counter.
- latitude and longitude: geographic coordinates of the sensor.
- location: combined coordinate string.
+
# Load the sensor locations dataset.
sensor_locations_df = pd.read_csv(
    get_csv_url(
        DATASETS["sensor_locations"],
        row_limit=ROW_LIMIT
    ),
    encoding="utf-8-sig"
)

# Preview the dataframe.
print(sensor_locations_df.head())

   location_id      sensor_description sensor_name installation_date  \
0            3       Melbourne Central    Swa295_T        2009-03-25
1            5          Princes Bridge     PriNW_T        2009-03-26
2            9  Southern Cross Station    Col700_T        2009-03-23
3           12                New Quay      NewQ_T        2009-01-21
4           14        Sandridge Bridge    SanBri_T        2009-03-24

                                            note location_type status  \
0                                            NaN       Outdoor      A
1                Replace with: 00:6e:02:01:9e:54       Outdoor      A
2                                            NaN       Outdoor      A
3                                            NaN       Outdoor      A
4  Sensor relocated to sensor ID 25 on 2/10/2019       Outdoor      A

  direction_1 direction_2   latitude   longitude                    location
0       North       South -37.811015  144.964295  -37.81101524, 144.96429485
1       North       South -37.818742  144.967877  -37.81874249, 144.96787656
2        East        West -37.819830  144.951026  -37.81982992, 144.95102555
3        East        West -37.814580  144.942924  -37.81457988, 144.94292398
4       North       South -37.820112  144.962919  -37.82011242, 144.96291897
2.2 Pedestrian Counting System (counts per hour)
This is the target dataset for the use case because the final prediction task is to forecast pedestrian demand. It contains the observed mobility outcome that the model aims to learn [10].
- id: record identifier.
- location_id: identifier linking the observation to a specific sensor location.
- sensing_date: date of observation.
- hourday: hour of day from 0 to 23.
- direction_1 and direction_2: directional pedestrian counts.
- pedestriancount: total pedestrian count for that record.
- sensor_name: short sensor code.
- location: coordinate string for the sensor.
+
# Load the pedestrian counts dataset.
pedestrian_counts_df = pd.read_csv(
    get_csv_url(
        DATASETS["pedestrian_counts"],
        row_limit=ROW_LIMIT
    ),
    encoding="utf-8-sig"
)

# Preview the dataframe.
print(pedestrian_counts_df.head())

             id  location_id sensing_date  hourday  direction_1  direction_2  \
0   70220251008           70   2025-10-08        2            1            0
1   31220260507            3   2026-05-07       12          872         1170
2  591520241018           59   2024-10-18       15          430          628
3  141020260106           14   2026-01-06       10          240           75
4   28820241203           28   2024-12-03        8          537          262

   pedestriancount sensor_name                    location
0                1   Errol20_T  -37.80456984, 144.94946228
1             2042    Swa295_T  -37.81101524, 144.96429485
2             1058      RMIT_T  -37.80825648, 144.96304859
3              315    SanBri_T  -37.82011242, 144.96291897
4              799       VAC_T  -37.82129925, 144.96879309
2.3 Microclimate sensors data
This dataset provides the input features the use case needs to understand how climate conditions affect urban pedestrian movement. It supplies the explanatory environmental variables that connect weather conditions to mobility demand [11].
- device_id: identifier for each microclimate device.
- received_at: timestamp when the reading was recorded.
- sensorlocation: descriptive location of the microclimate sensor.
- latlong: coordinate string.
- minimumwinddirection, averagewinddirection, maximumwinddirection: wind direction measurements.
- minimumwindspeed, averagewindspeed, gustwindspeed: wind speed measurements.
- airtemperature: air temperature reading.
- relativehumidity: humidity level.
- atmosphericpressure: atmospheric pressure.
- pm25 and pm10: particulate matter measurements.
- noise: noise level.
+
# Load the microclimate dataset.
microclimate_df = pd.read_csv(
    get_csv_url(
        DATASETS["microclimate"],
        row_limit=ROW_LIMIT
    ),
    encoding="utf-8-sig"
)

# Preview the dataframe.
print(microclimate_df.head())

            device_id                received_at  \
0  ICTMicroclimate-08  2026-04-28T15:25:15+00:00
1  ICTMicroclimate-08  2026-04-28T04:38:50+00:00
2  ICTMicroclimate-08  2026-04-28T05:08:52+00:00
3  ICTMicroclimate-10  2026-04-28T10:19:41+00:00
4  ICTMicroclimate-08  2026-04-28T13:55:03+00:00

                                      sensorlocation  \
0  Swanston St - Tram Stop 13 adjacent Federation...
1  Swanston St - Tram Stop 13 adjacent Federation...
2  Swanston St - Tram Stop 13 adjacent Federation...
3                                   1 Treasury Place
4  Swanston St - Tram Stop 13 adjacent Federation...

                    latlong  minimumwinddirection  averagewinddirection  \
0  -37.8184515, 144.9678474                   0.0                 220.0
1  -37.8184515, 144.9678474                   0.0                 283.0
2  -37.8184515, 144.9678474                   0.0                 310.0
3  -37.8128595, 144.9745395                 277.0                 324.0
4  -37.8184515, 144.9678474                   0.0                 281.0

   maximumwinddirection  minimumwindspeed  averagewindspeed  gustwindspeed  \
0                 314.0               0.0               0.5            2.5
1                 353.0               0.0               0.9            2.6
2                 314.0               0.0               0.2            1.9
3                 352.0               0.3               0.8            1.3
4                 354.0               0.0               0.3            2.5

   airtemperature  relativehumidity  atmosphericpressure  pm25  pm10  noise
0            16.7              79.9               1026.1  10.0  12.0   58.7
1            21.9              61.5               1021.5  27.0  32.0   75.6
2            21.6              61.1               1021.3  26.0  29.0   69.2
3            19.7              57.4               1018.4  39.0  49.0   61.9
4            17.3              76.2               1025.5  10.0  12.0   79.4
3. Initial Inspection Of The Datasets
Public datasets often differ in structure, missing values, data types, date formats, and identifier systems, so checking their size, structure, completeness, and formatting early helps identify potential issues in the workflow [12]. Understanding what each dataset contains and how the datasets can be combined is important before attempting to clean and merge them [12].
3.1 Checking Number Of Rows/Columns
Checking the shape shows the scale and complexity of each dataset. This helps determine memory demands, cleaning strategy, and whether the data volume is sufficient for later modelling [13].
From these results, the pedestrian dataset (1,540,629 rows) and the microclimate dataset (897,085 rows) are large enough for later time-series modelling. The sensor locations dataset is much smaller (145 rows) because it only contains metadata about the sensor network, rather than repeated hourly observations. Overall, there appears to be sufficient volume for the task of predicting the pedestrian count.
# Preview of the number of columns and rows.
print(sensor_locations_df.shape)
print(pedestrian_counts_df.shape)
print(microclimate_df.shape)

(145, 12)
(1540629, 9)
(897085, 16)
3.2 Checking The Features
This step verifies what information is actually available in each dataset and whether there are meaningful fields for later joining, cleaning, and modelling. Knowing the columns early prevents accidentally removing important variables or keeping irrelevant ones [14].
From the output, the column names confirm that location_id is shared between the sensor metadata and pedestrian counts datasets. The pedestrian and microclimate datasets both contain time information, which is essential because the final merge is done at the hourly level. The microclimate dataset offers a diverse range of explanatory variables, which is useful for modelling. Some columns that are likely less useful for the final model need to be removed, such as descriptive notes, raw coordinate strings, and redundant directional fields.
# Check the column names for each dataframe.
print("Sensor locations columns:")
print(sensor_locations_df.columns)

print("\nPedestrian counts columns:")
print(pedestrian_counts_df.columns)

print("\nMicroclimate columns:")
print(microclimate_df.columns)

Sensor locations columns:
Index(['location_id', 'sensor_description', 'sensor_name', 'installation_date',
       'note', 'location_type', 'status', 'direction_1', 'direction_2',
       'latitude', 'longitude', 'location'],
      dtype='object')

Pedestrian counts columns:
Index(['id', 'location_id', 'sensing_date', 'hourday', 'direction_1',
       'direction_2', 'pedestriancount', 'sensor_name', 'location'],
      dtype='object')

Microclimate columns:
Index(['device_id', 'received_at', 'sensorlocation', 'latlong',
       'minimumwinddirection', 'averagewinddirection', 'maximumwinddirection',
       'minimumwindspeed', 'averagewindspeed', 'gustwindspeed',
       'airtemperature', 'relativehumidity', 'atmosphericpressure', 'pm25',
       'pm10', 'noise'],
      dtype='object')
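The comparison of column names can also be done programmatically. The sketch below uses the column lists printed above and a hypothetical `shared_columns` helper to list the names that two datasets have in common.

```python
# Column names copied from the notebook output above.
sensor_columns = [
    "location_id", "sensor_description", "sensor_name", "installation_date",
    "note", "location_type", "status", "direction_1", "direction_2",
    "latitude", "longitude", "location",
]
pedestrian_columns = [
    "id", "location_id", "sensing_date", "hourday", "direction_1",
    "direction_2", "pedestriancount", "sensor_name", "location",
]
microclimate_columns = [
    "device_id", "received_at", "sensorlocation", "latlong",
    "minimumwinddirection", "averagewinddirection", "maximumwinddirection",
    "minimumwindspeed", "averagewindspeed", "gustwindspeed",
    "airtemperature", "relativehumidity", "atmosphericpressure", "pm25",
    "pm10", "noise",
]


def shared_columns(left, right):
    """Return the names in `left` that also appear in `right`, in `left` order."""
    right_set = set(right)
    return [name for name in left if name in right_set]


sensor_vs_pedestrian = shared_columns(sensor_columns, pedestrian_columns)
pedestrian_vs_microclimate = shared_columns(pedestrian_columns, microclimate_columns)

print("Sensor vs pedestrian:", sensor_vs_pedestrian)
print("Pedestrian vs microclimate:", pedestrian_vs_microclimate)
```

A shared name does not guarantee a shared meaning: direction_1 and direction_2 hold direction labels such as "North" in the sensor metadata but integer counts in the pedestrian dataset, so location_id remains the only sensible join key between those two tables. The pedestrian and microclimate datasets share no column names at all.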
3.3 Checking The Datatypes
Datatype checking is essential because many later operations depend on correct types. Date parsing, numeric aggregation, interpolation, rolling windows, and model preparation can all fail or behave incorrectly if types are wrong [15].
From the outputs, the pedestrian sensing_date, sensor installation_date, and microclimate received_at fields are all initially stored as objects, so they need to be converted before any time-series operations. The numeric climate variables are already float64 and the pedestrian counts are int64, which is appropriate for aggregation and modelling, so no change is needed there.
# Check the data types for each dataframe.
print("Sensor locations data types:")
print(sensor_locations_df.dtypes)

print("\nPedestrian counts data types:")
print(pedestrian_counts_df.dtypes)

print("\nMicroclimate data types:")
print(microclimate_df.dtypes)

Sensor locations data types:
location_id             int64
sensor_description     object
sensor_name            object
installation_date      object
note                   object
location_type          object
status                 object
direction_1            object
direction_2            object
latitude              float64
longitude             float64
location               object
dtype: object

Pedestrian counts data types:
id                  int64
location_id         int64
sensing_date       object
hourday             int64
direction_1         int64
direction_2         int64
pedestriancount     int64
sensor_name        object
location           object
dtype: object

Microclimate data types:
device_id                object
received_at              object
sensorlocation           object
latlong                  object
minimumwinddirection    float64
averagewinddirection    float64
maximumwinddirection    float64
minimumwindspeed        float64
averagewindspeed        float64
gustwindspeed           float64
airtemperature          float64
relativehumidity        float64
atmosphericpressure     float64
pm25                    float64
pm10                    float64
noise                   float64
dtype: object
3.4 Checking The Dataset Information
Using .info() gives a more complete structural summary of the datasets, showing non-null counts, memory usage, and dtype balance. This step focuses on the memory each dataset occupies in RAM, to decide what sample size would be ideal for experimentation before scaling to the full datasets, or whether to opt for a platform like Google Colab to handle heavier usage [16].
From the outputs, the microclimate dataset takes up the most memory (about 109.5 MB), closely followed by the pedestrian dataset (about 105.8 MB), while the sensor locations dataset is negligible (about 13.7 KB). The "+" after each figure marks it as a lower bound, because the contents of object columns are not fully counted. These sizes are acceptable for running the modelling locally.
# Check the overall structure of each dataframe.
print("Sensor locations info:")
print(sensor_locations_df.info())

print("\nPedestrian counts info:")
print(pedestrian_counts_df.info())

print("\nMicroclimate info:")
print(microclimate_df.info())

Sensor locations info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   location_id         145 non-null    int64
 1   sensor_description  145 non-null    object
 2   sensor_name         145 non-null    object
 3   installation_date   145 non-null    object
 4   note                34 non-null     object
 5   location_type       145 non-null    object
 6   status              145 non-null    object
 7   direction_1         113 non-null    object
 8   direction_2         113 non-null    object
 9   latitude            145 non-null    float64
 10  longitude           145 non-null    float64
 11  location            145 non-null    object
dtypes: float64(2), int64(1), object(9)
memory usage: 13.7+ KB
None

Pedestrian counts info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1540629 entries, 0 to 1540628
Data columns (total 9 columns):
 #   Column           Non-Null Count    Dtype
---  ------           --------------    -----
 0   id               1540629 non-null  int64
 1   location_id      1540629 non-null  int64
 2   sensing_date     1540629 non-null  object
 3   hourday          1540629 non-null  int64
 4   direction_1      1540629 non-null  int64
 5   direction_2      1540629 non-null  int64
 6   pedestriancount  1540629 non-null  int64
 7   sensor_name      1540629 non-null  object
 8   location         1540629 non-null  object
dtypes: int64(6), object(3)
memory usage: 105.8+ MB
None

Microclimate info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 897085 entries, 0 to 897084
Data columns (total 16 columns):
 #   Column                Non-Null Count   Dtype
---  ------                --------------   -----
 0   device_id             897085 non-null  object
 1   received_at           897085 non-null  object
 2   sensorlocation        897085 non-null  object
 3   latlong               897085 non-null  object
 4   minimumwinddirection  763016 non-null  float64
 5   averagewinddirection  881374 non-null  float64
 6   maximumwinddirection  763006 non-null  float64
 7   minimumwindspeed      763006 non-null  float64
 8   averagewindspeed      881372 non-null  float64
 9   gustwindspeed         763006 non-null  float64
 10  airtemperature        881372 non-null  float64
 11  relativehumidity      881372 non-null  float64
 12  atmosphericpressure   881372 non-null  float64
 13  pm25                  823822 non-null  float64
 14  pm10                  823822 non-null  float64
 15  noise                 823822 non-null  float64
dtypes: float64(12), object(4)
memory usage: 109.5+ MB
None
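The "+" next to each memory figure in the .info() output means the reported size is a lower bound, since the contents of object columns are not fully measured by default. A fuller estimate can be obtained with `memory_usage(deep=True)`, as in the minimal sketch below. The small `toy_df` frame is illustrative data, not the real dataset.

```python
import pandas as pd

# Illustrative frame with one string column and one numeric column.
toy_df = pd.DataFrame(
    {
        "device_id": ["ICTMicroclimate-08", "ICTMicroclimate-10"] * 500,
        "airtemperature": [16.7, 19.7] * 500,
    }
)

# Shallow usage may count only the references held in object columns.
shallow_bytes = int(toy_df.memory_usage(deep=False).sum())

# Deep usage also inspects the string contents themselves.
deep_bytes = int(toy_df.memory_usage(deep=True).sum())

print(f"Shallow estimate: {shallow_bytes / 1024:.1f} KB")
print(f"Deep estimate:    {deep_bytes / 1024:.1f} KB")
```

On the real dataframes, `df.info(memory_usage="deep")` reports the same deep figure directly, which is the safer number to use when deciding whether the full datasets fit in local RAM.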
3.5 Checking For Missing Values
Checking for missing values is important because it affects the cleaning strategy, feature selection, and merge quality. Missing sensor readings in particular can break the hourly continuity needed for modelling later [17].
From the outputs, the pedestrian counts dataset has no missing values at all. The microclimate dataset has missing values in every measurement variable, while its four identifying columns (device_id, received_at, sensorlocation, and latlong) are complete. The sensor locations dataset has some missing values, mainly in note, direction_1, and direction_2, which are not necessary for city-level forecasting.
# Count the missing values in each dataframe.
print("Missing values in sensor locations:")
print(sensor_locations_df.isna().sum())

print("\nMissing values in pedestrian counts:")
print(pedestrian_counts_df.isna().sum())

print("\nMissing values in microclimate:")
print(microclimate_df.isna().sum())

Missing values in sensor locations:
location_id             0
sensor_description      0
sensor_name             0
installation_date       0
note                  111
location_type           0
status                  0
direction_1            32
direction_2            32
latitude                0
longitude               0
location                0
dtype: int64

Missing values in pedestrian counts:
id                 0
location_id        0
sensing_date       0
hourday            0
direction_1        0
direction_2        0
pedestriancount    0
sensor_name        0
location           0
dtype: int64

Missing values in microclimate:
device_id                    0
received_at                  0
sensorlocation               0
latlong                      0
minimumwinddirection    134069
averagewinddirection     15711
maximumwinddirection    134079
minimumwindspeed        134079
averagewindspeed         15713
gustwindspeed           134079
airtemperature           15713
relativehumidity         15713
atmosphericpressure      15713
pm25                     73263
pm10                     73263
noise                    73263
dtype: int64
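Raw counts are easier to judge as a share of the dataset. The sketch below converts a selection of the microclimate counts reported above into percentages of the 897,085 rows. On the live dataframe, an equivalent one-liner would be `microclimate_df.isna().mean() * 100`.

```python
# Row total and missing counts copied from the notebook outputs above.
TOTAL_ROWS = 897085
missing_counts = {
    "airtemperature": 15713,
    "relativehumidity": 15713,
    "atmosphericpressure": 15713,
    "averagewindspeed": 15713,
    "averagewinddirection": 15711,
    "gustwindspeed": 134079,
    "pm25": 73263,
    "pm10": 73263,
    "noise": 73263,
}

# Convert each count to a percentage of all rows.
missing_pct = {
    column: round(100 * count / TOTAL_ROWS, 2)
    for column, count in missing_counts.items()
}

# Print the columns from most to least affected.
for column, pct in sorted(missing_pct.items(), key=lambda item: item[1], reverse=True):
    print(f"{column:<22}{pct:>6.2f}%")
```

Seen this way, the gust wind speed readings are missing in roughly 15% of rows and the air quality and noise readings in roughly 8%, while the core temperature, humidity, and pressure readings are missing in under 2%.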
3.6 Checking Summary Statistics
Checking the summary statistics is an easy way to gain early insights into the datasets, such as central tendency and spread [18].
From the mean latitude and longitude of the sensor coordinates, the Melbourne CBD can be inferred to be the main area of focus for the datasets. The pedestrian counts are strongly right-skewed: the median count (162) is well below the mean (about 399), and the maximum (11,113) is very high, which suggests that some hours and sites are much busier than others. The average microclimate temperature is about 16.7°C, and the average relative humidity is about 68%, which looks plausible for Melbourne across a long time range.
# Check the summary statistics for numeric columns.
print("Sensor locations summary:")
print(sensor_locations_df.describe())

print("\nPedestrian counts summary:")
print(pedestrian_counts_df.describe())

print("\nMicroclimate summary:")
print(microclimate_df.describe())

Sensor locations summary:
       location_id    latitude   longitude
count   145.000000  145.000000  145.000000
mean     93.075862  -37.812392  144.960427
std      55.229079    0.006999    0.009594
min       1.000000  -37.825910  144.928606
25%      47.000000  -37.816888  144.956447
50%      89.000000  -37.813625  144.961860
75%     142.000000  -37.807767  144.965626
max     188.000000  -37.789353  144.986388

Pedestrian counts summary:
                 id   location_id       hourday   direction_1   direction_2  \
count  1.540629e+06  1.540629e+06  1.540629e+06  1.540629e+06  1.540629e+06
mean   4.683120e+11  7.196742e+01  1.175280e+01  1.981603e+02  2.008987e+02
std    5.137983e+11  5.119167e+01  6.796006e+00  3.107706e+02  3.148589e+02
min    1.020241e+09  1.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
25%    6.292025e+10  3.000000e+01  6.000000e+00  1.800000e+01  1.900000e+01
50%    2.111203e+11  6.100000e+01  1.200000e+01  7.900000e+01  7.800000e+01
75%    7.116202e+11  1.170000e+02  1.800000e+01  2.360000e+02  2.410000e+02
max    1.852320e+12  1.850000e+02  2.300000e+01  1.009900e+04  1.108500e+04

       pedestriancount
count     1.540629e+06
mean      3.990590e+02
std       5.953562e+02
min       0.000000e+00
25%       3.900000e+01
50%       1.620000e+02
75%       4.930000e+02
max       1.111300e+04

Microclimate summary:
       minimumwinddirection  averagewinddirection  maximumwinddirection  \
count         763016.000000         881374.000000         763006.000000
mean              23.908174            166.489727            295.974746
std               61.529961            125.699058             97.279991
min                0.000000              0.000000              0.000000
25%                0.000000             42.000000            253.000000
50%                0.000000            161.000000            351.000000
75%                0.000000            299.000000            357.000000
max              359.000000            359.000000            360.000000

       minimumwindspeed  averagewindspeed  gustwindspeed  airtemperature  \
count     763006.000000     881372.000000  763006.000000   881372.000000
mean           0.231898          1.007714       3.179780       16.689150
std            0.574129          0.942189       2.504463        5.581520
min            0.000000          0.000000       0.000000       -0.800000
25%            0.000000          0.400000       1.400000       12.800000
50%            0.000000          0.800000       2.500000       16.100000
75%            0.100000          1.400000       4.400000       19.700001
max           11.600000         11.200000      52.500000       45.400002

       relativehumidity  atmosphericpressure           pm25           pm10  \
count     881372.000000        881372.000000  823822.000000  823822.000000
mean          68.007261          1013.943718       6.691192       8.972910
std           17.562959             9.815047      30.102780      30.536759
min            2.500000           894.700000       0.000000       0.000000
25%           56.400002          1008.500000       1.000000       3.000000
50%           69.400000          1014.100000       3.000000       5.000000
75%           80.800000          1019.900000       7.000000       9.000000
max           99.800003          1042.900000    3414.000000    3414.000000

               noise
count  823822.000000
mean       66.442163
std        11.194487
min         0.000000
25%        58.500000
50%        68.100000
75%        71.800000
max       131.100000
3.7 Checking The Date Format
Checking the date format is important because a time-based merge depends on all datasets sharing a compatible datetime structure, and public datasets often use different date formats and timezone conventions. Any mismatch in date or time formatting can prevent the datasets from merging correctly later [19].
The pedestrian dataset stores dates and hours separately, whereas the microclimate dataset stores full timestamps with date, time, and a UTC offset. The sensor installation date is a simple date string and is mainly historical metadata rather than a modelling field. This makes it clear that datetime standardisation is a required step before any merge can occur.
# Check a few date values before conversion.
print("Pedestrian sensing_date sample:")
print(pedestrian_counts_df["sensing_date"].head())

print("\nMicroclimate received_at sample:")
print(microclimate_df["received_at"].head())

print("\nSensor installation_date sample:")
print(sensor_locations_df["installation_date"].head())

Pedestrian sensing_date sample:
0    2025-10-08
1    2026-05-07
2    2024-10-18
3    2026-01-06
4    2024-12-03
Name: sensing_date, dtype: object

Microclimate received_at sample:
0    2026-04-28T15:25:15+00:00
1    2026-04-28T04:38:50+00:00
2    2026-04-28T05:08:52+00:00
3    2026-04-28T10:19:41+00:00
4    2026-04-28T13:55:03+00:00
Name: received_at, dtype: object

Sensor installation_date sample:
0    2009-03-25
1    2009-03-26
2    2009-03-23
3    2009-01-21
4    2009-03-24
Name: installation_date, dtype: object
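One way to bring the two time formats onto a common hourly key is sketched below, using a few of the sample values printed above. It assumes that the pedestrian sensing_date and hourday are recorded in local Melbourne time, which the datasets shown here do not confirm, and the `timestamp` column name is introduced only for illustration.

```python
import pandas as pd

# Sample values copied from the previews above.
pedestrian_sample = pd.DataFrame(
    {"sensing_date": ["2025-10-08", "2026-05-07"], "hourday": [2, 12]}
)
microclimate_sample = pd.DataFrame(
    {"received_at": ["2026-04-28T15:25:15+00:00", "2026-04-28T04:38:50+00:00"]}
)

# Pedestrian data: combine the date with the hour of day.
# Assumption: these values are already in local Melbourne time.
pedestrian_sample["timestamp"] = pd.to_datetime(
    pedestrian_sample["sensing_date"]
) + pd.to_timedelta(pedestrian_sample["hourday"], unit="h")

# Microclimate data: parse as UTC, convert to Melbourne local time,
# drop the timezone so both keys are naive, then floor to the hour.
microclimate_sample["timestamp"] = (
    pd.to_datetime(microclimate_sample["received_at"], utc=True)
    .dt.tz_convert("Australia/Melbourne")
    .dt.tz_localize(None)
    .dt.floor("h")
)

print(pedestrian_sample)
print(microclimate_sample)
```

With both tables keyed by a naive local hourly timestamp, the microclimate readings can be averaged per hour and merged with the hourly pedestrian totals. If the pedestrian data turned out to be in UTC instead, the conversion step would need to be applied to it as well.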
3.8 Checking The ID Columns
This additional step checks the unique identifiers to see whether the datasets can be joined directly by ID or whether another strategy, like a datetime merge, is needed. Since the project uses multiple public datasets, checking the unique IDs also helps confirm whether the pedestrian sensors and microclimate sensors use the same location system or separate systems [20].
From the outputs, the sensor locations dataset contains 137 unique location_id values across its 145 rows, so some location IDs appear more than once, while the pedestrian counts dataset contains 100 unique location_id values. The microclimate dataset contains 12 unique device_id values, which belong to a completely different identifier system. This means the microclimate data cannot be joined to pedestrian counts by location ID, so a time-based merge is the best integration method available.
# Check important identifier columns.
print("Unique location_id values in sensor locations:")
print(sensor_locations_df["location_id"].nunique())

print("\nUnique location_id values in pedestrian counts:")
print(pedestrian_counts_df["location_id"].nunique())

print("\nUnique device_id values in microclimate:")
print(microclimate_df["device_id"].nunique())

Unique location_id values in sensor locations:
137

Unique location_id values in pedestrian counts:
100

Unique device_id values in microclimate:
12
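Counting unique IDs shows how many identifiers each table has, but not how the two sets overlap. The sketch below illustrates a fuller check on toy ID values (not the real columns): it lists repeated IDs in the metadata and IDs that appear in only one of the two tables.

```python
import pandas as pd

# Toy ID values for illustration; one metadata ID (9) is repeated on purpose.
sensor_ids = pd.Series([3, 5, 9, 9, 12, 14], name="location_id")
pedestrian_ids = pd.Series([3, 3, 5, 14, 70], name="location_id")

# IDs that occur more than once in the sensor metadata.
duplicated_ids = sorted(sensor_ids[sensor_ids.duplicated()].unique().tolist())

# IDs with counts but no metadata row, and metadata rows with no counts.
missing_metadata = sorted(set(pedestrian_ids) - set(sensor_ids))
unused_sensors = sorted(set(sensor_ids) - set(pedestrian_ids))

print("Repeated metadata IDs:", duplicated_ids)
print("Counted but not described:", missing_metadata)
print("Described but not counted:", unused_sensors)
```

Applied to `sensor_locations_df["location_id"]` and `pedestrian_counts_df["location_id"]`, the same three checks would show which of the 137 metadata IDs are repeated and whether all 100 counted locations have metadata.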
4. Data Cleaning
Raw data is rarely ready to be used as it is, so cleaning is necessary to address duplicates, missing values, inconsistencies, syntax errors, irrelevant data, and structural errors [21] [22]. Because this pipeline accesses the datasets through the API, the data is updated in real time and may change between runs, so handling errors inside the pipeline helps keep the data accurate, secure, and accessible at every stage of its lifecycle [21]. Clean inputs also support more accurate predictions [22].
4.1 Removing Irrelevant Columns
Selecting relevant variables and removing irrelevant ones is necessary because not every feature is useful for predictive modelling. Keeping unnecessary columns can make the workflow harder to manage, increase memory usage, and create confusion in later steps [23] [24].
In this step, the sensor dataset is reduced to six useful metadata columns, even though it is not used for modelling. The pedestrian dataset is reduced to the four fields needed to construct hourly counts. The microclimate dataset is reduced to key climate, air quality, and noise variables. Descriptive and redundant variables are omitted.
# Keep only useful columns.
sensor_locations_clean = sensor_locations_df[
    [
        "location_id",
        "sensor_description",
        "installation_date",
        "status",
        "latitude",
        "longitude",
    ]
].copy()

pedestrian_clean = pedestrian_counts_df[
    [
        "location_id",
        "sensing_date",
        "hourday",
        "pedestriancount",
    ]
].copy()

microclimate_clean = microclimate_df[
    [
        "device_id",
        "received_at",
        "airtemperature",
        "relativehumidity",
        "atmosphericpressure",
        "averagewindspeed",
        "gustwindspeed",
        "averagewinddirection",
        "pm25",
        "pm10",
        "noise",
    ]
].copy()

# Check the result.
print(sensor_locations_clean.head())
print(pedestrian_clean.head())
print(microclimate_clean.head())

   location_id      sensor_description installation_date status   latitude  \
0            3       Melbourne Central        2009-03-25      A -37.811015
1            5          Princes Bridge        2009-03-26      A -37.818742
2            9  Southern Cross Station        2009-03-23      A -37.819830
3           12                New Quay        2009-01-21      A -37.814580
4           14        Sandridge Bridge        2009-03-24      A -37.820112

    longitude
0  144.964295
1  144.967877
2  144.951026
3  144.942924
4  144.962919
   location_id sensing_date  hourday  pedestriancount
0           70   2025-10-08        2                1
1            3   2026-05-07       12             2042
2           59   2024-10-18       15             1058
3           14   2026-01-06       10              315
4           28   2024-12-03        8              799
            device_id                received_at  airtemperature  \
0  ICTMicroclimate-08  2026-04-28T15:25:15+00:00            16.7
1  ICTMicroclimate-08  2026-04-28T04:38:50+00:00            21.9
2  ICTMicroclimate-08  2026-04-28T05:08:52+00:00            21.6
3  ICTMicroclimate-10  2026-04-28T10:19:41+00:00            19.7
4  ICTMicroclimate-08  2026-04-28T13:55:03+00:00            17.3

   relativehumidity  atmosphericpressure  averagewindspeed  gustwindspeed  \
0              79.9               1026.1               0.5            2.5
1              61.5               1021.5               0.9            2.6
2              61.1               1021.3               0.2            1.9
3              57.4               1018.4               0.8            1.3
4              76.2               1025.5               0.3            2.5

   averagewinddirection  pm25  pm10  noise
0                 220.0  10.0  12.0   58.7
1                 283.0  27.0  32.0   75.6
2                 310.0  26.0  29.0   69.2
3                 324.0  39.0  49.0   61.9
4                 281.0  10.0  12.0   79.4
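Because the datasets are fetched live from the API, a column could in principle be renamed or removed between runs. The sketch below shows one defensive way to perform the column selection, using a hypothetical `select_columns` helper and a toy frame, so that a missing column produces an explicit message listing what is absent.

```python
import pandas as pd


def select_columns(df, columns):
    """Return a copy of df restricted to `columns`, failing clearly if any are absent."""
    missing = [name for name in columns if name not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[columns].copy()


# Toy frame standing in for a freshly downloaded dataset.
toy_df = pd.DataFrame({"location_id": [3], "hourday": [12], "note": ["example"]})

# A valid selection keeps only the requested columns.
subset = select_columns(toy_df, ["location_id", "hourday"])
print(subset.columns.tolist())

# A selection that includes an absent column reports exactly what is missing.
try:
    select_columns(toy_df, ["location_id", "pedestriancount"])
except KeyError as error:
    error_message = str(error)
    print(error_message)
```

Plain indexing with a list of names also raises a KeyError in this situation, so the helper mainly adds a clearer message and a single place to handle schema changes.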
4.2 Removing Missing Values
This step deals with missing values by removing the rows that contain them. Missing values in explanatory variables can cause problems during aggregation, interpolation, and modelling later, leading to biased results [25]. This is especially important for the microclimate dataset, since missing climate readings could affect the quality of the explanatory variables.
After the column selection in the previous step, the sensor and pedestrian tables are complete, with no missing values. The microclimate table still has many missing values, especially in gustwindspeed, pm25, pm10, and noise.
Using dropna() reduces the microclimate dataset from 897,085 to 705,456 rows (about 21% removed), which still leaves a sizable portion of the data. Complete rows were preferred to simplify the merging and feature engineering steps later, and enough data points remain for the loss not to matter much.
+# Check missing values after column selection.
+print("Missing values in sensor_locations_clean:")
+print(sensor_locations_clean.isna().sum())
+
+print("\nMissing values in pedestrian_clean:")
+print(pedestrian_clean.isna().sum())
+
+print("\nMissing values in microclimate_clean:")
+print(microclimate_clean.isna().sum())
+Missing values in sensor_locations_clean: +location_id 0 +sensor_description 0 +installation_date 0 +status 0 +latitude 0 +longitude 0 +dtype: int64 + +Missing values in pedestrian_clean: +location_id 0 +sensing_date 0 +hourday 0 +pedestriancount 0 +dtype: int64 + +Missing values in microclimate_clean: +device_id 0 +received_at 0 +airtemperature 15713 +relativehumidity 15713 +atmosphericpressure 15713 +averagewindspeed 15713 +gustwindspeed 134079 +averagewinddirection 15711 +pm25 73263 +pm10 73263 +noise 73263 +dtype: int64 ++
# Remove rows with any missing values from each dataset.
+sensor_locations_clean = sensor_locations_clean.dropna().copy()
+pedestrian_clean = pedestrian_clean.dropna().copy()
+microclimate_clean = microclimate_clean.dropna().copy()
+
+# Check missing values after removal.
+print("\nMissing values in sensor_locations_clean:")
+print(sensor_locations_clean.isna().sum())
+print(sensor_locations_clean.shape)
+
+print("\nMissing values in pedestrian_clean:")
+print(pedestrian_clean.isna().sum())
+print(pedestrian_clean.shape)
+
+print("\nMissing values in microclimate_clean:")
+print(microclimate_clean.isna().sum())
+print(microclimate_clean.shape)
++Missing values in sensor_locations_clean: +location_id 0 +sensor_description 0 +installation_date 0 +status 0 +latitude 0 +longitude 0 +dtype: int64 +(145, 6) + +Missing values in pedestrian_clean: +location_id 0 +sensing_date 0 +hourday 0 +pedestriancount 0 +dtype: int64 +(1540629, 4) + +Missing values in microclimate_clean: +device_id 0 +received_at 0 +airtemperature 0 +relativehumidity 0 +atmosphericpressure 0 +averagewindspeed 0 +gustwindspeed 0 +averagewinddirection 0 +pm25 0 +pm10 0 +noise 0 +dtype: int64 +(705456, 11) ++
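Before dropping rows, it can help to quantify how much data each column would cost. The sketch below uses a small made-up table (not the real microclimate data) in which one column is missing far more often than the others, and reports the share of missing values per column together with the fraction of rows that survive dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical toy readings; the values are illustrative only.
toy = pd.DataFrame(
    {
        "airtemperature": [16.7, 21.9, np.nan, 19.7, 17.3, 18.1],
        "gustwindspeed": [2.5, np.nan, np.nan, 1.3, np.nan, 2.2],
        "noise": [58.7, 75.6, 69.2, np.nan, 79.4, 61.0],
    }
)

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

# Rows that remain after dropping any row with a missing value.
complete = toy.dropna()
retained_fraction = len(complete) / len(toy)

print(missing_share.round(2))
print(f"Rows retained: {len(complete)} of {len(toy)} ({retained_fraction:.0%})")
```

If one column accounts for most of the loss, as gustwindspeed does in this toy table, dropping that column instead of the rows is an alternative worth weighing against its value as a predictor.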
4.3 Datetime Formatting¶
Matching the datetime format across the datasets is important because the modelling is hourly, and the datasets need a common key to merge on in order to perform chronological analysis across different data sources [26] [27]. Without this step, the datasets could not be merged into one table.
+The pedestrian dataset's separate sensing_date and hourday fields are combined into a single datetime_hour, such as 2024-12-06 20:00:00. The microclimate timestamps are converted from UTC to Australia/Melbourne local time, the timezone information is removed, and the values are floored to the start of the hour. Both datasets then have a datetime_hour variable in the same format. Note that the two time ranges differ: the pedestrian data runs from 2024-05-12 to 2026-05-11, while the microclimate data starts earlier (2022-11-08) but ends sooner (2026-05-07), so the overlap is limited to 2024-05-12 through 2026-05-07.
+# Convert sensor installation date.
+sensor_locations_clean["installation_date"] = pd.to_datetime(
+ sensor_locations_clean["installation_date"]
+)
+
+# Create an hourly datetime for pedestrian data.
+pedestrian_clean["sensing_date"] = pd.to_datetime(
+ pedestrian_clean["sensing_date"]
+)
+
+pedestrian_clean["datetime_hour"] = (
+ pedestrian_clean["sensing_date"] +
+ pd.to_timedelta(pedestrian_clean["hourday"], unit="h")
+)
+
+# Check the result.
+print(pedestrian_clean.head())
+location_id sensing_date hourday pedestriancount datetime_hour +0 70 2025-10-08 2 1 2025-10-08 02:00:00 +1 3 2026-05-07 12 2042 2026-05-07 12:00:00 +2 59 2024-10-18 15 1058 2024-10-18 15:00:00 +3 14 2026-01-06 10 315 2026-01-06 10:00:00 +4 28 2024-12-03 8 799 2024-12-03 08:00:00 ++
# Convert the raw timestamp to datetime with UTC.
+microclimate_clean["received_at"] = pd.to_datetime(
+ microclimate_clean["received_at"],
+ utc=True
+)
+
+# Convert UTC to Melbourne local time.
+microclimate_clean["received_at"] = (
+ microclimate_clean["received_at"]
+ .dt.tz_convert("Australia/Melbourne")
+)
+
+# Remove the timezone after conversion.
+microclimate_clean["received_at"] = (
+ microclimate_clean["received_at"]
+ .dt.tz_localize(None)
+)
+
+# Round down to the start of the hour.
+microclimate_clean["datetime_hour"] = (
+ microclimate_clean["received_at"]
+ .dt.floor("h")
+)
+
+# Check the result.
+print(microclimate_clean.head())
+device_id received_at airtemperature relativehumidity \ +0 ICTMicroclimate-08 2026-04-29 01:25:15 16.7 79.9 +1 ICTMicroclimate-08 2026-04-28 14:38:50 21.9 61.5 +2 ICTMicroclimate-08 2026-04-28 15:08:52 21.6 61.1 +3 ICTMicroclimate-10 2026-04-28 20:19:41 19.7 57.4 +4 ICTMicroclimate-08 2026-04-28 23:55:03 17.3 76.2 + + atmosphericpressure averagewindspeed gustwindspeed averagewinddirection \ +0 1026.1 0.5 2.5 220.0 +1 1021.5 0.9 2.6 283.0 +2 1021.3 0.2 1.9 310.0 +3 1018.4 0.8 1.3 324.0 +4 1025.5 0.3 2.5 281.0 + + pm25 pm10 noise datetime_hour +0 10.0 12.0 58.7 2026-04-29 01:00:00 +1 27.0 32.0 75.6 2026-04-28 14:00:00 +2 26.0 29.0 69.2 2026-04-28 15:00:00 +3 39.0 49.0 61.9 2026-04-28 20:00:00 +4 10.0 12.0 79.4 2026-04-28 23:00:00 ++
# Compare the time ranges of both datasets.
+print("Pedestrian time range:")
+print(pedestrian_clean["datetime_hour"].min())
+print(pedestrian_clean["datetime_hour"].max())
+
+print("\nMicroclimate time range:")
+print(microclimate_clean["datetime_hour"].min())
+print(microclimate_clean["datetime_hour"].max())
+Pedestrian time range: +2024-05-12 00:00:00 +2026-05-11 03:00:00 + +Microclimate time range: +2022-11-08 12:00:00 +2026-05-07 15:00:00 ++
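The conversion steps above can be traced on two individual timestamps. This is a minimal sketch with illustrative inputs: one UTC reading from late April, when Melbourne is on standard time (UTC+10), and one from January, when daylight saving applies (UTC+11). The final lines show how a pedestrian-style date and hour pair lands on the same hourly grid.

```python
import pandas as pd

# Two illustrative UTC timestamps: one in late April (standard time)
# and one in January (daylight saving time).
raw = pd.Series(["2026-04-28T15:25:15+00:00", "2026-01-06T00:30:00+00:00"])

local_hour = (
    pd.to_datetime(raw, utc=True)
    .dt.tz_convert("Australia/Melbourne")  # apply the UTC+10 or UTC+11 offset
    .dt.tz_localize(None)                  # drop the timezone label
    .dt.floor("h")                         # truncate to the start of the hour
)

# Pedestrian-style date and hour fields combined into one timestamp.
ped_hour = pd.to_datetime("2026-01-06") + pd.to_timedelta(10, unit="h")

print(local_hour.tolist())
print(ped_hour)
```

Because the offset changes with daylight saving, one local hour is skipped each October and one is repeated each April, which is worth keeping in mind when checking for missing or duplicated hours later.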
4.4 Aggregating Values For Hourly Format¶
This step aggregates the data because the use case models city-level pedestrian demand rather than individual sensor-level behaviour, and because the pedestrian and microclimate datasets both contain multiple records within the same hour. Aggregation also ensures that both datasets have exactly one row per hour, with no duplicates, for later merging, and it reduces the total data volume [28].
+Pedestrian counts are summed across sensors to produce hourly city totals, and microclimate readings are averaged across devices for each hour. This changes the target variable from pedestrians at one site to overall city pedestrian demand in a given hour, alongside the climate values for that hour. Aggregating to hourly rows also reduces the row counts to 17,351 for the pedestrian data and 27,720 for the microclimate data, which covers a longer period.
+# Aggregate pedestrian counts to hourly city totals.
+pedestrian_hourly = (
+ pedestrian_clean
+ .groupby("datetime_hour", as_index=False)
+ .agg(
+ {
+ "pedestriancount": "sum",
+ }
+ )
+)
+
+# Check the result.
+print(pedestrian_hourly.head())
+print(pedestrian_hourly.shape)
+datetime_hour pedestriancount +0 2024-05-12 00:00:00 15093 +1 2024-05-12 01:00:00 10686 +2 2024-05-12 02:00:00 6751 +3 2024-05-12 03:00:00 5431 +4 2024-05-12 04:00:00 2728 +(17351, 2) ++
# Aggregate microclimate readings to hourly city averages.
+microclimate_hourly = (
+ microclimate_clean
+ .groupby("datetime_hour", as_index=False)
+ .agg(
+ {
+ "airtemperature": "mean",
+ "relativehumidity": "mean",
+ "atmosphericpressure": "mean",
+ "averagewindspeed": "mean",
+ "gustwindspeed": "mean",
+ "averagewinddirection": "mean",
+ "pm25": "mean",
+ "pm10": "mean",
+ "noise": "mean",
+ }
+ )
+)
+
+# Check the result.
+print(microclimate_hourly.head())
+print(microclimate_hourly.shape)
+datetime_hour airtemperature relativehumidity atmosphericpressure \ +0 2022-11-08 12:00:00 28.400 38.800 1011.100 +1 2022-11-08 13:00:00 29.400 34.800 1009.800 +2 2022-11-08 14:00:00 27.320 40.940 1009.420 +3 2022-11-23 15:00:00 15.100 87.150 1010.800 +4 2022-11-23 16:00:00 16.475 80.725 1010.475 + + averagewindspeed gustwindspeed averagewinddirection pm25 pm10 noise +0 3.100 4.100 74.00 3.00 5.00 62.80 +1 2.100 3.500 92.00 5.00 8.00 86.90 +2 1.700 2.780 253.40 3.00 5.20 87.78 +3 0.450 1.050 291.00 2.00 5.50 95.10 +4 1.675 2.475 271.25 4.25 6.75 100.30 +(27720, 10) ++
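One caveat with the aggregation above: averagewinddirection is an angle, so an arithmetic mean can mislead when readings straddle north (for example, 350 and 10 degrees average to 180). A minimal sketch of a circular mean that could be swapped in for that column is shown here. The helper circular_mean_deg is hypothetical and not part of the original notebook, and the toy readings are made up.

```python
import numpy as np
import pandas as pd


def circular_mean_deg(angles_deg):
    """Mean direction in degrees, computed from unit vectors and wrapped with modulo 360."""
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(mean_angle) % 360)


# Hypothetical readings from two devices in each of two hours.
toy = pd.DataFrame(
    {
        "datetime_hour": pd.to_datetime(
            ["2024-05-12 00:00"] * 2 + ["2024-05-12 01:00"] * 2
        ),
        "averagewinddirection": [350.0, 10.0, 80.0, 100.0],
    }
)

hourly = toy.groupby("datetime_hour")["averagewinddirection"].agg(
    arithmetic_mean="mean",
    circular_mean=circular_mean_deg,
)
print(hourly)
```

In the first hour the arithmetic mean points south (180) while the circular mean points north (0, or equivalently 360, up to floating-point rounding). In the second hour both agree at 90. The circular mean is undefined when the directions cancel exactly, so it is best treated as a sketch rather than a drop-in replacement.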
4.5 Merging The Datasets¶
This step merges the datasets so that the target variable, pedestrian count, is connected to the explanatory climate variables. The pedestrian and microclimate datasets are inner-joined on datetime_hour, meaning only hours present in both datasets are kept [29]. As a result, every row has both pedestrian and climate information, in a unified format ready for analysis [30].
+A quick check of the merged dataset shows that the pedestrian counts and climate values are aligned on the same hourly observations, which is what the use case needs. There are also no missing values, which indicates that the previous data cleaning worked as intended.
+# Merge pedestrian and microclimate data on the hourly datetime.
+model_df = pd.merge(
+ pedestrian_hourly,
+ microclimate_hourly,
+ on="datetime_hour",
+ how="inner"
+)
+
+# Sort the final modelling table.
+model_df = model_df.sort_values(
+ "datetime_hour"
+).reset_index(drop=True)
+
+# Check the result.
+print(model_df.head())
+print(model_df.shape)
+datetime_hour pedestriancount airtemperature relativehumidity \ +0 2024-05-12 00:00:00 15093 12.771429 84.646429 +1 2024-05-12 01:00:00 10686 12.046429 88.128571 +2 2024-05-12 02:00:00 6751 11.457143 90.596429 +3 2024-05-12 03:00:00 5431 11.121429 91.982143 +4 2024-05-12 04:00:00 2728 10.837037 92.677778 + + atmosphericpressure averagewindspeed gustwindspeed averagewinddirection \ +0 1022.835714 0.353571 1.282143 166.142857 +1 1022.635714 0.507143 1.435714 140.321429 +2 1022.521429 0.585714 1.578571 153.857143 +3 1022.278571 0.607143 1.507143 139.250000 +4 1022.251852 0.566667 1.562963 170.074074 + + pm25 pm10 noise +0 8.464286 9.428571 66.071429 +1 11.107143 12.392857 65.500000 +2 9.892857 11.357143 64.278571 +3 7.678571 9.035714 61.942857 +4 8.074074 9.740741 61.566667 +(17267, 11) ++
# Check merged dataset.
+print(model_df.info())
+print("\nMissing values:")
+print(model_df.isna().sum())
+print("\nSummary statistics:")
+print(model_df.describe())
+<class 'pandas.core.frame.DataFrame'> +RangeIndex: 17267 entries, 0 to 17266 +Data columns (total 11 columns): + # Column Non-Null Count Dtype +--- ------ -------------- ----- + 0 datetime_hour 17267 non-null datetime64[ns] + 1 pedestriancount 17267 non-null int64 + 2 airtemperature 17267 non-null float64 + 3 relativehumidity 17267 non-null float64 + 4 atmosphericpressure 17267 non-null float64 + 5 averagewindspeed 17267 non-null float64 + 6 gustwindspeed 17267 non-null float64 + 7 averagewinddirection 17267 non-null float64 + 8 pm25 17267 non-null float64 + 9 pm10 17267 non-null float64 + 10 noise 17267 non-null float64 +dtypes: datetime64[ns](1), float64(9), int64(1) +memory usage: 1.4 MB +None + +Missing values: +datetime_hour 0 +pedestriancount 0 +airtemperature 0 +relativehumidity 0 +atmosphericpressure 0 +averagewindspeed 0 +gustwindspeed 0 +averagewinddirection 0 +pm25 0 +pm10 0 +noise 0 +dtype: int64 + +Summary statistics: + datetime_hour pedestriancount airtemperature \ +count 17267 17267.00000 17267.000000 +mean 2025-05-09 05:49:24.104940288 35432.95975 16.511347 +min 2024-05-12 00:00:00 51.00000 2.617857 +25% 2024-11-07 21:30:00 7142.00000 12.527951 +50% 2025-05-06 18:00:00 35519.00000 15.910256 +75% 2025-11-08 09:30:00 58964.50000 19.561111 +max 2026-05-07 15:00:00 114804.00000 43.191667 +std NaN 27363.02579 5.547545 + + relativehumidity atmosphericpressure averagewindspeed gustwindspeed \ +count 17267.000000 17267.000000 17267.000000 17267.000000 +mean 66.584728 1013.925313 1.162485 3.496650 +min 9.875000 990.350000 0.121875 0.693939 +25% 56.075397 1008.605840 0.721429 2.289380 +50% 68.540741 1013.885714 1.065625 3.263889 +75% 78.714583 1019.287650 1.508001 4.527639 +max 96.673171 1039.162500 4.028125 10.200000 +std 16.060614 7.914093 0.578561 1.523908 + + averagewinddirection pm25 pm10 noise +count 17267.000000 17267.000000 17267.000000 17267.000000 +mean 184.944173 6.935045 8.731420 68.252640 +min 52.571429 0.333333 1.057143 53.745714 +25% 
160.194444 1.892857 3.313393 65.136111 +50% 184.742857 3.368421 5.000000 68.113043 +75% 211.442222 7.577381 9.306624 71.034330 +max 306.827586 411.228571 417.028571 88.200000 +std 37.950660 12.914232 13.465303 4.521855 ++
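To see which hours an inner join discards, an outer merge with indicator=True can be used as an audit. The sketch below runs on two tiny made-up tables whose hours only partly overlap. The values are illustrative, and the names ped, clim, audit, and dropped are hypothetical.

```python
import pandas as pd

# Pedestrian-style table covering hours 00:00 to 03:00.
ped = pd.DataFrame(
    {
        "datetime_hour": pd.date_range("2024-05-12 00:00", periods=4, freq="h"),
        "pedestriancount": [15093, 10686, 6751, 5431],
    }
)

# Climate-style table covering hours 01:00 to 04:00.
clim = pd.DataFrame(
    {
        "datetime_hour": pd.date_range("2024-05-12 01:00", periods=4, freq="h"),
        "airtemperature": [12.0, 11.5, 11.1, 10.8],
    }
)

# The inner join keeps only the three hours present in both tables.
inner = ped.merge(clim, on="datetime_hour", how="inner")

# An outer merge with indicator=True labels where each hour came from.
audit = ped.merge(clim, on="datetime_hour", how="outer", indicator=True)
dropped = audit[audit["_merge"] != "both"]

print(inner.shape)
print(dropped[["datetime_hour", "_merge"]])
```

Applied to pedestrian_hourly and microclimate_hourly, the same audit would list the hours that exist in only one source, which helps explain any gaps found during validation.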
5. Data Validation¶
Data validation is important because a merged and cleaned dataset may still be unsuitable for this use case: timestamps may be duplicated or out of order, or some hours may be missing from the sequence. Time-series models tend to assume a consistent temporal structure with no sudden breaks, so this step checks that the timeline is complete [31].
+5.1 Validating Time Series Dataset¶
This step checks for duplicate or missing timestamps, since they can affect the lag features, rolling features, and any sequence-based deep learning models that this use case may use. The aim is a strictly ordered sequence of evenly spaced time points [32].
+From the outputs, there are no duplicate datetime_hour values, which means every timestamp appears once, and the dataset is confirmed to be sorted in increasing time order. However, 149 hourly timestamps are missing, so the timeline is not complete yet. The preview shows one isolated hour (2024-10-06 02:00, which coincides with the start of daylight saving time in Melbourne, when that local hour is skipped) followed by a run of consecutive hours starting at 2025-06-06 22:00. The gaps are therefore not all scattered single hours; at least one is a contiguous block of several hours.
+# Count duplicate datetime values.
+duplicate_count = model_df["datetime_hour"].duplicated().sum()
+
+# Print the result.
+print("Duplicate datetime_hour values:", duplicate_count)
+Duplicate datetime_hour values: 0 ++
# Create the full expected hourly range.
+full_hours = pd.date_range(
+ start=model_df["datetime_hour"].min(),
+ end=model_df["datetime_hour"].max(),
+ freq="h"
+)
+
+# Find missing hours.
+missing_hours = full_hours.difference(model_df["datetime_hour"])
+
+# Print the number of missing hours.
+print("Number of missing hourly timestamps:", len(missing_hours))
+
+# Preview a few missing hours.
+print(missing_hours[:10])
+Number of missing hourly timestamps: 149 +DatetimeIndex(['2024-10-06 02:00:00', '2025-06-06 22:00:00', + '2025-06-06 23:00:00', '2025-06-07 00:00:00', + '2025-06-07 01:00:00', '2025-06-07 02:00:00', + '2025-06-07 03:00:00', '2025-06-07 04:00:00', + '2025-06-07 05:00:00', '2025-06-07 06:00:00'], + dtype='datetime64[ns]', freq=None) ++
# Confirm the dataset is sorted by time.
+print(model_df["datetime_hour"].is_monotonic_increasing)
+True ++
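To tell isolated gaps from contiguous blocks, the missing hours can be grouped into runs of consecutive timestamps. This is a minimal sketch, assuming a DatetimeIndex like missing_hours. The helper summarise_gaps is hypothetical, and the timestamps are a short illustrative subset rather than the full list of 149.

```python
import pandas as pd


def summarise_gaps(missing_hours):
    """Group missing hourly timestamps into runs of consecutive hours."""
    s = (
        pd.Series(pd.DatetimeIndex(missing_hours))
        .sort_values()
        .reset_index(drop=True)
    )
    # A new run starts whenever the step from the previous timestamp
    # is not exactly one hour (the first row always starts a run).
    run_id = (s.diff() != pd.Timedelta(hours=1)).cumsum()
    return (
        s.groupby(run_id)
        .agg(start="min", end="max", hours="count")
        .reset_index(drop=True)
    )


example_missing = pd.DatetimeIndex(
    ["2024-10-06 02:00", "2025-06-06 22:00", "2025-06-06 23:00", "2025-06-07 00:00"]
)
gaps = summarise_gaps(example_missing)
print(gaps)
```

Each row of the result describes one gap with its start, end, and length in hours, so long blocks stand out immediately. This matters for the next step because interpolation across a long block is less reliable than across a single missing hour.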
5.2 Fixing Missing Timestamps¶
Filling in the missing timestamps is necessary for a complete hourly sequence, because gaps can create issues during feature engineering later. Missing hours cause problems when creating lag features, rolling averages, and LSTM input sequences, since these methods rely on consistent time gaps between rows [32] [33].
+This step reindexes the dataset to the full hourly range so that every missing timestamp gets a row, and then interpolates over time to fill in the values for those new rows. The merged dataset grows from 17,267 to 17,416 rows with no missing hourly timestamps, so the timeline is continuous with no breaks. Because interpolation can produce fractional values, the pedestrian counts are rounded and converted back to integers.
+# Set datetime as the index.
+model_df = model_df.set_index("datetime_hour").sort_index()
+
+# Create the full hourly timeline.
+full_hours = pd.date_range(
+ start=model_df.index.min(),
+ end=model_df.index.max(),
+ freq="h"
+)
+
+# Reindex to restore missing hours.
+model_df = model_df.reindex(full_hours)
+
+# Interpolate all numeric columns over time.
+model_df = model_df.interpolate(method="time").ffill().bfill()
+
+# Convert pedestrian count back to integers.
+model_df["pedestriancount"] = (
+ model_df["pedestriancount"]
+ .round()
+ .astype(int)
+)
+
+# Restore datetime_hour as a column.
+model_df = model_df.reset_index().rename(
+ columns={"index": "datetime_hour"}
+)
+
+# Check for missing hourly timestamps again.
+full_hours = pd.date_range(
+    start=model_df["datetime_hour"].min(),
+    end=model_df["datetime_hour"].max(),
+    freq="h"
+)
+missing_hours = full_hours.difference(model_df["datetime_hour"])
+print("Missing hourly timestamps:", len(missing_hours))
+
+# Check the dataset structure.
+print("\nOverall:")
+print(model_df.info())
+
+# Check the final time range.
+print("\nFinal time range:")
+print(model_df["datetime_hour"].min())
+print(model_df["datetime_hour"].max())
+
+# Check the dataframe shape.
+print("\nFinal shape:")
+print(model_df.shape)
+Missing hourly timestamps: 0 + +Overall: +<class 'pandas.core.frame.DataFrame'> +RangeIndex: 17416 entries, 0 to 17415 +Data columns (total 11 columns): + # Column Non-Null Count Dtype +--- ------ -------------- ----- + 0 datetime_hour 17416 non-null datetime64[ns] + 1 pedestriancount 17416 non-null int64 + 2 airtemperature 17416 non-null float64 + 3 relativehumidity 17416 non-null float64 + 4 atmosphericpressure 17416 non-null float64 + 5 averagewindspeed 17416 non-null float64 + 6 gustwindspeed 17416 non-null float64 + 7 averagewinddirection 17416 non-null float64 + 8 pm25 17416 non-null float64 + 9 pm10 17416 non-null float64 + 10 noise 17416 non-null float64 +dtypes: datetime64[ns](1), float64(9), int64(1) +memory usage: 1.5 MB +None + +Final time range: +2024-05-12 00:00:00 +2026-05-07 15:00:00 + +Final shape: +(17416, 11) ++
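The effect of reindexing and time interpolation is easiest to see on a tiny example. The sketch below uses two made-up observations three hours apart. The was_imputed column is a hypothetical addition, not part of the original pipeline, that records which rows were created by the reindex.

```python
import pandas as pd

# Two illustrative observations with a two-hour gap between them.
toy = pd.DataFrame(
    {"pedestriancount": [100, 400]},
    index=pd.to_datetime(["2025-06-06 21:00", "2025-06-07 00:00"]),
)

# Reindex to the full hourly range; the new rows start out as NaN.
full_index = pd.date_range(toy.index.min(), toy.index.max(), freq="h")
filled = toy.reindex(full_index)

# Record which rows were created by reindexing before they are filled.
filled["was_imputed"] = filled["pedestriancount"].isna()

# Interpolate linearly in time, then restore integer counts.
filled["pedestriancount"] = (
    filled["pedestriancount"].interpolate(method="time").round().astype(int)
)
print(filled)
```

The two new rows are filled on a straight line between the known values (200 and 300). Over a long gap this would flatten the daily cycle, so a flag like was_imputed makes it possible to check later whether imputed rows affect the model.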
6. Exploratory Data Analysis¶
Exploratory data analysis was performed on the merged dataset to look for interpretable patterns and to get a rough understanding of the data before building a model. It is important to get a sense of how pedestrian demand changes over time and how it relates to the climate conditions [34].
+6.1 Pedestrian Count Over Time (Hourly)¶
Plotting the hourly pedestrian count gives a quick look at how the target variable changes over time, to check whether the series looks random or shows patterns that need to be taken into consideration in later steps [35] [36].
+From the plot, there are large fluctuations across the full time range, with repeated highs and lows. Although the series looks noisy, it is not purely random: some fluctuations recur, such as a high peak around each New Year, indicating many pedestrians in the city, and some sharp dips, which may be public holidays. This suggests that pedestrian demand is influenced by recurring temporal effects and possibly environmental ones, and that variables other than climate may also play a role.
+# Plot pedestrian counts over time.
+plt.figure(figsize=(14, 5))
+plt.plot(
+ model_df["datetime_hour"],
+ model_df["pedestriancount"],
+ label="Hourly pedestrian count"
+)
+
+plt.title("Pedestrian Count Over Time (Hourly)")
+plt.xlabel("Datetime")
+plt.ylabel("Pedestrian Count")
+plt.legend()
+plt.tight_layout()
+plt.show()
+6.2 Average Pedestrian Count By Day Of Week¶
The day-of-week summary checks whether other factors, such as work patterns, shopping activity, or weekend behaviour, might influence pedestrian demand. It also checks whether the day of the week should be considered an important time-based feature for later modelling [35] [37].
+From the plot, average counts are relatively high from Tuesday to Saturday, whereas Mondays and Sundays have lower pedestrian demand. Monday may be low despite being a normal workday, possibly because of public holidays, and Sunday may be low because fewer people work on Sundays, when businesses usually pay a higher penalty rate. This suggests that the CBD pedestrian pattern is linked to weekday economic and commuter activity, not just climate variables.
+# Create a day-of-week column.
+model_df["day_of_week"] = model_df[
+ "datetime_hour"
+].dt.dayofweek
+
+# Group by day of week.
+dow_pattern = model_df.groupby("day_of_week")[
+ "pedestriancount"
+].mean()
+
+# Plot the day-of-week pattern.
+plt.figure(figsize=(8, 4))
+plt.plot(
+ dow_pattern.index,
+ dow_pattern.values,
+ marker="o",
+ label="Average pedestrian count"
+)
+
+plt.title("Average Pedestrian Count By Day Of Week")
+plt.xlabel("Day Of Week")
+plt.ylabel("Average Pedestrian Count")
+plt.xticks(
+ ticks=range(7),
+ labels=[
+ "Mon", "Tue", "Wed",
+ "Thu", "Fri", "Sat", "Sun"
+ ]
+)
+plt.legend()
+plt.tight_layout()
+plt.show()
+6.3 Distributions Of Variables¶
This step checks whether each variable's distribution is symmetric, skewed, or multi-modal. This is useful because skewed variables, extreme values, or unusual distributions can influence how the model learns from the data [38].
+-
+
- pedestriancount appears unimodal and right-skewed. +
- airtemperature appears unimodal with a slight right skew. +
- relativehumidity appears unimodal with a slight left skew. +
- atmosphericpressure appears unimodal and slightly left-skewed. +
- averagewindspeed appears unimodal and right-skewed. +
- gustwindspeed appears unimodal and right-skewed. +
- averagewinddirection appears roughly unimodal and mostly symmetric. +
- pm25 appears unimodal and right-skewed. +
- pm10 appears unimodal and right-skewed, similar to pm25. +
- noise appears unimodal and fairly symmetric. +
# Store numeric columns.
+numeric_cols = model_df.select_dtypes(
+ include=["int32", "int64", "float64"]
+).columns.drop(
+ ["day_of_week"],
+ errors="ignore"
+)
+
+# Create subplot layout.
+n_cols = 3
+n_rows = int(np.ceil(len(numeric_cols) / n_cols))
+
+fig, axes = plt.subplots(
+ n_rows,
+ n_cols,
+ figsize=(16, 12)
+)
+
+axes = axes.flatten()
+
+# Plot histogram for each numeric column.
+for i, col in enumerate(numeric_cols):
+ axes[i].hist(
+ model_df[col],
+ bins=30,
+ label=col
+ )
+ axes[i].set_title(f"Distribution Of {col}")
+ axes[i].set_xlabel(col)
+ axes[i].set_ylabel("Frequency")
+ axes[i].legend()
+
+# Remove empty subplots.
+for j in range(i + 1, len(axes)):
+ fig.delaxes(axes[j])
+
+plt.tight_layout()
+plt.show()
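The visual impressions from the histograms can be backed up with a number. A minimal sketch using the pandas skew() method on made-up values is shown here. With the real data, the same call on model_df[numeric_cols] would give one skewness value per variable.

```python
import pandas as pd

# Illustrative values only: pm25 has one large outlier, noise is symmetric.
toy = pd.DataFrame(
    {
        "pm25": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 40.0],
        "noise": [60.0, 64.0, 66.0, 68.0, 70.0, 72.0, 76.0],
    }
)

# Positive values indicate a longer right tail, negative values a longer
# left tail, and values near zero an approximately symmetric distribution.
skewness = toy.skew()
print(skewness.round(2))
```

A strongly positive value, as for the toy pm25 column, is a hint that a transformation or robust scaling may be worth considering before modelling.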
+6.4 Correlation Matrix¶
The correlation matrix is a quick way to see which climate variables have the strongest linear relationship with pedestrian demand, and the results suggest that several of them have some association with the target variable. The main purpose is to show whether each relationship is positive or negative, keeping in mind that correlation does not prove causation [39].
+-
+
- noise has the strongest positive correlation with pedestrian count (0.56), followed by gustwindspeed (0.47), airtemperature (0.39), averagewindspeed (0.36), and averagewinddirection (0.12). This suggests that pedestrian activity tends to be higher when these variables are higher. +
- relativehumidity has the strongest negative correlation with pedestrian count (-0.47), while pm10, pm25, and atmosphericpressure have correlations close to zero (about -0.04 to -0.03). This suggests that pedestrian activity tends to be lower when humidity is higher, with little linear relationship to the other three. +
This suggests that climate variables are associated with pedestrian demand, although the previous plots show that other influences exist as well. The goal of this use case remains to understand how climate variables affect pedestrian demand.
+# Calculate the correlation matrix.
+corr_matrix = model_df[numeric_cols].corr()
+
+# Plot the correlation matrix.
+plt.figure(figsize=(10, 8))
+im = plt.imshow(
+ corr_matrix,
+ cmap="coolwarm",
+ aspect="auto"
+)
+
+cbar = plt.colorbar(im)
+cbar.set_label("Correlation Coefficient")
+
+plt.xticks(
+ ticks=np.arange(len(corr_matrix.columns)),
+ labels=corr_matrix.columns,
+ rotation=45,
+ ha="right"
+)
+
+plt.yticks(
+ ticks=np.arange(len(corr_matrix.columns)),
+ labels=corr_matrix.columns
+)
+
+plt.title("Correlation Matrix")
+
+# Add values inside each cell.
+for i in range(len(corr_matrix.index)):
+    for j in range(len(corr_matrix.columns)):
+        plt.text(
+            j,
+            i,
+            f"{corr_matrix.iloc[i, j]:.2f}",
+            ha="center",
+            va="center",
+            fontsize=8
+        )
+
+plt.tight_layout()
+plt.show()
+
+# Print the correlation values with pedestrian count.
+print(corr_matrix["pedestriancount"].sort_values(ascending=False))
+pedestriancount 1.000000 +noise 0.555729 +gustwindspeed 0.466541 +airtemperature 0.390398 +averagewindspeed 0.364756 +averagewinddirection 0.119948 +atmosphericpressure -0.034576 +pm25 -0.035831 +pm10 -0.039647 +relativehumidity -0.465781 +Name: pedestriancount, dtype: float64 ++
7.1 Pedestrian Count Over Time (24-Hour Rolling Mean)¶
The previous plot of pedestrian count over time was noisy, so smoothing over 24 hours may help reveal patterns of pedestrian activity without the hour-to-hour volatility [40].
+From the plot, it is much more obvious that the highest peaks occur around New Year's, when pedestrians visit the Melbourne CBD to see the fireworks, which confirms what was discussed previously. There also seems to be a slightly increasing trend, which may suggest that pedestrian activity is still recovering from the effects of COVID-19.
+# Create the hourly target series.
+ts = model_df.set_index("datetime_hour")[
+ "pedestriancount"
+].sort_index()
+
+# Check the first few rows.
+print(ts.head())
+datetime_hour +2024-05-12 00:00:00 15093 +2024-05-12 01:00:00 10686 +2024-05-12 02:00:00 6751 +2024-05-12 03:00:00 5431 +2024-05-12 04:00:00 2728 +Name: pedestriancount, dtype: int64 ++
# Calculate the 24-hour rolling mean.
+rolling_mean_24 = ts.rolling(24).mean()
+
+# Plot the rolling mean only.
+plt.figure(figsize=(14, 5))
+plt.plot(
+ rolling_mean_24,
+ label="24-hour rolling mean"
+)
+
+plt.title("24-Hour Rolling Mean Of Pedestrian Count")
+plt.xlabel("Datetime")
+plt.ylabel("Rolling Mean")
+plt.legend()
+plt.tight_layout()
+plt.show()
+7.2 Seasonal Pattern (24-Hour Cycle)¶
Seasonal decomposition is useful because hourly pedestrian demand is expected to have a strong daily cycle, so the seasonal component shows how the typical hour of the day affects pedestrian activity. Since people move through the city at different levels during the morning, workday, evening, and late night, this step helps show whether the hour of the day is a factor in pedestrian demand [41].
+From the plot, the seasonal effect is strongly negative at night and becomes strongly positive during the workday, with a peak at 5 PM, when most people finish work and head home. The lowest point is around 3 AM, which is expected: people who stay out late tend to head home around midnight, and pedestrian activity in the city is very limited at that time. This plot confirms that hour of day is a major driver of pedestrian demand, alongside any other variables that influence it.
+# Decompose the series using a 24-hour period.
+decomp_24 = seasonal_decompose(
+ ts,
+ model="additive",
+ period=24
+)
+
+# Store one seasonal cycle and align it to the actual hours.
+seasonal_24 = pd.Series(
+ decomp_24.seasonal.iloc[:24].values,
+ index=ts.index[:24].hour
+)
+
+# Sort by actual hour.
+seasonal_24 = seasonal_24.sort_index()
+
+# Plot the corrected daily seasonal pattern.
+plt.figure(figsize=(10, 4))
+plt.plot(
+ seasonal_24.index,
+ seasonal_24.values,
+ marker="o",
+ label="Daily seasonal effect"
+)
+
+plt.title("Daily Seasonal Pattern (24-Hour Cycle)")
+plt.xlabel("Hour")
+plt.ylabel("Seasonal Effect")
+plt.xticks(range(24))
+plt.legend()
+plt.tight_layout()
+plt.show()
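As a cross-check on the decomposition, a similar daily profile can be approximated without seasonal_decompose by averaging the series by hour of day and subtracting the overall mean. The sketch below uses a synthetic series with a known daily shape, not the real pedestrian data. It is only an approximation, because it ignores the trend that the decomposition removes first.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series: two weeks with a fixed daily cycle peaking at 17:00.
index = pd.date_range("2025-01-06", periods=24 * 14, freq="h")
hours = index.hour.to_numpy()
daily_shape = 1000 * np.cos(2 * np.pi * (hours - 17) / 24)
ts_demo = pd.Series(5000 + daily_shape, index=index)

# Hour-of-day effect: mean for each hour minus the overall mean.
hourly_effect = ts_demo.groupby(ts_demo.index.hour).mean() - ts_demo.mean()

print("Peak hour:", hourly_effect.idxmax())
print("Lowest hour:", hourly_effect.idxmin())
```

Running the same two lines on ts and comparing the result with seasonal_24 is a quick sanity check that the seasonal component is aligned to the correct hours.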
+7.3 Autocorrelation Of Pedestrian Count¶
Checking for autocorrelation is important because it shows whether current pedestrian demand depends on previous hours. If strong dependence exists, lag features will be very useful for forecasting [42].
+From the plot, there is a very strong repeating pattern at regular intervals, especially around daily cycles. This repeated structure appears not only at the daily level but also at the weekly level, meaning daily and weekly lag features will be useful for predictions.
+# Plot autocorrelation for one week of hourly lags.
+fig, ax = plt.subplots(figsize=(12, 5))
+plot_acf(ts, lags=168, ax=ax)
+
+ax.set_title("Autocorrelation Of Pedestrian Count")
+ax.set_xlabel("Lag (Hours)")
+ax.set_ylabel("Autocorrelation")
+
+# Add legend handles.
+legend_handles = [
+ Line2D([0], [0], color="C0", lw=2, label="Autocorrelation"),
+ Patch(alpha=0.2, label="95% confidence interval")
+]
+
+ax.legend(handles=legend_handles, loc="upper right")
+plt.tight_layout()
+plt.show()
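The values behind specific lags in the plot can also be read off directly with the pandas Series.autocorr method. The sketch below uses a synthetic series with a pure 24-hour cycle, not the real data, so the results show the idealised case: strongly negative at half a day and strongly positive at one day and one week.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with a pure 24-hour cycle (not the real data).
index = pd.date_range("2025-01-06", periods=24 * 21, freq="h")
ts_demo = pd.Series(
    np.sin(2 * np.pi * np.arange(len(index)) / 24),
    index=index,
)

# Autocorrelation at half a day, one day, and one week.
for lag in (12, 24, 168):
    print(f"lag {lag:>3}: {ts_demo.autocorr(lag=lag):.3f}")
```

On the real series, ts.autocorr(lag=24) and ts.autocorr(lag=168) would give the numbers to quote alongside the plot when justifying daily and weekly lag features.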
+7.4 Augmented Dickey-Fuller Test¶
The Augmented Dickey-Fuller test checks whether the time series is stationary in the unit-root sense. While this is not required for deep learning models, it provides more understanding of the patterns underlying the merged dataset [43].
+The p-value is extremely small (about 6.3e-23) and the test statistic (-12.33) is far below all of the critical values, so the null hypothesis of a unit root is rejected. This means the series does not behave like a random walk, where each value drifts unpredictably from the last. Rejecting a unit root does not rule out seasonality, so the daily and weekly cycles seen above still need to be modelled.
+# Run the Augmented Dickey-Fuller test.
+adf_result = adfuller(ts)
+
+# Print the results.
+print("ADF Statistic:", adf_result[0])
+print("p-value:", adf_result[1])
+print("Critical Values:")
+
+for key, value in adf_result[4].items():
+ print(f"{key}: {value}")
+ADF Statistic: -12.334016562004392 +p-value: 6.332696219939632e-23 +Critical Values: +1%: -3.4307265048981876 +5%: -2.8617064005452915 +10%: -2.5668585707044094 ++
8. Feature Engineering¶
Feature engineering is necessary because the variables currently in the merged dataset may not be enough for a strong forecasting model. New features can capture other aspects of the data, such as cyclical structure, recent history, and short-term trends [44] [45].
+8.1 Creating Time Features¶
Calendar-based features are needed because pedestrian activity depends on when an observation happened, as shown in the previous plots. Hour, day of week, month, and weekend status are therefore all useful predictors. This step splits the datetime_hour column into these individual components [45].
+The output shows that new time-based columns were added to the dataset: hour, day_of_week, month, and is_weekend.
+# Create a copy for feature engineering.
+features_df = model_df.copy()
+
+# Extract calendar features.
+features_df["hour"] = features_df["datetime_hour"].dt.hour
+features_df["day_of_week"] = (
+ features_df["datetime_hour"].dt.dayofweek
+)
+features_df["month"] = features_df["datetime_hour"].dt.month
+features_df["is_weekend"] = (
+ features_df["day_of_week"] >= 5
+).astype(int)
+
+# Check the result.
+print(
+ features_df[
+ [
+ "datetime_hour", "hour", "day_of_week", "month",
+ "is_weekend"
+ ]
+ ].head()
+)
+datetime_hour hour day_of_week month is_weekend +0 2024-05-12 00:00:00 0 6 5 1 +1 2024-05-12 01:00:00 1 6 5 1 +2 2024-05-12 02:00:00 2 6 5 1 +3 2024-05-12 03:00:00 3 6 5 1 +4 2024-05-12 04:00:00 4 6 5 1 ++
8.2 Creating Cyclical Time Features¶
Time variables are cyclical rather than linear, like the face of a clock. As plain integers, hour 0 and hour 23 look far apart, yet they are only one hour apart in time. Encoding each variable with sine and cosine preserves this circular structure and prevents the model from learning misleading distances between values at the edges of a cycle [46].
+The output shows that new cyclical time features were created, including hour_sin, hour_cos, dow_sin, dow_cos, month_sin, and month_cos.
+# Hour and day of week are cyclical, not linear.
+# Create cyclical hour features.
+features_df["hour_sin"] = np.sin(
+ 2 * np.pi * features_df["hour"] / 24
+)
+features_df["hour_cos"] = np.cos(
+ 2 * np.pi * features_df["hour"] / 24
+)
+
+# Create cyclical day-of-week features.
+features_df["dow_sin"] = np.sin(
+ 2 * np.pi * features_df["day_of_week"] / 7
+)
+features_df["dow_cos"] = np.cos(
+ 2 * np.pi * features_df["day_of_week"] / 7
+)
+
+# Create cyclical month features.
+features_df["month_sin"] = np.sin(
+ 2 * np.pi * features_df["month"] / 12
+)
+features_df["month_cos"] = np.cos(
+ 2 * np.pi * features_df["month"] / 12
+)
+
+# Check the result.
+print(
+ features_df[
+ [
+ "datetime_hour", "hour_sin", "hour_cos",
+ "dow_sin", "dow_cos", "month_sin",
+ "month_cos"
+ ]
+ ].head()
+)
+        datetime_hour  hour_sin  hour_cos    dow_sin  dow_cos  month_sin  \
+0 2024-05-12 00:00:00  0.000000  1.000000  -0.781831  0.62349        0.5
+1 2024-05-12 01:00:00  0.258819  0.965926  -0.781831  0.62349        0.5
+2 2024-05-12 02:00:00  0.500000  0.866025  -0.781831  0.62349        0.5
+3 2024-05-12 03:00:00  0.707107  0.707107  -0.781831  0.62349        0.5
+4 2024-05-12 04:00:00  0.866025  0.500000  -0.781831  0.62349        0.5
+
+   month_cos
+0  -0.866025
+1  -0.866025
+2  -0.866025
+3  -0.866025
+4  -0.866025
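To make the benefit of the encoding concrete, the following sketch compares the raw gap between hour 23 and hour 0 with their distance after sine and cosine encoding. It uses only the standard library; `encode_cyclical` is a hypothetical helper that applies the same formula as the cells above.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Raw integers make 23:00 and 00:00 look 23 units apart.
raw_gap = abs(23 - 0)

# On the unit circle, 23:00 -> 00:00 is as close as 00:00 -> 01:00.
gap_23_to_0 = math.dist(encode_cyclical(23, 24), encode_cyclical(0, 24))
gap_0_to_1 = math.dist(encode_cyclical(0, 24), encode_cyclical(1, 24))

# Opposite sides of the clock are the furthest apart (distance 2).
gap_0_to_12 = math.dist(encode_cyclical(0, 24), encode_cyclical(12, 24))

print(raw_gap, round(gap_23_to_0, 4), round(gap_0_to_1, 4), round(gap_0_to_12, 4))
```

Any two hours that are one step apart on the clock end up the same distance apart in the encoded space, which is the property the raw integers lack.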
8.3 Creating Cyclical Wind Direction Features¶
Wind direction is also circular: it spans 360 degrees, so 0 and 359 degrees point in almost the same direction. It is therefore converted to sine and cosine components as well, which gives a continuous representation and keeps its treatment consistent with the cyclical time variables [46].
+The output shows that two new features were created, wind_dir_sin and wind_dir_cos.
+# Convert wind direction into cyclical form.
+features_df["wind_dir_sin"] = np.sin(
+ 2 * np.pi * features_df["averagewinddirection"] / 360
+)
+features_df["wind_dir_cos"] = np.cos(
+ 2 * np.pi * features_df["averagewinddirection"] / 360
+)
+
+# Check the result.
+print(
+ features_df[
+ [
+ "averagewinddirection",
+ "wind_dir_sin",
+ "wind_dir_cos"
+ ]
+ ].head()
+)
+   averagewinddirection  wind_dir_sin  wind_dir_cos
+0            166.142857      0.239502     -0.970896
+1            140.321429      0.638480     -0.769638
+2            153.857143      0.440611     -0.897698
+3            139.250000      0.652760     -0.757565
+4            170.074074      0.172375     -0.985031
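The sine and cosine pair loses no information about the original angle. As a small sketch (the helper names `encode_direction` and `decode_direction` are hypothetical), the direction can be recovered with `math.atan2`, and 359 degrees ends up next to 0 degrees in the encoded space:

```python
import math


def encode_direction(degrees):
    """Convert a compass direction in degrees to (sin, cos)."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


def decode_direction(sin_value, cos_value):
    """Recover the direction in degrees, in the range [0, 360)."""
    return math.degrees(math.atan2(sin_value, cos_value)) % 360


# First wind direction shown in the output above.
sin_value, cos_value = encode_direction(166.142857)
recovered = decode_direction(sin_value, cos_value)

# 359 and 0 degrees are almost identical directions after encoding.
wrap_gap = math.dist(encode_direction(359), encode_direction(0))

print(round(sin_value, 6), round(cos_value, 6), round(recovered, 6))
```

Both components are needed: the sine alone cannot distinguish, for example, 10 degrees from 170 degrees.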
8.4 Creating Lag Features¶
Lag features are important because pedestrian demand depends strongly on recent history, as the autocorrelation plot indicated [47] [48].
+The features lag_1, lag_24, and lag_168 capture the count from the previous hour, the same hour on the previous day, and the same hour in the previous week. Shifting creates missing values at the start of the series, where no earlier observation exists to populate the lag columns. For example, the very first row has a missing lag_1 because there is no previous row. These rows are dealt with in Section 8.6.
+# Create lagged pedestrian count features.
+features_df["lag_1"] = (
+ features_df["pedestriancount"].shift(1)
+)
+
+features_df["lag_24"] = (
+ features_df["pedestriancount"].shift(24)
+)
+
+features_df["lag_168"] = (
+ features_df["pedestriancount"].shift(168)
+)
+
+# Check the result.
+print(
+ features_df[
+ [
+ "lag_1", "lag_24", "lag_168"
+ ]
+ ].head(25)
+)
+      lag_1   lag_24  lag_168
+0       NaN      NaN      NaN
+1   15093.0      NaN      NaN
+2   10686.0      NaN      NaN
+3    6751.0      NaN      NaN
+4    5431.0      NaN      NaN
+5    2728.0      NaN      NaN
+6    2306.0      NaN      NaN
+7    4251.0      NaN      NaN
+8    8872.0      NaN      NaN
+9   18370.0      NaN      NaN
+10  26934.0      NaN      NaN
+11  40599.0      NaN      NaN
+12  53701.0      NaN      NaN
+13  62802.0      NaN      NaN
+14  63419.0      NaN      NaN
+15  65710.0      NaN      NaN
+16  74282.0      NaN      NaN
+17  65628.0      NaN      NaN
+18  49356.0      NaN      NaN
+19  39189.0      NaN      NaN
+20  31933.0      NaN      NaN
+21  24297.0      NaN      NaN
+22  19736.0      NaN      NaN
+23  13472.0      NaN      NaN
+24   8268.0  15093.0      NaN
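The shifting mechanism can be imitated on a plain Python list to show where the missing values come from. This is only a sketch: `shift` is a hypothetical stand-in for the pandas method, `None` plays the role of NaN, and the five counts are the first values visible in the lag_1 column above.

```python
def shift(values, periods):
    """Hypothetical stand-in for a positive pandas shift: pad the start with None."""
    if periods >= len(values):
        return [None] * len(values)
    return [None] * periods + list(values[: len(values) - periods])


# First five hourly counts, read off the lag_1 column above.
counts = [15093, 10686, 6751, 5431, 2728]

lag_1 = shift(counts, 1)
lag_3 = shift(counts, 3)

print(lag_1)  # [None, 15093, 10686, 6751, 5431]
print(lag_3)  # [None, None, None, 15093, 10686]
```

A lag of k always leaves k rows without a value, which is why lag_168 has the longest run of missing rows.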
8.5 Creating Rolling Features¶
Rolling features summarise recent pedestrian counts rather than relying on a single observation, which lets the model capture the short-term trend and smooths short-term noise [47].
+Two new predictors are added: rolling_mean_24 summarises the previous 24 hours, and rolling_mean_168 summarises the previous week. The series is shifted by one hour before the window is applied, so each mean uses past counts only and excludes the current hour. As with the lag features, the first rows are missing because a rolling window cannot be calculated until enough history exists.
+# Create rolling mean features from past counts.
+features_df["rolling_mean_24"] = (
+ features_df["pedestriancount"]
+ .shift(1)
+ .rolling(24)
+ .mean()
+)
+
+features_df["rolling_mean_168"] = (
+ features_df["pedestriancount"]
+ .shift(1)
+ .rolling(168)
+ .mean()
+)
+
+# Check the result.
+print(
+ features_df[
+ [
+ "rolling_mean_24", "rolling_mean_168"
+ ]
+ ].head(25)
+)
+    rolling_mean_24  rolling_mean_168
+0               NaN               NaN
+1               NaN               NaN
+2               NaN               NaN
+3               NaN               NaN
+4               NaN               NaN
+5               NaN               NaN
+6               NaN               NaN
+7               NaN               NaN
+8               NaN               NaN
+9               NaN               NaN
+10              NaN               NaN
+11              NaN               NaN
+12              NaN               NaN
+13              NaN               NaN
+14              NaN               NaN
+15              NaN               NaN
+16              NaN               NaN
+17              NaN               NaN
+18              NaN               NaN
+19              NaN               NaN
+20              NaN               NaN
+21              NaN               NaN
+22              NaN               NaN
+23              NaN               NaN
+24         29742.25               NaN
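The effect of shifting before rolling can be sketched without pandas. The function below is a hypothetical pure-Python equivalent of `shift(1).rolling(window).mean()` on a toy series: the mean at position i uses only the `window` values before i and never the value at i itself.

```python
def past_rolling_mean(values, window):
    """Mean of the `window` values strictly before each position (None until available)."""
    means = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]  # excludes values[i]
        means.append(sum(past) / window if len(past) == window else None)
    return means


toy_counts = [10, 20, 30, 40, 50]  # toy data, not from the dataset
result = past_rolling_mean(toy_counts, window=2)
print(result)  # [None, None, 15.0, 25.0, 35.0]
```

With a window of 2 the first value appears at index 2, in the same way that rolling_mean_24 first appears at row 24 in the output above.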
8.6 Removing NA Rows¶
This step removes the rows with missing values that the lag and rolling features created at the start of the ordered merged dataset, since rows with incomplete predictor information cannot be used for training. The trade-off is that the first week of the series is lost: the data now starts on 2024-05-19 rather than 2024-05-12. This loss is small, and because the dataset is updated in real time, more data points will become available in the future [48].
+The output shows that all rows with missing values were removed, and the dataset was reset into a clean index.
+# Remove rows with NAs.
+features_df = features_df.dropna().reset_index(drop=True)
+
+# Check the result.
+print(features_df.info())
+print(features_df.head())
+<class 'pandas.core.frame.DataFrame'> +RangeIndex: 17248 entries, 0 to 17247 +Data columns (total 28 columns): + # Column Non-Null Count Dtype +--- ------ -------------- ----- + 0 datetime_hour 17248 non-null datetime64[ns] + 1 pedestriancount 17248 non-null int64 + 2 airtemperature 17248 non-null float64 + 3 relativehumidity 17248 non-null float64 + 4 atmosphericpressure 17248 non-null float64 + 5 averagewindspeed 17248 non-null float64 + 6 gustwindspeed 17248 non-null float64 + 7 averagewinddirection 17248 non-null float64 + 8 pm25 17248 non-null float64 + 9 pm10 17248 non-null float64 + 10 noise 17248 non-null float64 + 11 day_of_week 17248 non-null int32 + 12 hour 17248 non-null int32 + 13 month 17248 non-null int32 + 14 is_weekend 17248 non-null int64 + 15 hour_sin 17248 non-null float64 + 16 hour_cos 17248 non-null float64 + 17 dow_sin 17248 non-null float64 + 18 dow_cos 17248 non-null float64 + 19 month_sin 17248 non-null float64 + 20 month_cos 17248 non-null float64 + 21 wind_dir_sin 17248 non-null float64 + 22 wind_dir_cos 17248 non-null float64 + 23 lag_1 17248 non-null float64 + 24 lag_24 17248 non-null float64 + 25 lag_168 17248 non-null float64 + 26 rolling_mean_24 17248 non-null float64 + 27 rolling_mean_168 17248 non-null float64 +dtypes: datetime64[ns](1), float64(22), int32(3), int64(2) +memory usage: 3.5 MB +None + datetime_hour pedestriancount airtemperature relativehumidity \ +0 2024-05-19 00:00:00 12751 8.655172 69.200000 +1 2024-05-19 01:00:00 8603 8.664286 68.792857 +2 2024-05-19 02:00:00 5660 8.960714 69.235714 +3 2024-05-19 03:00:00 4709 9.660714 67.675000 +4 2024-05-19 04:00:00 2682 10.050000 69.060714 + + atmosphericpressure averagewindspeed gustwindspeed averagewinddirection \ +0 1026.241379 0.841379 2.303448 204.137931 +1 1025.732143 1.042857 2.535714 203.071429 +2 1024.671429 1.757143 3.717857 237.107143 +3 1024.171429 1.460714 3.657143 247.857143 +4 1023.675000 1.482143 3.807143 240.821429 + + pm25 pm10 ... 
dow_cos month_sin month_cos wind_dir_sin \ +0 6.758621 9.448276 ... 0.62349 0.5 -0.866025 -0.408935 +1 6.750000 9.428571 ... 0.62349 0.5 -0.866025 -0.391878 +2 6.892857 9.535714 ... 0.62349 0.5 -0.866025 -0.839688 +3 6.107143 8.678571 ... 0.62349 0.5 -0.866025 -0.926247 +4 5.678571 8.428571 ... 0.62349 0.5 -0.866025 -0.873104 + + wind_dir_cos lag_1 lag_24 lag_168 rolling_mean_24 rolling_mean_168 +0 -0.912564 21105.0 8948.0 15093.0 33068.333333 31092.851190 +1 -0.920017 12751.0 6244.0 10686.0 33226.791667 31078.910714 +2 -0.543070 8603.0 4035.0 6751.0 33325.083333 31066.511905 +3 -0.376917 5660.0 2813.0 5431.0 33392.791667 31060.017857 +4 -0.487533 4709.0 1761.0 2728.0 33471.791667 31055.720238 + +[5 rows x 28 columns] ++
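The number of rows lost can be reasoned about directly from the window sizes. The sketch below assumes an hourly series with no gaps and no other missing values in the warm-up period (an assumption, although the start dates shown in the outputs are consistent with it): a lag of k leaves k leading missing rows, and a one-hour shift followed by a rolling window of w leaves w leading missing rows.

```python
from datetime import datetime

lags = [1, 24, 168]
rolling_windows = [24, 168]

# The longest window decides how many leading rows contain a missing value.
warmup_rows = max(lags + rolling_windows)

# First timestamps before and after dropna(), taken from the printed outputs.
first_raw = datetime(2024, 5, 12, 0, 0)
first_kept = datetime(2024, 5, 19, 0, 0)
hours_dropped = int((first_kept - first_raw).total_seconds() // 3600)

print(warmup_rows, hours_dropped)  # 168 168
```

Both numbers agree at 168 rows, a small share of the 17,248 rows that remain.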
8.7 Removing Unnecessary Features¶
Now that the cyclical variables exist, the raw variables they were derived from are redundant. Removing them reduces feature duplication and keeps the modelling cleaner [23] [24].
+The averagewinddirection, hour, day_of_week, and month columns are dropped. This leaves 24 columns: the target pedestriancount, the selected climate variables, is_weekend, the cyclical encodings, the lag features, and the rolling means. The datetime_hour column is kept for now and removed later.
+# Drop raw columns that now have cyclical replacements.
+features_df = features_df.drop(
+ columns=[
+ "averagewinddirection",
+ "hour",
+ "day_of_week",
+ "month",
+ ]
+)
+
+# Reorder the columns into a cleaner structure.
+features_df = features_df[
+ [
+ "datetime_hour",
+ "pedestriancount",
+ "airtemperature",
+ "relativehumidity",
+ "atmosphericpressure",
+ "averagewindspeed",
+ "gustwindspeed",
+ "pm25",
+ "pm10",
+ "noise",
+ "is_weekend",
+ "hour_sin",
+ "hour_cos",
+ "dow_sin",
+ "dow_cos",
+ "month_sin",
+ "month_cos",
+ "wind_dir_sin",
+ "wind_dir_cos",
+ "lag_1",
+ "lag_24",
+ "lag_168",
+ "rolling_mean_24",
+ "rolling_mean_168",
+ ]
+]
+
+# Check the result.
+print(features_df.info())
+<class 'pandas.core.frame.DataFrame'> +RangeIndex: 17248 entries, 0 to 17247 +Data columns (total 24 columns): + # Column Non-Null Count Dtype +--- ------ -------------- ----- + 0 datetime_hour 17248 non-null datetime64[ns] + 1 pedestriancount 17248 non-null int64 + 2 airtemperature 17248 non-null float64 + 3 relativehumidity 17248 non-null float64 + 4 atmosphericpressure 17248 non-null float64 + 5 averagewindspeed 17248 non-null float64 + 6 gustwindspeed 17248 non-null float64 + 7 pm25 17248 non-null float64 + 8 pm10 17248 non-null float64 + 9 noise 17248 non-null float64 + 10 is_weekend 17248 non-null int64 + 11 hour_sin 17248 non-null float64 + 12 hour_cos 17248 non-null float64 + 13 dow_sin 17248 non-null float64 + 14 dow_cos 17248 non-null float64 + 15 month_sin 17248 non-null float64 + 16 month_cos 17248 non-null float64 + 17 wind_dir_sin 17248 non-null float64 + 18 wind_dir_cos 17248 non-null float64 + 19 lag_1 17248 non-null float64 + 20 lag_24 17248 non-null float64 + 21 lag_168 17248 non-null float64 + 22 rolling_mean_24 17248 non-null float64 + 23 rolling_mean_168 17248 non-null float64 +dtypes: datetime64[ns](1), float64(21), int64(2) +memory usage: 3.2 MB +None ++
9. Preparing Train/Val/Test Splits¶
A forecasting model must be evaluated on future periods, not on randomly selected time periods. The dataset is therefore split by time, so that a model trained on the past is tested on its ability to predict the future, as in real-world use [49].
+9.1 Splitting The Data By Time¶
Chronological splitting prevents future information from leaking into training. With a random split, for example, a model could be trained on data points from 2026 and then tested on a period in 2025. The training set must therefore cover the earliest period, followed by the validation set and then the test set, as shown here [49].
+The data is split 80:10:10 into training, validation, and test sets, which leaves enough data points for training while keeping enough for validation and testing. Deep learning models tend to need a large number of data points, so assigning most of the data to training reduces the risk of underfitting [50].
+The output confirms that the data was split chronologically, with the training data coming first, followed by validation and testing data.
+# Set the split sizes (80:10:10).
+train_size = int(len(features_df) * 0.8)
+val_size = int(len(features_df) * 0.1)
+
+# Split the data in chronological order.
+train_df = features_df.iloc[:train_size].copy()
+val_df = features_df.iloc[
+ train_size:train_size + val_size
+].copy()
+test_df = features_df.iloc[
+ train_size + val_size:
+].copy()
+
+# Check the shapes.
+print(train_df.shape)
+print(val_df.shape)
+print(test_df.shape)
+(13798, 24)
+(1724, 24)
+(1726, 24)
# Checking The Split Ranges.
+print("Train range:")
+print(train_df["datetime_hour"].min())
+print(train_df["datetime_hour"].max())
+
+print("\nValidation range:")
+print(val_df["datetime_hour"].min())
+print(val_df["datetime_hour"].max())
+
+print("\nTest range:")
+print(test_df["datetime_hour"].min())
+print(test_df["datetime_hour"].max())
+Train range:
+2024-05-19 00:00:00
+2025-12-14 21:00:00
+
+Validation range:
+2025-12-14 22:00:00
+2026-02-24 17:00:00
+
+Test range:
+2026-02-24 18:00:00
+2026-05-07 15:00:00
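The split arithmetic can be reproduced in isolation. This sketch uses a hypothetical helper, `chronological_split_bounds`, that follows the same logic as the cell above: the training and validation sizes are truncated with `int`, and the test set receives whatever rows remain.

```python
def chronological_split_bounds(n_rows, train_frac=0.8, val_frac=0.1):
    """Return (start, stop) row ranges for train, validation, and test."""
    train_size = int(n_rows * train_frac)
    val_size = int(n_rows * val_frac)
    return {
        "train": (0, train_size),
        "val": (train_size, train_size + val_size),
        "test": (train_size + val_size, n_rows),  # remainder goes to test
    }


bounds = chronological_split_bounds(17248)
sizes = {name: stop - start for name, (start, stop) in bounds.items()}
print(bounds)
print(sizes)  # {'train': 13798, 'val': 1724, 'test': 1726}
```

Because the ranges are contiguous and ordered, every validation row is later than every training row, and every test row is later than every validation row.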
9.2 Separating Features And Target¶
This step separates the predictor inputs X from the target output y, because the model needs to know which columns are inputs and which column it has to predict [51].
+The target is set to pedestriancount, which is the variable the model is trying to predict. The datetime column is excluded from the features since the chronological splitting is completed, so this leaves 22 predictor columns for each split, as shown in the shapes output.
+# Store the target column name.
+target_col = "pedestriancount"
+
+# Store the feature column names.
+feature_cols = [
+ col for col in features_df.columns
+ if col not in ["datetime_hour", target_col]
+]
+
+# Create X and y for each split.
+X_train = train_df[feature_cols]
+y_train = train_df[target_col]
+
+X_val = val_df[feature_cols]
+y_val = val_df[target_col]
+
+X_test = test_df[feature_cols]
+y_test = test_df[target_col]
+
+# Check the shapes.
+print(X_train.shape, y_train.shape)
+print(X_val.shape, y_val.shape)
+print(X_test.shape, y_test.shape)
+(13798, 22) (13798,)
+(1724, 22) (1724,)
+(1726, 22) (1726,)
9.3 Scaling The Features¶
Scaling is necessary because the predictors are measured on different scales. For example, temperature, humidity, air pressure, wind speed, pollution values, and lagged pedestrian counts all have different ranges. Standardising them makes the variables unitless so that they can be compared directly [52].
+Standardisation rescales each predictor to a mean of 0 and a standard deviation of 1. Putting all input features on the same scale speeds up convergence, prevents features with large magnitudes from dominating, and generally makes training more stable [52]. The scaler is fitted on the training features only and then applied to the validation and test features, so no information from later periods leaks into training.
+# Create the scaler.
+scaler = StandardScaler()
+
+# Fit on training features only.
+X_train_scaled = scaler.fit_transform(X_train)
+
+# Transform validation and test features.
+X_val_scaled = scaler.transform(X_val)
+X_test_scaled = scaler.transform(X_test)
+
+# Check the shapes.
+print(X_train_scaled.shape)
+print(X_val_scaled.shape)
+print(X_test_scaled.shape)
+print(X_train_scaled[:5])
+(13798, 22) +(1724, 22) +(1726, 22) +[[-1.29777704e+00 1.50360125e-01 1.46578468e+00 -5.80148521e-01 + -7.99559383e-01 1.00283430e-01 1.98749444e-01 -3.63941261e-01 + 1.57673844e+00 -7.77706241e-05 1.41447137e+00 -1.10371969e+00 + 8.80498928e-01 1.03546540e+00 -1.13513198e+00 -5.75432725e-01 + -4.91738264e-01 -5.08174682e-01 -9.59909049e-01 -7.29249335e-01 + -3.26314765e-01 -1.04167887e+00] + [-1.29610869e+00 1.24879825e-01 1.40393909e+00 -2.36045144e-01 + -6.48552677e-01 9.91130647e-02 1.96368567e-01 -5.59070044e-01 + 1.57673844e+00 3.65929517e-01 1.36628083e+00 -1.10371969e+00 + 8.80498928e-01 1.03546540e+00 -1.13513198e+00 -5.44617322e-01 + -5.21312792e-01 -8.18712499e-01 -1.06044050e+00 -8.93515834e-01 + -2.95950862e-01 -1.04569266e+00] + [-1.24184204e+00 1.52595239e-01 1.27511778e+00 9.83881253e-01 + 1.20012207e-01 1.18507689e-01 2.09314586e-01 2.43030983e-01 + 1.57673844e+00 7.06994013e-01 1.22499330e+00 -1.10371969e+00 + 8.80498928e-01 1.03546540e+00 -1.13513198e+00 -1.35366847e+00 + 9.74390635e-01 -9.72903410e-01 -1.14256846e+00 -1.04018901e+00 + -2.77116141e-01 -1.04926256e+00] + [-1.11369428e+00 5.49207567e-02 1.21439393e+00 4.77611798e-01 + 8.05390863e-02 1.18372546e-02 1.05746435e-01 -2.70089534e-01 + 1.57673844e+00 9.99872735e-01 1.00023730e+00 -1.10371969e+00 + 8.80498928e-01 1.03546540e+00 -1.13513198e+00 -1.51005421e+00 + 1.63367379e+00 -1.08230164e+00 -1.18800094e+00 -1.08939068e+00 + -2.64141820e-01 -1.05113235e+00] + [-1.04242843e+00 1.41643180e-01 1.15410382e+00 5.14209590e-01 + 1.78060915e-01 -4.63466188e-02 7.55390572e-02 -4.47800540e-01 + 1.57673844e+00 1.22460648e+00 7.07329572e-01 -1.10371969e+00 + 8.80498928e-01 1.03546540e+00 -1.13513198e+00 -1.41404236e+00 + 1.19475651e+00 -1.11765254e+00 -1.22711303e+00 -1.19014229e+00 + -2.49003782e-01 -1.05236973e+00]] ++
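The idea of fitting on the training split only can be shown for a single feature with the standard library. This is a sketch with toy temperature values (not taken from the dataset) and hypothetical helper names. It uses the population standard deviation, which is what StandardScaler computes.

```python
from statistics import fmean, pstdev


def fit_standardiser(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return fmean(train_values), pstdev(train_values)


def standardise(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(value - mean) / std for value in values]


# Toy temperatures in degrees Celsius (illustrative only).
train_temps = [8.0, 10.0, 12.0, 14.0]
val_temps = [16.0, 18.0]

mean, std = fit_standardiser(train_temps)
train_scaled = standardise(train_temps, mean, std)
val_scaled = standardise(val_temps, mean, std)  # reuses the training statistics

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in val_scaled])
```

The scaled training values have a mean of 0 and a standard deviation of 1, whereas the validation values do not have to: they are expressed relative to the training distribution, which is the intended behaviour.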
10. Deep Learning — Pedestrian Count Forecasting¶
This section covers the deep learning component of the project. The goal is to build and compare recurrent neural network models that can forecast hourly pedestrian counts in Melbourne CBD using climate features and engineered time-series inputs.
+Deep learning models, specifically recurrent neural network architectures like LSTM and GRU, are well suited for this task because pedestrian demand is a sequential, time-dependent process [53]. Unlike traditional regression models, recurrent networks can learn temporal patterns across multiple time steps, making them appropriate for hourly forecasting tasks with lag dependencies [54].
+10.2 Setting Random Seeds for Reproducibility¶
A fixed random seed is set across all relevant libraries before any model is built or trained. Deep learning models initialise their weights randomly and apply random dropout masks during training, which means that results can differ between runs without a fixed seed [55]. By setting the same seed across Python, NumPy, and TensorFlow, the outputs of this section become reproducible: rerunning the notebook in the same environment (same hardware and library versions) should produce identical results [7] [55].
+Additional environment variables TF_DETERMINISTIC_OPS and TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS are set to further enforce deterministic behaviour at the TensorFlow operation level [55]. The student ID was used as the seed value for consistency with the random seed convention used earlier in this notebook [7].
+SEED = 224120439
+
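+# 1. Set environment variables that request deterministic behaviour.
+# Note: PYTHONHASHSEED only affects hashing if it is set before the
+# Python interpreter starts, so here it mainly covers child processes.
+os.environ["PYTHONHASHSEED"] = str(SEED)
+os.environ["TF_DETERMINISTIC_OPS"] = "1"
+os.environ["TF_DISABLE_SEGMENT_REDUCTION_OP_DETERMINISM_EXCEPTIONS"] = "1"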
+
+# 2. Seed all libraries
+random.seed(SEED)
+np.random.seed(SEED)
+tf.keras.utils.set_random_seed(SEED)
+
+# 3. Force deterministic operations
+try:
+ tf.config.experimental.enable_op_determinism()
+except Exception:
+ pass # silently skip if not supported on this system
+
+print(f"Random seed set to: {SEED}")
+print(f"TF version: {tf.__version__}")
+Random seed set to: 224120439
+TF version: 2.20.0
10.3 Creating Input Sequences¶
Before training a recurrent neural network, the data must be reshaped into three-dimensional sequences. Each sequence consists of a fixed-length window of past hourly observations that the model uses to predict the next hour's pedestrian count [56].
+A sequence length of 24 was chosen to capture one full daily cycle of hourly patterns. This is a natural choice given that pedestrian demand follows a strong 24-hour rhythm, as confirmed by the seasonal decomposition and autocorrelation analysis in earlier sections [35] [42]. Additionally, since lag features (previous hour, previous day, and previous week) and rolling means are already included as input features, extending the sequence window further would introduce redundancy without meaningful benefit [47] [48].
+Each input sample X has the shape (24, 22), representing 24 consecutive hours across 22 engineered features. The corresponding target y is the pedestrian count at the following hour, making this a one-step-ahead forecasting setup [56]. Because sequences are built within each split, the first 24 rows of each split serve only as history, so the output shapes show 13,774 training samples, 1,700 validation samples, and 1,702 test samples.
+sequence_length = 24
+
+def create_sequences(X, y, seq_length):
+    X_seq, y_seq = [], []
+    for i in range(seq_length, len(X)):
+        # Window covers rows i - seq_length .. i - 1; the target is row i.
+        X_seq.append(X[i - seq_length:i])
+        y_seq.append(y.iloc[i])
+    return np.array(X_seq), np.array(y_seq)
+
+X_train_seq, y_train_seq = create_sequences(X_train_scaled, y_train, sequence_length)
+X_val_seq, y_val_seq = create_sequences(X_val_scaled, y_val, sequence_length)
+X_test_seq, y_test_seq = create_sequences(X_test_scaled, y_test, sequence_length)
+
+print("Train shape:", X_train_seq.shape)
+print("Val shape:", X_val_seq.shape)
+print("Test shape:", X_test_seq.shape)
+Train shape: (13774, 24, 22)
+Val shape: (1700, 24, 22)
+Test shape: (1702, 24, 22)
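The windowing logic and the resulting sample counts can be checked on a toy example. The sketch below is a list-based version of the same idea (the name `create_windows` and the toy data are hypothetical): each window holds the `seq_length` rows before position i, and the target is the value at position i, so a split with n rows yields n - seq_length samples.

```python
def create_windows(features, targets, seq_length):
    """Pair each window of past rows with the target that follows it."""
    windows, next_targets = [], []
    for i in range(seq_length, len(features)):
        windows.append(features[i - seq_length:i])  # rows i - seq_length .. i - 1
        next_targets.append(targets[i])             # value to predict
    return windows, next_targets


# Toy data: 6 time steps with a single feature each.
toy_features = [[float(i)] for i in range(6)]
toy_targets = [i * 10 for i in range(6)]
windows, next_targets = create_windows(toy_features, toy_targets, seq_length=3)

# Sample counts implied by the split sizes reported earlier.
split_rows = {"train": 13798, "val": 1724, "test": 1726}
sample_counts = {name: rows - 24 for name, rows in split_rows.items()}

print(windows[0], next_targets[0])  # [[0.0], [1.0], [2.0]] 30
print(sample_counts)  # {'train': 13774, 'val': 1700, 'test': 1702}
```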
10.4 Model Architectures¶
Three recurrent neural network architectures were built and compared to identify the most effective model for this forecasting task. Comparing multiple architectures allows for evidence-based model selection rather than relying on a single approach [53] [57].
+All three models share the same output structure, a Dense(32) hidden layer followed by a Dense(1) output layer for regression, and are compiled with the Adam optimiser and Mean Squared Error loss, which is standard for continuous value regression forecasting tasks [58]. Mean Absolute Error is tracked as an interpretable secondary metric during training [59].
+The three architectures evaluated are:
+-
+
- Baseline LSTM: A single LSTM layer with 64 units, serving as the reference model that establishes a performance baseline. +
- Stacked LSTM: Two LSTM layers in sequence for deeper temporal pattern learning. +
- GRU: A more parameter-efficient alternative to LSTM with a simplified gate structure. +
10.4.1 Baseline LSTM¶
The baseline model uses a single LSTM layer with 64 units. LSTM networks are designed to address the vanishing gradient problem found in standard recurrent networks, allowing them to retain information across longer time sequences [54]. At each time step, the LSTM uses three gates (an input gate, a forget gate, and an output gate) to decide what information to keep, discard, or pass forward [54].
+A Dropout layer with a rate of 0.2 is applied after the LSTM to regularise the model by randomly deactivating 20% of neurons during each training step, reducing the risk of overfitting [60]. This model has 24,385 trainable parameters, which is appropriate for the training set of 13,774 samples.
+def build_lstm_baseline(input_shape):
+    model = Sequential([
+        # Declare the input shape with an explicit Input object, which
+        # Keras recommends for Sequential models.
+        tf.keras.Input(shape=input_shape),
+        LSTM(64, return_sequences=False),
+        Dropout(0.2),
+        Dense(32, activation="relu"),
+        Dense(1)
+    ])
+    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
+    return model
+
+model_baseline = build_lstm_baseline(
+    (X_train_seq.shape[1], X_train_seq.shape[2])
+)
+model_baseline.summary()
Model: "sequential_3"
+
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ +┃ Layer (type) ┃ Output Shape ┃ Param # ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ +│ lstm_3 (LSTM) │ (None, 64) │ 22,272 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dropout_4 (Dropout) │ (None, 64) │ 0 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_6 (Dense) │ (None, 32) │ 2,080 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_7 (Dense) │ (None, 1) │ 33 │ +└─────────────────────────────────┴────────────────────────┴───────────────┘ ++
Total params: 24,385 (95.25 KB) ++
Trainable params: 24,385 (95.25 KB) ++
Non-trainable params: 0 (0.00 B) ++
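The parameter counts in the summary can be verified by hand. The sketch below uses the standard formulas for an LSTM layer (four weight blocks: three gates plus the candidate cell state, each with input weights, recurrent weights, and a bias) and for a dense layer. The function names are hypothetical helpers, not Keras functions.

```python
def lstm_params(input_dim, units):
    """Four blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


layer_params = {
    "lstm": lstm_params(input_dim=22, units=64),
    "dense_32": dense_params(input_dim=64, units=32),
    "dense_1": dense_params(input_dim=32, units=1),
}
total_params = sum(layer_params.values())

print(layer_params)  # {'lstm': 22272, 'dense_32': 2080, 'dense_1': 33}
print(total_params)  # 24385
```

The Dropout layer contributes no parameters, so the three layers account for the full total of 24,385.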
10.4.2 Stacked LSTM¶
The stacked model adds a second LSTM layer on top of the first. The first LSTM layer uses return_sequences=True, meaning it outputs the full sequence of hidden states at every time step rather than just the final one. This full sequence is then passed as input to the second LSTM layer, which compresses it into a single final hidden state [54].
+This design allows the model to learn more abstract temporal representations across multiple levels of processing: the first layer captures low-level hourly patterns, while the second layer can learn higher-order dependencies across those patterns [53]. Dropout is applied after each LSTM layer to maintain regularisation [60]. This model has 35,777 trainable parameters, making it the most complex of the three architectures compared.
+def build_lstm_stacked(input_shape):
+    model = Sequential([
+        # Declare the input shape with an explicit Input object.
+        tf.keras.Input(shape=input_shape),
+        LSTM(64, return_sequences=True),   # returns the full sequence
+        Dropout(0.2),
+        LSTM(32, return_sequences=False),  # compresses to the final state
+        Dropout(0.2),
+        Dense(32, activation="relu"),
+        Dense(1)
+    ])
+    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
+    return model
+
+model_stacked = build_lstm_stacked(
+ (X_train_seq.shape[1], X_train_seq.shape[2])
+)
+model_stacked.summary()
+Model: "sequential_4"
+
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ +┃ Layer (type) ┃ Output Shape ┃ Param # ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ +│ lstm_4 (LSTM) │ (None, 24, 64) │ 22,272 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dropout_5 (Dropout) │ (None, 24, 64) │ 0 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ lstm_5 (LSTM) │ (None, 32) │ 12,416 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dropout_6 (Dropout) │ (None, 32) │ 0 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_8 (Dense) │ (None, 32) │ 1,056 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_9 (Dense) │ (None, 1) │ 33 │ +└─────────────────────────────────┴────────────────────────┴───────────────┘ ++
Total params: 35,777 (139.75 KB) ++
Trainable params: 35,777 (139.75 KB) ++
Non-trainable params: 0 (0.00 B) ++
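The effect of return_sequences on the shapes passed between the two LSTM layers can be traced with a small helper. This is an illustrative sketch, not Keras code: `lstm_output_shape` and `lstm_params` are hypothetical functions, and the batch dimension is omitted.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Shape of one sample after an LSTM layer (batch dimension omitted)."""
    timesteps, _features = input_shape
    return (timesteps, units) if return_sequences else (units,)


def lstm_params(input_dim, units):
    """Four blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


# First layer keeps all 24 time steps, second layer keeps only the last state.
first = lstm_output_shape((24, 22), units=64, return_sequences=True)
second = lstm_output_shape(first, units=32, return_sequences=False)

# The second layer sees 64 inputs per time step, not the original 22.
second_layer_params = lstm_params(input_dim=first[1], units=32)

print(first, second, second_layer_params)  # (24, 64) (32,) 12416
```

These values match the Output Shape and Param # columns for lstm_4 and lstm_5 in the summary above.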
10.4.3 GRU (Gated Recurrent Unit)¶
The GRU model replaces the LSTM unit with a GRU layer of equal size. GRU simplifies the LSTM architecture by combining the input and forget gates into a single update gate, and adding a reset gate, resulting in only two gates overall instead of three [61]. This reduction in gate complexity leads to fewer parameters (19,009 in total), making the GRU the most lightweight of the three architectures.
+GRU is included as an alternative because empirical research has shown it often performs comparably or better than LSTM on time-series forecasting tasks, particularly on datasets that are not large enough to fully justify the additional capacity of LSTM [61] [62]. This comparison tests whether that finding holds for this urban pedestrian forecasting task.
+def build_gru(input_shape):
+    model = Sequential([
+        # Declare the input shape with an explicit Input object.
+        tf.keras.Input(shape=input_shape),
+        GRU(64, return_sequences=False),
+        Dropout(0.2),
+        Dense(32, activation="relu"),
+        Dense(1)
+    ])
+    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
+    return model
+
+model_gru = build_gru(
+ (X_train_seq.shape[1], X_train_seq.shape[2])
+)
+model_gru.summary()
+Model: "sequential_5"
+
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ +┃ Layer (type) ┃ Output Shape ┃ Param # ┃ +┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ +│ gru_1 (GRU) │ (None, 64) │ 16,896 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dropout_7 (Dropout) │ (None, 64) │ 0 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_10 (Dense) │ (None, 32) │ 2,080 │ +├─────────────────────────────────┼────────────────────────┼───────────────┤ +│ dense_11 (Dense) │ (None, 1) │ 33 │ +└─────────────────────────────────┴────────────────────────┴───────────────┘ ++
Total params: 19,009 (74.25 KB) ++
Trainable params: 19,009 (74.25 KB) ++
Non-trainable params: 0 (0.00 B) ++
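The smaller parameter count follows from the GRU having three weight blocks (update gate, reset gate, and candidate state) instead of the four used by an LSTM. The sketch below assumes the Keras default `reset_after=True`, under which each block carries two bias vectors. That assumption is consistent with the 16,896 parameters reported in the summary. The helper names are hypothetical.

```python
def gru_params(input_dim, units, reset_after=True):
    """Three blocks; reset_after=True adds a second bias vector per block."""
    bias_sets = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + bias_sets * units)


def lstm_params(input_dim, units):
    """Four blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


gru_layer = gru_params(input_dim=22, units=64)
lstm_layer = lstm_params(input_dim=22, units=64)
gru_total = gru_layer + (64 * 32 + 32) + (32 * 1 + 1)  # plus the two Dense layers

print(gru_layer, lstm_layer, gru_total)  # 16896 22272 19009
```

For the same number of units, the GRU layer therefore needs roughly three quarters of the parameters of the LSTM layer.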
10.5 Model Training¶
All three models are trained using identical configurations to ensure a fair comparison between architectures. The training setup includes the following design choices [58]:
+-
+
- Epochs: A maximum of 50 epochs was set to allow sufficient learning time, with callbacks responsible for halting training early when appropriate [58]. +
- Batch size: A batch size of 64 was selected as a balance between training stability and computational speed. Larger batch sizes provide more stable gradient estimates per update compared to smaller ones [58]. +
- EarlyStopping: Monitors validation loss with a patience of 10 epochs, halting training if no improvement is seen for 10 consecutive epochs. The restore_best_weights=True parameter ensures the final model retains the weights from its best validation epoch rather than the last epoch [63]. +
- ReduceLROnPlateau: Halves the learning rate when validation loss has not improved for 5 consecutive epochs, with a minimum learning rate floor of 1e-6. This allows finer weight adjustments as the model approaches convergence rather than overshooting the optimal solution [63]. +
The validation set is used exclusively for monitoring during training and is never used to update model weights, preserving the integrity of the chronological train, validation, and test split [49].
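How the two callbacks described above interact can be sketched with a small simulation over a list of validation losses. This is a simplified illustration and not the Keras implementation: it ignores options such as min_delta and cooldown, uses one shared "best loss" for both rules, and the function name and loss values are hypothetical.

```python
def simulate_callbacks(val_losses, es_patience=10, lr_patience=5,
                       factor=0.5, learning_rate=1e-3, min_lr=1e-6):
    """Return (last_epoch, best_epoch, final_learning_rate) for a loss curve."""
    best_loss = float("inf")
    best_epoch = None
    es_wait = 0  # epochs without improvement, for early stopping
    lr_wait = 0  # epochs without improvement, for the learning-rate rule
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            es_wait = lr_wait = 0
            continue
        es_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            # Halve the learning rate, but never go below the floor.
            learning_rate = max(learning_rate * factor, min_lr)
            lr_wait = 0
        if es_wait >= es_patience:
            # Stop here; restore_best_weights would roll back to best_epoch.
            return epoch, best_epoch, learning_rate
    return len(val_losses) - 1, best_epoch, learning_rate


# Toy curve: improves for three epochs, then plateaus.
toy_losses = [10.0, 9.0, 8.0] + [8.5] * 12
last_epoch, best_epoch, final_lr = simulate_callbacks(toy_losses)
print(last_epoch, best_epoch, final_lr)
```

In this toy run the learning rate is halved twice during the plateau before training stops, and the weights that would be restored come from the third epoch (index 2).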
+def get_callbacks():
+ early_stop = EarlyStopping(
+ monitor="val_loss",
+ patience=10, # increased from 5 — prevents premature stopping
+ restore_best_weights=True
+ )
+ reduce_lr = ReduceLROnPlateau(
+ monitor="val_loss",
+ factor=0.5, # halve LR when stuck
+ patience=5,
+ min_lr=1e-6,
+ verbose=1
+ )
+ return [early_stop, reduce_lr]
+
+# Train all three models
+history_baseline = model_baseline.fit(
+ X_train_seq, y_train_seq,
+ validation_data=(X_val_seq, y_val_seq),
+ epochs=50, batch_size=64,
+ callbacks=get_callbacks(), verbose=1
+)
+
+history_stacked = model_stacked.fit(
+ X_train_seq, y_train_seq,
+ validation_data=(X_val_seq, y_val_seq),
+ epochs=50, batch_size=64,
+ callbacks=get_callbacks(), verbose=1
+)
+
+history_gru = model_gru.fit(
+ X_train_seq, y_train_seq,
+ validation_data=(X_val_seq, y_val_seq),
+ epochs=50, batch_size=64,
+ callbacks=get_callbacks(), verbose=1
+)
+Epoch 1/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - loss: 1931178880.0000 - mae: 34742.4805 - val_loss: 2065434112.0000 - val_mae: 36316.7852 - learning_rate: 0.0010 +Epoch 2/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1915012608.0000 - mae: 34508.1875 - val_loss: 2041869568.0000 - val_mae: 35990.8672 - learning_rate: 0.0010 +Epoch 3/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1886355328.0000 - mae: 34091.4727 - val_loss: 2005516672.0000 - val_mae: 35482.5078 - learning_rate: 0.0010 +Epoch 4/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1846237696.0000 - mae: 33533.6406 - val_loss: 1957890944.0000 - val_mae: 34864.8906 - learning_rate: 0.0010 +Epoch 5/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 1796227072.0000 - mae: 32903.4414 - val_loss: 1900755712.0000 - val_mae: 34188.7891 - learning_rate: 0.0010 +Epoch 6/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1738398336.0000 - mae: 32237.4355 - val_loss: 1835951104.0000 - val_mae: 33501.5625 - learning_rate: 0.0010 +Epoch 7/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1673944192.0000 - mae: 31575.8242 - val_loss: 1765187584.0000 - val_mae: 32804.0078 - learning_rate: 0.0010 +Epoch 8/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1604557824.0000 - mae: 30903.6895 - val_loss: 1690087552.0000 - val_mae: 32100.9355 - learning_rate: 0.0010 +Epoch 9/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 1532111872.0000 - mae: 30227.2246 - val_loss: 1612279936.0000 - val_mae: 31392.7227 - learning_rate: 0.0010 +Epoch 10/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 1457709312.0000 - mae: 29556.4414 - val_loss: 1533161088.0000 - val_mae: 30695.7051 - learning_rate: 0.0010 +Epoch 11/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1383581440.0000 - mae: 28908.4434 - val_loss: 1454138880.0000 - val_mae: 30023.1250 - learning_rate: 0.0010 +Epoch 12/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1310495488.0000 - mae: 28299.8477 - val_loss: 1376468352.0000 
- val_mae: 29394.1934 - learning_rate: 0.0010 +Epoch 13/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1239926912.0000 - mae: 27729.9043 - val_loss: 1301134208.0000 - val_mae: 28809.5215 - learning_rate: 0.0010 +Epoch 14/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1170712448.0000 - mae: 27170.0469 - val_loss: 1229012224.0000 - val_mae: 28242.0703 - learning_rate: 0.0010 +Epoch 15/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1105601920.0000 - mae: 26642.3184 - val_loss: 1160922624.0000 - val_mae: 27700.8438 - learning_rate: 0.0010 +Epoch 16/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1043984640.0000 - mae: 26124.0840 - val_loss: 1094580352.0000 - val_mae: 27079.0586 - learning_rate: 0.0010 +Epoch 17/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 976227584.0000 - mae: 25163.9297 - val_loss: 996540288.0000 - val_mae: 24844.2109 - learning_rate: 0.0010 +Epoch 18/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 860002496.0000 - mae: 21725.8691 - val_loss: 885134976.0000 - val_mae: 21819.1719 - learning_rate: 0.0010 +Epoch 19/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 772316480.0000 - mae: 19665.3848 - val_loss: 801167040.0000 - val_mae: 20392.8848 - learning_rate: 0.0010 +Epoch 20/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 696815808.0000 - mae: 18389.9785 - val_loss: 723598528.0000 - val_mae: 19151.4785 - learning_rate: 0.0010 +Epoch 21/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - loss: 628451968.0000 - mae: 17213.0176 - val_loss: 651424896.0000 - val_mae: 17956.3184 - learning_rate: 0.0010 +Epoch 22/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 563721536.0000 - mae: 16077.4170 - val_loss: 584422592.0000 - val_mae: 16778.9355 - learning_rate: 0.0010 +Epoch 23/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 505489536.0000 - mae: 15020.1113 - val_loss: 523019264.0000 - val_mae: 15676.2910 - learning_rate: 0.0010 +Epoch 24/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 449513696.0000 - mae: 
13965.4336 - val_loss: 466503328.0000 - val_mae: 14642.3008 - learning_rate: 0.0010 +Epoch 25/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 400619104.0000 - mae: 13041.7090 - val_loss: 414948032.0000 - val_mae: 13628.7861 - learning_rate: 0.0010 +Epoch 26/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 356454240.0000 - mae: 12158.3799 - val_loss: 368652768.0000 - val_mae: 12694.5703 - learning_rate: 0.0010 +Epoch 27/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 317270240.0000 - mae: 11362.6035 - val_loss: 327098016.0000 - val_mae: 11882.4199 - learning_rate: 0.0010 +Epoch 28/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 282110752.0000 - mae: 10618.4219 - val_loss: 290472800.0000 - val_mae: 11140.4375 - learning_rate: 0.0010 +Epoch 29/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 249665072.0000 - mae: 9972.2529 - val_loss: 259006832.0000 - val_mae: 10496.8271 - learning_rate: 0.0010 +Epoch 30/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 222754640.0000 - mae: 9393.5771 - val_loss: 231574736.0000 - val_mae: 9954.3838 - learning_rate: 0.0010 +Epoch 31/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 198423008.0000 - mae: 8869.1445 - val_loss: 207840512.0000 - val_mae: 9421.6309 - learning_rate: 0.0010 +Epoch 32/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 178380192.0000 - mae: 8440.0615 - val_loss: 187732656.0000 - val_mae: 8949.0840 - learning_rate: 0.0010 +Epoch 33/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 161002656.0000 - mae: 7984.2749 - val_loss: 169997776.0000 - val_mae: 8558.7334 - learning_rate: 0.0010 +Epoch 34/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 144927488.0000 - mae: 7591.0400 - val_loss: 154311120.0000 - val_mae: 8103.6113 - learning_rate: 0.0010 +Epoch 35/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 128720472.0000 - mae: 7144.0977 - val_loss: 140611216.0000 - val_mae: 7693.2202 - learning_rate: 0.0010 +Epoch 36/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 
115451752.0000 - mae: 6745.2446 - val_loss: 130359144.0000 - val_mae: 7402.8223 - learning_rate: 0.0010 +Epoch 37/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 103320352.0000 - mae: 6354.1328 - val_loss: 117524160.0000 - val_mae: 6989.3652 - learning_rate: 0.0010 +Epoch 38/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 92202488.0000 - mae: 5979.1660 - val_loss: 109628784.0000 - val_mae: 6704.9717 - learning_rate: 0.0010 +Epoch 39/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 83113800.0000 - mae: 5689.0278 - val_loss: 103527280.0000 - val_mae: 6496.6172 - learning_rate: 0.0010 +Epoch 40/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 75200392.0000 - mae: 5414.2593 - val_loss: 97912072.0000 - val_mae: 6302.9551 - learning_rate: 0.0010 +Epoch 41/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 69330976.0000 - mae: 5204.5713 - val_loss: 92983264.0000 - val_mae: 6090.5542 - learning_rate: 0.0010 +Epoch 42/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 64068488.0000 - mae: 5041.2319 - val_loss: 93400048.0000 - val_mae: 6068.5435 - learning_rate: 0.0010 +Epoch 43/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 59892128.0000 - mae: 4892.9224 - val_loss: 87246792.0000 - val_mae: 5791.9092 - learning_rate: 0.0010 +Epoch 44/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 55631812.0000 - mae: 4743.8657 - val_loss: 85550016.0000 - val_mae: 5762.2046 - learning_rate: 0.0010 +Epoch 45/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 53973056.0000 - mae: 4663.5952 - val_loss: 83252664.0000 - val_mae: 5683.0537 - learning_rate: 0.0010 +Epoch 46/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 50220260.0000 - mae: 4527.8228 - val_loss: 83925944.0000 - val_mae: 5714.0728 - learning_rate: 0.0010 +Epoch 47/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 47764060.0000 - mae: 4444.6938 - val_loss: 82401720.0000 - val_mae: 5649.3335 - learning_rate: 0.0010 +Epoch 48/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 
45334620.0000 - mae: 4341.4917 - val_loss: 80761736.0000 - val_mae: 5650.6733 - learning_rate: 0.0010 +Epoch 49/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 43015352.0000 - mae: 4263.1714 - val_loss: 78728576.0000 - val_mae: 5562.2876 - learning_rate: 0.0010 +Epoch 50/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 41184284.0000 - mae: 4168.4272 - val_loss: 76461192.0000 - val_mae: 5454.5000 - learning_rate: 0.0010 +Epoch 1/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - loss: 1932731136.0000 - mae: 34764.9727 - val_loss: 2069818880.0000 - val_mae: 36377.0664 - learning_rate: 0.0010 +Epoch 2/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1923338112.0000 - mae: 34629.4414 - val_loss: 2056002432.0000 - val_mae: 36186.6602 - learning_rate: 0.0010 +Epoch 3/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 1906462720.0000 - mae: 34385.0273 - val_loss: 2034484992.0000 - val_mae: 35888.1211 - learning_rate: 0.0010 +Epoch 4/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 1882598784.0000 - mae: 34036.7148 - val_loss: 2005956992.0000 - val_mae: 35488.6562 - learning_rate: 0.0010 +Epoch 5/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1852413696.0000 - mae: 33614.3867 - val_loss: 1971230848.0000 - val_mae: 35029.5586 - learning_rate: 0.0010 +Epoch 6/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1816782592.0000 - mae: 33156.7617 - val_loss: 1931121664.0000 - val_mae: 34541.7891 - learning_rate: 0.0010 +Epoch 7/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1776370560.0000 - mae: 32665.1387 - val_loss: 1886418944.0000 - val_mae: 34027.4883 - learning_rate: 0.0010 +Epoch 8/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1731960960.0000 - mae: 32169.1250 - val_loss: 1837876736.0000 - val_mae: 33520.9844 - learning_rate: 0.0010 +Epoch 9/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 1684573312.0000 - mae: 31682.2441 - val_loss: 1786250496.0000 - val_mae: 33008.1094 - learning_rate: 0.0010 +Epoch 10/50 +216/216 
━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 1634488960.0000 - mae: 31189.7012 - val_loss: 1732167680.0000 - val_mae: 32490.0215 - learning_rate: 0.0010 +Epoch 11/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1582652928.0000 - mae: 30698.1016 - val_loss: 1676277248.0000 - val_mae: 31974.6699 - learning_rate: 0.0010 +Epoch 12/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1528807936.0000 - mae: 30199.7773 - val_loss: 1619060736.0000 - val_mae: 31453.7559 - learning_rate: 0.0010 +Epoch 13/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - loss: 1474490880.0000 - mae: 29703.1445 - val_loss: 1561115776.0000 - val_mae: 30939.0352 - learning_rate: 0.0010 +Epoch 14/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1419293056.0000 - mae: 29219.7656 - val_loss: 1502894208.0000 - val_mae: 30434.5215 - learning_rate: 0.0010 +Epoch 15/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - loss: 1366217600.0000 - mae: 28764.8242 - val_loss: 1445071360.0000 - val_mae: 29948.1836 - learning_rate: 0.0010 +Epoch 16/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 1311709056.0000 - mae: 28299.2773 - val_loss: 1387855616.0000 - val_mae: 29483.8652 - learning_rate: 0.0010 +Epoch 17/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1259717120.0000 - mae: 27880.5957 - val_loss: 1331703424.0000 - val_mae: 29047.3066 - learning_rate: 0.0010 +Epoch 18/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1208217600.0000 - mae: 27472.8125 - val_loss: 1276985344.0000 - val_mae: 28620.4238 - learning_rate: 0.0010 +Epoch 19/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1158040192.0000 - mae: 27063.4766 - val_loss: 1224060544.0000 - val_mae: 28202.8828 - learning_rate: 0.0010 +Epoch 20/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - loss: 1109995136.0000 - mae: 26683.9062 - val_loss: 1173220352.0000 - val_mae: 27799.9453 - learning_rate: 0.0010 +Epoch 21/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - loss: 1064320704.0000 - mae: 26317.3105 - val_loss: 1124794624.0000 - val_mae: 
27414.1367 - learning_rate: 0.0010 +Epoch 22/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1022760384.0000 - mae: 25982.8965 - val_loss: 1079056128.0000 - val_mae: 27053.7539 - learning_rate: 0.0010 +Epoch 23/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 982177088.0000 - mae: 25652.5938 - val_loss: 1036195840.0000 - val_mae: 26712.5508 - learning_rate: 0.0010 +Epoch 24/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 945132608.0000 - mae: 25346.1582 - val_loss: 996421696.0000 - val_mae: 26395.3066 - learning_rate: 0.0010 +Epoch 25/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 911153472.0000 - mae: 25062.0684 - val_loss: 959820032.0000 - val_mae: 26097.6113 - learning_rate: 0.0010 +Epoch 26/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 880028288.0000 - mae: 24827.4922 - val_loss: 926537280.0000 - val_mae: 25821.6191 - learning_rate: 0.0010 +Epoch 27/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - loss: 853682432.0000 - mae: 24604.5742 - val_loss: 896631808.0000 - val_mae: 25565.9453 - learning_rate: 0.0010 +Epoch 28/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 828166080.0000 - mae: 24403.7070 - val_loss: 870131136.0000 - val_mae: 25342.5938 - learning_rate: 0.0010 +Epoch 29/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 810969728.0000 - mae: 24281.2656 - val_loss: 846989760.0000 - val_mae: 25147.4238 - learning_rate: 0.0010 +Epoch 30/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 790364480.0000 - mae: 24090.2852 - val_loss: 827042432.0000 - val_mae: 24978.3867 - learning_rate: 0.0010 +Epoch 31/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 776585792.0000 - mae: 24009.8965 - val_loss: 810190016.0000 - val_mae: 24829.4160 - learning_rate: 0.0010 +Epoch 32/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - loss: 764959104.0000 - mae: 23917.3750 - val_loss: 796246976.0000 - val_mae: 24708.1191 - learning_rate: 0.0010 +Epoch 33/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - loss: 755435648.0000 - mae: 23878.0566 - 
val_loss: 784937344.0000 - val_mae: 24605.6172 - learning_rate: 0.0010 +Epoch 34/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - loss: 747755712.0000 - mae: 23794.9590 - val_loss: 775911296.0000 - val_mae: 24518.1973 - learning_rate: 0.0010 +Epoch 35/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 743594688.0000 - mae: 23785.2285 - val_loss: 768952064.0000 - val_mae: 24446.5508 - learning_rate: 0.0010 +Epoch 36/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 739301056.0000 - mae: 23742.7988 - val_loss: 763610304.0000 - val_mae: 24390.0273 - learning_rate: 0.0010 +Epoch 37/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 738923584.0000 - mae: 23775.4492 - val_loss: 759721984.0000 - val_mae: 24347.7969 - learning_rate: 0.0010 +Epoch 38/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - loss: 735023744.0000 - mae: 23729.2637 - val_loss: 756937152.0000 - val_mae: 24316.7676 - learning_rate: 0.0010 +Epoch 39/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 734647168.0000 - mae: 23751.9238 - val_loss: 754886144.0000 - val_mae: 24292.0117 - learning_rate: 0.0010 +Epoch 40/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 732123008.0000 - mae: 23701.5078 - val_loss: 753405888.0000 - val_mae: 24274.6113 - learning_rate: 0.0010 +Epoch 41/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 479316192.0000 - mae: 15964.3057 - val_loss: 403580480.0000 - val_mae: 13544.6035 - learning_rate: 0.0010 +Epoch 42/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 343708096.0000 - mae: 12115.6484 - val_loss: 341869088.0000 - val_mae: 12275.1152 - learning_rate: 0.0010 +Epoch 43/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - loss: 294795616.0000 - mae: 11060.8252 - val_loss: 294227808.0000 - val_mae: 11263.7822 - learning_rate: 0.0010 +Epoch 44/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 11ms/step - loss: 255411248.0000 - mae: 10183.4551 - val_loss: 256555760.0000 - val_mae: 10408.2881 - learning_rate: 0.0010 +Epoch 45/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 
226594400.0000 - mae: 9585.3291 - val_loss: 226511520.0000 - val_mae: 9840.6309 - learning_rate: 0.0010 +Epoch 46/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 201210784.0000 - mae: 8997.6709 - val_loss: 201730768.0000 - val_mae: 9280.6221 - learning_rate: 0.0010 +Epoch 47/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 180951744.0000 - mae: 8508.0166 - val_loss: 180386640.0000 - val_mae: 8658.5586 - learning_rate: 0.0010 +Epoch 48/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 161437456.0000 - mae: 8037.3340 - val_loss: 162152432.0000 - val_mae: 8236.7969 - learning_rate: 0.0010 +Epoch 49/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 143073152.0000 - mae: 7514.1865 - val_loss: 141376416.0000 - val_mae: 7570.7739 - learning_rate: 0.0010 +Epoch 50/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 11ms/step - loss: 127263440.0000 - mae: 7038.0229 - val_loss: 127391912.0000 - val_mae: 7156.2104 - learning_rate: 0.0010 +Epoch 1/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - loss: 1930911488.0000 - mae: 34738.7148 - val_loss: 2063673600.0000 - val_mae: 36292.5195 - learning_rate: 0.0010 +Epoch 2/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1910318336.0000 - mae: 34439.6289 - val_loss: 2033005824.0000 - val_mae: 35867.5156 - learning_rate: 0.0010 +Epoch 3/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1872711552.0000 - mae: 33894.9219 - val_loss: 1984276864.0000 - val_mae: 35195.7305 - learning_rate: 0.0010 +Epoch 4/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1818348032.0000 - mae: 33175.7305 - val_loss: 1919553280.0000 - val_mae: 34405.4180 - learning_rate: 0.0010 +Epoch 5/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1750787072.0000 - mae: 32374.7637 - val_loss: 1842820480.0000 - val_mae: 33571.0039 - learning_rate: 0.0010 +Epoch 6/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 1673256576.0000 - mae: 31570.9395 - val_loss: 1757159040.0000 - val_mae: 32726.9746 - learning_rate: 0.0010 +Epoch 7/50 +216/216 
━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1589680640.0000 - mae: 30760.4844 - val_loss: 1665691136.0000 - val_mae: 31878.0684 - learning_rate: 0.0010 +Epoch 8/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1501466496.0000 - mae: 29948.0781 - val_loss: 1570926464.0000 - val_mae: 31025.0703 - learning_rate: 0.0010 +Epoch 9/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - loss: 1412167296.0000 - mae: 29152.6699 - val_loss: 1475466880.0000 - val_mae: 30201.8711 - learning_rate: 0.0010 +Epoch 10/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1322615424.0000 - mae: 28367.2754 - val_loss: 1379597312.0000 - val_mae: 29309.5332 - learning_rate: 0.0010 +Epoch 11/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1220498816.0000 - mae: 26533.4805 - val_loss: 1260929408.0000 - val_mae: 26583.1484 - learning_rate: 0.0010 +Epoch 12/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 1112165120.0000 - mae: 24360.7832 - val_loss: 1153268480.0000 - val_mae: 25070.1992 - learning_rate: 0.0010 +Epoch 13/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 1014172736.0000 - mae: 22965.8555 - val_loss: 1049243264.0000 - val_mae: 23642.1152 - learning_rate: 0.0010 +Epoch 14/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 918478592.0000 - mae: 21610.4453 - val_loss: 949395648.0000 - val_mae: 22258.7539 - learning_rate: 0.0010 +Epoch 15/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - loss: 828141056.0000 - mae: 20268.5742 - val_loss: 854286336.0000 - val_mae: 20897.9004 - learning_rate: 0.0010 +Epoch 16/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 742179200.0000 - mae: 18959.4883 - val_loss: 764294528.0000 - val_mae: 19546.4355 - learning_rate: 0.0010 +Epoch 17/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 660350528.0000 - mae: 17696.7012 - val_loss: 680099328.0000 - val_mae: 18261.0137 - learning_rate: 0.0010 +Epoch 18/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 587759936.0000 - mae: 16515.5098 - val_loss: 602156032.0000 - val_mae: 16998.6445 - 
learning_rate: 0.0010 +Epoch 19/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 516721568.0000 - mae: 15305.6777 - val_loss: 530436352.0000 - val_mae: 15776.6455 - learning_rate: 0.0010 +Epoch 20/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 454774592.0000 - mae: 14225.0361 - val_loss: 465422528.0000 - val_mae: 14620.3174 - learning_rate: 0.0010 +Epoch 21/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 400378592.0000 - mae: 13204.1484 - val_loss: 406941792.0000 - val_mae: 13484.5117 - learning_rate: 0.0010 +Epoch 22/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - loss: 351022656.0000 - mae: 12238.7500 - val_loss: 355148032.0000 - val_mae: 12487.7051 - learning_rate: 0.0010 +Epoch 23/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 305887232.0000 - mae: 11356.9395 - val_loss: 309833376.0000 - val_mae: 11593.4004 - learning_rate: 0.0010 +Epoch 24/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 265417856.0000 - mae: 10562.7656 - val_loss: 270137696.0000 - val_mae: 10775.0537 - learning_rate: 0.0010 +Epoch 25/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 234852512.0000 - mae: 9912.8330 - val_loss: 235926240.0000 - val_mae: 10042.0000 - learning_rate: 0.0010 +Epoch 26/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 204994608.0000 - mae: 9289.3477 - val_loss: 208168032.0000 - val_mae: 9438.7832 - learning_rate: 0.0010 +Epoch 27/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 181129088.0000 - mae: 8759.8398 - val_loss: 183661520.0000 - val_mae: 8844.0303 - learning_rate: 0.0010 +Epoch 28/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 159850000.0000 - mae: 8249.1475 - val_loss: 162752464.0000 - val_mae: 8357.9189 - learning_rate: 0.0010 +Epoch 29/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 141057840.0000 - mae: 7771.1348 - val_loss: 144473072.0000 - val_mae: 7830.9922 - learning_rate: 0.0010 +Epoch 30/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 125178776.0000 - mae: 7342.7188 - val_loss: 130381672.0000 - 
val_mae: 7483.4707 - learning_rate: 0.0010 +Epoch 31/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - loss: 113134424.0000 - mae: 6976.5171 - val_loss: 116524848.0000 - val_mae: 7097.4570 - learning_rate: 0.0010 +Epoch 32/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 101647680.0000 - mae: 6615.7490 - val_loss: 107708008.0000 - val_mae: 6762.4595 - learning_rate: 0.0010 +Epoch 33/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 91810328.0000 - mae: 6306.7373 - val_loss: 97756488.0000 - val_mae: 6498.3496 - learning_rate: 0.0010 +Epoch 34/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 82927784.0000 - mae: 6015.1938 - val_loss: 90644216.0000 - val_mae: 6210.5337 - learning_rate: 0.0010 +Epoch 35/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 76303432.0000 - mae: 5791.7852 - val_loss: 84712688.0000 - val_mae: 5904.9824 - learning_rate: 0.0010 +Epoch 36/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 71719856.0000 - mae: 5631.6865 - val_loss: 80555576.0000 - val_mae: 5748.7349 - learning_rate: 0.0010 +Epoch 37/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 67707120.0000 - mae: 5521.0171 - val_loss: 78097168.0000 - val_mae: 5696.1294 - learning_rate: 0.0010 +Epoch 38/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 63857660.0000 - mae: 5421.4185 - val_loss: 73372440.0000 - val_mae: 5529.1011 - learning_rate: 0.0010 +Epoch 39/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 61203508.0000 - mae: 5300.4053 - val_loss: 70816400.0000 - val_mae: 5323.6621 - learning_rate: 0.0010 +Epoch 40/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 59436396.0000 - mae: 5238.1948 - val_loss: 67635208.0000 - val_mae: 5235.2432 - learning_rate: 0.0010 +Epoch 41/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 56774304.0000 - mae: 5124.1748 - val_loss: 66004324.0000 - val_mae: 5169.6328 - learning_rate: 0.0010 +Epoch 42/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 53282044.0000 - mae: 4981.6040 - val_loss: 63594312.0000 - val_mae: 
5073.1543 - learning_rate: 0.0010 +Epoch 43/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 51761064.0000 - mae: 4912.4688 - val_loss: 60992804.0000 - val_mae: 4910.9961 - learning_rate: 0.0010 +Epoch 44/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 49697432.0000 - mae: 4781.2227 - val_loss: 59893640.0000 - val_mae: 4884.7842 - learning_rate: 0.0010 +Epoch 45/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - loss: 47390824.0000 - mae: 4688.0669 - val_loss: 57193784.0000 - val_mae: 4715.9609 - learning_rate: 0.0010 +Epoch 46/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - loss: 44774588.0000 - mae: 4550.4604 - val_loss: 55175160.0000 - val_mae: 4646.7446 - learning_rate: 0.0010 +Epoch 47/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 45075144.0000 - mae: 4574.6504 - val_loss: 54762232.0000 - val_mae: 4689.1626 - learning_rate: 0.0010 +Epoch 48/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 43724572.0000 - mae: 4516.4062 - val_loss: 53836544.0000 - val_mae: 4628.6943 - learning_rate: 0.0010 +Epoch 49/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 42422604.0000 - mae: 4467.8540 - val_loss: 52762536.0000 - val_mae: 4572.4634 - learning_rate: 0.0010 +Epoch 50/50 +216/216 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - loss: 42222404.0000 - mae: 4455.9072 - val_loss: 51867440.0000 - val_mae: 4502.6763 - learning_rate: 0.0010 ++
10.6 Saving Trained Models
The three trained models are saved to disk immediately after training. Reloading them from disk means the evaluation results, plots, and metrics in subsequent cells can be regenerated without retraining: a saved model carries fixed weights, so it produces the same predictions on the same inputs regardless of any residual random seed behaviour during training [58].
# Save all three trained models to disk
model_baseline.save("model_baseline.keras")
model_stacked.save("model_stacked.keras")
model_gru.save("model_gru.keras")

print("✓ model_baseline saved")
print("✓ model_stacked saved")
print("✓ model_gru saved")

✓ model_baseline saved
✓ model_stacked saved
✓ model_gru saved

# Load all three models from disk
# This guarantees identical results on every run
model_baseline = tf.keras.models.load_model("model_baseline.keras")
model_stacked = tf.keras.models.load_model("model_stacked.keras")
model_gru = tf.keras.models.load_model("model_gru.keras")

print("✓ model_baseline loaded")
print("✓ model_stacked loaded")
print("✓ model_gru loaded")

✓ model_baseline loaded
✓ model_stacked loaded
✓ model_gru loaded

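As written, the save and load cells run straight after training, so retraining is only avoided if the training cells are skipped by hand. One way to make the "train once, reuse afterwards" idea explicit is a small caching helper. The sketch below is an illustration only: `load_or_train` and its arguments are hypothetical names, not part of this notebook, and a JSON file stands in for a Keras model so the example runs without TensorFlow.

```python
import json
import tempfile
from pathlib import Path


def load_or_train(path, train_fn, save_fn, load_fn):
    """Load a cached artefact from `path` if it exists; otherwise train, save, and return it.

    Returns a (model, source) tuple where source is "loaded" or "trained".
    All four arguments are hypothetical hooks supplied by the caller.
    """
    path = Path(path)
    if path.exists():
        return load_fn(path), "loaded"
    model = train_fn()
    save_fn(model, path)
    return model, "trained"


# Stand-in demo: a dictionary saved as JSON plays the role of the model
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "toy_model.json"
    train = lambda: {"weights": [0.1, 0.2]}
    save = lambda m, p: p.write_text(json.dumps(m))
    load = lambda p: json.loads(p.read_text())

    first, first_source = load_or_train(cache, train, save, load)    # trains and saves
    second, second_source = load_or_train(cache, train, save, load)  # reads the cache

print(first_source, second_source)
```

With Keras, the hooks could plausibly be `save_fn=lambda m, p: m.save(str(p))` and `load_fn=lambda p: tf.keras.models.load_model(str(p))`, with `train_fn` wrapping whichever cell builds and fits the model.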
10.8 Model Evaluation
Each model is evaluated on the held-out test set using three standard regression metrics [59]. The test set covers the final, unseen time period and was not used at any point during training or validation, so the evaluation reflects genuine out-of-sample forecasting performance [49].

- MAE (Mean Absolute Error): The average absolute difference between predicted and actual pedestrian counts. This is the most directly interpretable metric because it is expressed in the same unit as the target variable, pedestrians per hour [59].
- RMSE (Root Mean Squared Error): Similar to MAE but penalises larger errors more heavily because errors are squared before averaging. An RMSE substantially higher than the MAE indicates the presence of occasional large prediction errors [59].
- R² (Coefficient of Determination): Measures the proportion of variance in pedestrian counts that the model explains. A value of 1.0 represents a perfect fit, while 0.0 means the model performs no better than simply predicting the mean value [59].

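The three definitions above can be worked through by hand on a tiny example. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the four hourly counts are made up for illustration and are not taken from the dataset.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE and R² for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n               # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # root of the mean squared error

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical hourly counts, chosen only to illustrate the formulas
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 350]
metrics = regression_metrics(actual, predicted)
print(metrics)

# Always predicting the mean of the actual values gives R² = 0 by construction
mean_only = regression_metrics(actual, [250, 250, 250, 250])
```

Here three of the four errors are 10 and one is 50. The single large miss pulls the RMSE (about 26.5) well above the MAE (20), which is the behaviour the RMSE bullet describes.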
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(model, X_test, y_test, name="Model"):
    """Predict on the test set, print MAE, RMSE and R², and return predictions and metrics."""
    # Flatten the model output to a 1-D array of predictions
    y_pred = model.predict(X_test).flatten()
    mae = mean_absolute_error(y_test, y_pred)
    # RMSE is the square root of the mean squared error
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    print(f"\n{name}")
    print(f" MAE: {mae:.2f}")
    print(f" RMSE: {rmse:.2f}")
    print(f" R²: {r2:.4f}")
    return y_pred, {"MAE": mae, "RMSE": rmse, "R2": r2}

# Evaluate all three models on the same held-out test sequences
pred_baseline, metrics_baseline = evaluate_model(model_baseline, X_test_seq, y_test_seq, "Baseline LSTM")
pred_stacked, metrics_stacked = evaluate_model(model_stacked, X_test_seq, y_test_seq, "Stacked LSTM")
pred_gru, metrics_gru = evaluate_model(model_gru, X_test_seq, y_test_seq, "GRU")

54/54 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step

Baseline LSTM
 MAE: 5624.15
 RMSE: 8725.10
 R²: 0.9158
54/54 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step

Stacked LSTM
 MAE: 9084.27
 RMSE: 14262.18
 R²: 0.7750
54/54 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step

GRU
 MAE: 4921.98
 RMSE: 7325.04
 R²: 0.9407

Evaluation Discussion
The GRU model achieved the best performance on all three metrics, with an MAE of 4,921.98, an RMSE of 7,325.04, and an R² of 0.9407. The GRU therefore explains 94.1% of the variance in hourly pedestrian demand on unseen test data, which is strong predictive performance for a real-world urban climate forecasting task [59].

The Stacked LSTM performed worst of the three models (MAE 9,084.27, RMSE 14,262.18, R² 0.7750) despite having the most parameters at 35,777. This outcome illustrates an important principle in deep learning: increased model complexity does not always lead to better generalisation, particularly when the dataset size does not justify the added capacity [53] [62]. The Baseline LSTM achieved a strong R² of 0.9158, confirming that even a single-layer recurrent model can capture most of the temporal structure in this dataset [54]. The GRU's improvement over the baseline suggests that architectural efficiency can outperform raw model capacity on medium-sized datasets [61] [62].

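The comparison between models can also be expressed as simple ratios of the reported numbers. The sketch below uses only the test-set values printed in the evaluation output; the helper names `rmse_to_mae_ratio` and `relative_reduction` are illustrative and not part of the notebook.

```python
# Test-set metrics copied from the evaluation output above
reported = {
    "Baseline LSTM": {"MAE": 5624.15, "RMSE": 8725.10},
    "Stacked LSTM": {"MAE": 9084.27, "RMSE": 14262.18},
    "GRU": {"MAE": 4921.98, "RMSE": 7325.04},
}


def rmse_to_mae_ratio(m):
    """Ratio of RMSE to MAE; larger values point to more weight in large errors."""
    return m["RMSE"] / m["MAE"]


def relative_reduction(reference, candidate):
    """Fractional reduction of an error metric relative to a reference model."""
    return (reference - candidate) / reference


ratios = {name: rmse_to_mae_ratio(m) for name, m in reported.items()}
gru_mae_gain = relative_reduction(reported["Baseline LSTM"]["MAE"], reported["GRU"]["MAE"])
gru_rmse_gain = relative_reduction(reported["Baseline LSTM"]["RMSE"], reported["GRU"]["RMSE"])

for name, ratio in ratios.items():
    print(f"{name}: RMSE/MAE = {ratio:.3f}")
print(f"GRU vs Baseline LSTM: MAE down {gru_mae_gain:.1%}, RMSE down {gru_rmse_gain:.1%}")
```

All three ratios come out at roughly 1.5, which suggests that the Stacked LSTM's errors are larger across the board rather than concentrated in a few extreme hours. On these numbers the GRU reduces MAE by about 12.5% and RMSE by about 16% relative to the Baseline LSTM.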
10.9 Training Curves
The training and validation loss curves are plotted for each model to assess how learning progressed across epochs and to check for signs of overfitting [53].

A well-behaved training curve shows the training and validation loss decreasing together and converging toward a stable low value. If the validation loss begins to rise while the training loss continues to fall, the model is overfitting: it is memorising the training data rather than learning generalisable patterns [60].

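The overfitting pattern described above can also be checked programmatically rather than by eye. The following is a rough sketch, not a standard Keras utility: `find_overfitting_onset` and its `patience` threshold are hypothetical, and the loss values are invented for the demonstration.

```python
def find_overfitting_onset(train_loss, val_loss, patience=3):
    """Return the first epoch index of a run of `patience` consecutive epochs
    in which validation loss rises while training loss falls, or None."""
    streak = 0
    for epoch in range(1, len(val_loss)):
        val_rising = val_loss[epoch] > val_loss[epoch - 1]
        train_falling = train_loss[epoch] < train_loss[epoch - 1]
        # Extend the run only when both conditions hold; otherwise reset it
        streak = streak + 1 if (val_rising and train_falling) else 0
        if streak >= patience:
            return epoch - patience + 1
    return None


# Invented curves: the first keeps improving, the second starts to overfit
healthy = find_overfitting_onset([10, 8, 6, 5, 4], [11, 9, 7, 6, 5])
overfit = find_overfitting_onset([10, 8, 6, 5, 4, 3], [11, 9, 8, 9, 10, 11])
print(healthy, overfit)
```

For the models in this notebook, the two lists would be `history.history["loss"]` and `history.history["val_loss"]`. Requiring several consecutive epochs avoids flagging the isolated single-epoch upticks that are normal in noisy validation curves.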
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# One panel per model: training loss against validation loss per epoch
for ax, history, name in zip(
    axes,
    [history_baseline, history_stacked, history_gru],
    ["Baseline LSTM", "Stacked LSTM", "GRU"]
):
    ax.plot(history.history["loss"], label="Train Loss")
    ax.plot(history.history["val_loss"], label="Val Loss")
    ax.set_title(f"{name} — Training Curves")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("MSE Loss")
    ax.legend()

plt.tight_layout()
plt.show()
Training Curve Discussion
All three models show healthy learning behaviour across the 50 training epochs. Training and validation loss both fall over training, and no model shows a sustained rise in validation loss while training loss keeps falling, so there is no sign of meaningful overfitting [60]. The Dropout regularisation layers likely contributed to this stable behaviour [60] [63]. The ReduceLROnPlateau callback did not trigger in these runs: the logged learning rate stays at 0.0010 in every epoch of all three training logs. Loss is also still decreasing at epoch 50 in each log, so longer training may improve the results further.

The Stacked LSTM validation curve plateaus earlier and at a higher loss value than the Baseline LSTM and GRU, which is consistent with its weaker test set performance. The GRU and Baseline LSTM both converge smoothly, with training and validation loss tracking closely for most of the 50 epochs [54] [61].

10.10 Actual vs Predicted Visualisation
To assess prediction quality beyond summary metrics, the actual and predicted pedestrian counts are plotted across the first 200 hours of the test set. This visualisation reveals how well each model captures the shape, magnitude, and timing of the daily pedestrian demand cycle, and whether any systematic prediction errors are present [3] [59].

fig, axes = plt.subplots(3, 1, figsize=(14, 14))

# One panel per model: actual against predicted counts for the first 200 test hours
for ax, preds, name in zip(
    axes,
    [pred_baseline, pred_stacked, pred_gru],
    ["Baseline LSTM", "Stacked LSTM", "GRU"]
):
    ax.plot(y_test_seq[:200], label="Actual", color="black")
    ax.plot(preds[:200], label="Predicted", color="steelblue", alpha=0.8)
    ax.set_title(f"{name} — Actual vs Predicted (First 200 Hours)")
    ax.set_xlabel("Hour")
    ax.set_ylabel("Pedestrian Count")
    ax.legend()

plt.tight_layout()
plt.show()
Visualisation Discussion
The actual versus predicted plots reveal clear qualitative differences between the three models that are consistent with the quantitative evaluation results. The GRU follows the actual pedestrian demand curve most closely across the 200-hour window, including the sharp daytime peaks during business hours and the near-zero counts in the early morning [61] [62].

The Baseline LSTM captures the general daily rhythm but slightly underestimates the magnitude of peak values, producing a smoother curve that misses some of the sharper intraday fluctuations [54]. The Stacked LSTM shows the most pronounced smoothing effect: its predictions are noticeably flattened at daily peaks, capped at around 55,000 pedestrians regardless of the actual peak, which helps explain its substantially higher RMSE and lower R² relative to the other two models [53].

These visual results are consistent with the evaluation metrics and further support the selection of the GRU as the best performing architecture for this pedestrian forecasting task.

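The peak flattening described above is read off the plots, but it can also be quantified. The following is a minimal sketch of one possible measure, the mean signed error over the busiest hours; `peak_bias` and the sample counts are hypothetical, and the cap of 55,000 is applied here only to mimic the behaviour described for the Stacked LSTM.

```python
def peak_bias(actual, predicted, top_fraction=0.1):
    """Mean of (predicted - actual) over the hours with the highest actual counts.

    A strongly negative value means peaks are systematically underestimated.
    """
    n_peak = max(1, round(len(actual) * top_fraction))
    # Indices of the busiest hours, ranked by actual count
    peak_idx = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)[:n_peak]
    return sum(predicted[i] - actual[i] for i in peak_idx) / n_peak


# Invented hourly counts; predictions are the same values capped at 55,000
actual = [1000, 20000, 40000, 60000, 70000, 65000, 30000, 5000, 2000, 50000]
capped = [min(a, 55000) for a in actual]

bias_capped = peak_bias(actual, capped, top_fraction=0.3)
bias_perfect = peak_bias(actual, actual, top_fraction=0.3)
print(bias_capped, bias_perfect)
```

Applied to the notebook's own arrays, for example `peak_bias(y_test_seq, pred_stacked)`, this would give one number per model for comparison, assuming both inputs are one-dimensional and of equal length.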
# Collect the test-set metrics of all three models into one comparison table
comparison = pd.DataFrame({
    "Model": ["Baseline LSTM", "Stacked LSTM", "GRU"],
    "MAE": [metrics_baseline["MAE"], metrics_stacked["MAE"], metrics_gru["MAE"]],
    "RMSE": [metrics_baseline["RMSE"], metrics_stacked["RMSE"], metrics_gru["RMSE"]],
    "R²": [metrics_baseline["R2"], metrics_stacked["R2"], metrics_gru["R2"]],
})
print(comparison.to_string(index=False))

        Model         MAE         RMSE       R²
Baseline LSTM 5624.145508  8725.096217 0.915807
 Stacked LSTM 9084.274414 14262.183844 0.775039
          GRU 4921.983887  7325.042252 0.940659

10.12 Conclusion
The deep learning segment compared three recurrent neural network architectures (a Baseline LSTM, a Stacked LSTM, and a GRU) for hourly pedestrian count forecasting in Melbourne CBD using climate and engineered time-series features.

The GRU model was the best performing architecture, achieving an R² of 0.9407, an MAE of 4,921.98 pedestrians per hour, and an RMSE of 7,325.04 on the unseen test set [59] [61]. These results show that a well-designed recurrent model, even without excessive architectural complexity, can effectively learn the temporal structure of urban pedestrian demand from climate and time-series inputs [53] [62].

The finding that the Stacked LSTM underperformed despite having the most parameters highlights an important consideration in deep learning model selection: added complexity must be justified by the volume and complexity of the data [53] [57]. The GRU's parameter efficiency and strong generalisation make it the most suitable architecture for this dataset size and forecasting task [61].

From an applied perspective, a model explaining 94.1% of the variance in hourly pedestrian demand could realistically support urban planning decisions and help commuters anticipate pedestrian congestion under varying climate conditions, directly addressing the use case scenario outlined at the beginning of this notebook [53] [56].