ADS506-Final-Project/code_notebook.qmd at main · tkbarb10/ADS506-Final-Project · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Group 3 Appendix"
authors: "Kamran Shirazi & Taylor Kirk"
date: "2025-12-08"
format:
  pdf:
    toc: true
    classoption: [svgnames]
include-in-header:
  - text: |
      \usepackage{fvextra}
      \DefineVerbatimEnvironment{Highlighting}{Verbatim}{breaklines,commandchars=\\\{\}}
knitr:
  opts_chunk:
    cache: true
---

```{r setup, include=FALSE, echo=FALSE, message=FALSE, warning=FALSE}
library(fpp3)
library(tidyverse)
library(knitr)
library(urca)
```

# Introduction

The Industrial Revolution has been a boon to humanity, leading to longer life spans, increased wealth, easy access to clean water, it gave us Netflix...but with every benefit there comes a cost, and the main cost with expansion of human flourishing has been what we now call climate change. This term encapsulates a host of changes to our environment such as rising sea levels, more extreme weather events, reduction in biodiversity, etc. The climate changing isn't particularly new, the climate is always changing. What's unique about this period in Earth's history is the rate at which everything is changing. And the consensus is that rate of change is driven by human activity dumping CO2 emissions into the atmosphere, driving up CO2 concentration in the atmosphere, causing the global temperature to rise and leading to these rapid ecosystem alterations.

# Project Problem/Statement

This project was inspired by a desire to get involved in understanding the scope of the problem and hopefully finding a way to solve or mitigate it. Specifically we wanted to find a way to accurately isolate the impact of CO2 emissions on the increasing temperature trend, and by extension an individuals (defined as nation, state, city, organization etc) impact on the warming trend. The belief is that by quantifying impact, people will be better equipped to take action on changing their behavior and upgrading systems in ways that slow or reverse the warming trends.

# Data Overview

```{r monthly-anomaly-overview}
#| message: false
#| warning: false
df <- read_csv("data/global_temp.csv")

glimpse(df, width = 45)

summary(df[3:6], digits = 2)
```

The data from Berkely Earth comes in the form of monthly anomalies. They use a baseline period (1950-1980) and record temperature observations as differences from the average temp (by month) of this period. They include the uncertainties, and 5, 10, and 20 year lag periods. For our purposes, we're only concerned with the `Monthly_Anomaly` column

## Convert to Tsibble and Temperatures

```{r convert-tsibble}
#| message: false
#| warning: false

Anomaly_data <- df |> mutate(
  Month = month.abb[Month],
  Date = yearmonth(paste(Month, Year)),
  Monthly_Anomaly = replace_na(Monthly_Anomaly, mean(Monthly_Anomaly, na.rm = T))) |>
  select(c(Date, Monthly_Anomaly)) |>
  as_tsibble(index = Date)

baseline_vector <- c(
  '1' = 12.30, '2' = 12.50, '3' = 13.13, '4' = 14.06,
  '5' = 15.00, '6' = 15.66, '7' = 15.90, '8' = 15.75,
  '9' = 15.18, '10' = 14.27, '11' = 13.28, '12' = 12.57
)

converted_df <- Anomaly_data |>
  mutate(
    month_char = as.character(month(Date)),
    baseline_temp = baseline_vector[month_char],
    actual_temp = baseline_temp + Monthly_Anomaly
    ) |>
  select(c(Date, actual_temp))

converted_df |>
  head() |>
  kable()
```

Reporting in monthly anomalies is common for publications given that the main interest is usually in the overall trend of warming. However for our purposes, since we are attempting to isolate the human effect on the warming trend, we also need to be able to accurately model temperature separate of human impact. The seasonality of global temperatures is a big part of that and using only the monthly Monthly_Anomaly measurements largely strips out that component. Both sets of data will be explored and modeled however to see which one is most useful and it may turn out that each is valuable in different capacities.

# Exploratory Data Analysis of Global Anomalies

```{r kamran-raw-data-overview}
glimpse(df, width = 50)

names(df)
```

This dataset contains 2,109 monthly observations from 1850–2025, with global temperature anomalies and uncertainties, where `Monthly_Anomaly` is the primary usable series and most multi-year `Monthly_Anomaly` fields contain NA values except at the end of their smoothing windows.

```{r eda-global-anomalies}
#| echo: true
#| message: false
#| warning: false

train <- Anomaly_data|> filter(Date < yearmonth("2020 Jan"))
test  <- Anomaly_data|> filter(Date >= yearmonth("2020 Jan"))

autoplot(Anomaly_data, Monthly_Anomaly) +
  labs(title = "Global Monthly Temperature Anomalies",
       x = "Year", y = "Temperature Monthly_Anomaly (°C)") +
  theme_minimal()

```

The plot shows that what was once mostly cooler-than-average (average as defined as the period between 1950 - 1980) global temperatures has transitioned into a new normal of sustained warming, particularly over the last 40 years. The sharp upward trend in recent decades signals the accelerating pace of climate change.

```{r kamran-decomp}
climate_stl <- Anomaly_data |>
  model(
    STL(Monthly_Anomaly ~ trend(window = 15))
  ) |>
  components()

autoplot(climate_stl) +
  labs(
    title = "STL Decomposition of Annual Temperature Anomalies",
    x = "Year"
  ) +
  theme_minimal()
```

The STL results highlight how the underlying warming signal has grown stronger over time, while the remainder shows noisy but relatively modest deviations from the trend. This reinforces that annual variability exists but is dwarfed by the long-term increase in global temperatures.

```{r kamran-acf}
#| warning: false
Anomaly_data |>
  gg_tsdisplay(Monthly_Anomaly, plot_type = "partial") +
  labs(
    title = "Annual Temperature Anomalies: Time Plot, ACF, and PACF"
  )
```

The climate series exhibits a persistent warming trend, and the ACF’s slow decay underscores how each year’s temperature is tightly linked to previous years. The PACF’s immediate drop-off reinforces that this warming signal creates strong year-to-year inertia in global temperatures.

```{r kamran-stationary-checks}
# KPSS
Anomaly_data |>
  features(Monthly_Anomaly, unitroot_kpss) |>
  kable(digits = 4, align = "c")
```

The KPSS test confirms what the plots already suggest: the `Monthly_Anomaly` series isn’t stationary, with the low ***p***-value indicating that the warming trend dominates over random fluctuations.

```{r kamran-differencing-transformation}
#| warning: false
# Non-seasonal differencing needed?
Anomaly_data |>
  features(Monthly_Anomaly, c(unitroot_ndiffs, unitroot_nsdiffs)) |>
  kable(digits = 4, align = "c")

Anomaly_data |>
  features(Monthly_Anomaly, guerrero) |>
  kable(digits = 4, align = "c")
```

$\lambda \approx 0.966$ → little to no Box–Cox transformation needed; variance is already stable. Unitroot test suggests differencing may be needed

# Exploratory Data Analysis of Global Temperatures

```{r eda-actual-temperatures}

summary(converted_df[2])

converted_df |>
  model(STL(actual_temp)) |>
  components() |>
  autoplot() +
  labs(title = "STL Decomposition of Global Monthly Actual Temperatures") +
  theme_minimal()
```

Decomposition of global temperatures show seasonality to mainly be constant with some compression in later years (could be a result of more precise measurements, could also be a result of warming trends flattening temperature swings each season). The main driver of the changing level is the trend which stayed fairly constant until 1920 when the trend starts to increase, then really takes off after 1980

```{r gg-season}
#| message: false
#| warning: false

converted_df |>
  ggtime::gg_season() +
  labs(y = "Temperature C") +
  theme_minimal()
```

Clearer picture of the increasing trend over time, occurring across all periods

```{r transformations}

converted_df |>
  features(actual_temp, guerrero) |>
  kable(digits = 4, align = "c")
```

$\lambda \approx 1.44$, so a power transformation may be helpful

```{r acf-pacf}

gg_tsdisplay(converted_df, actual_temp, plot_type = 'partial')

converted_df |>
  features(actual_temp, c(unitroot_kpss, unitroot_ndiffs, unitroot_nsdiffs)) |>
  kable(digits = 4, align = "c")
```

Unitroot tests suggest at least one difference and one seasonal difference. Using `gg_tsdisplay`, the seasonal autocorrelation is obvious, with the summer and winter months grouping together on opposite sides of the line. The pacf shows that the seasonal effects are strong. All of this shows the data is not stationary

# Modeling / Forecasting

## Monthly_Anomaly Modeling

```{r kamran-ets-model}
#| echo: true
#| message: false
#| warning: false

# Fit ETS on the training data

fit_ets <- train |>
model(ETS(Monthly_Anomaly))

# Print model details

report(fit_ets)

```

```{r}
#| label: ets-forecast-zoom
#| echo: true
#| fig-cap: "Auto-ETS forecast vs actuals (2020–2025)."

fc_ets <- fit_ets |>
  forecast(new_data = test)

fc_ets_zoom <- fc_ets|>
  filter_index("2020 Jan" ~ "2025 Dec")

autoplot(
  fc_ets_zoom,
  Anomaly_data |> filter_index("2020 Jan" ~ "2025 Dec")
  ) +
  labs(
    title = "Auto-ETS Forecast vs Actuals (2020–2025)",
    subtitle = "Automatically selected ETS model",
    x = "Year",
    y = "Temperature Monthly_Anomaly (°C)"
    )

```

The ETS model anticipates a relatively stable `Monthly_Anomaly` level, but the actual values frequently climb toward the upper edge of the 80% and even 95% prediction intervals. This pattern indicates that recent warming spikes are occurring faster and more intensely than the model expects based on historical structure. When observations consistently press against the top of the interval bands, it suggests the model may be underestimating both the trend strength and the volatility of contemporary climate behavior.

## Temperature Modeling

```{r training-split}

cv_data <- converted_df |>
  stretch_tsibble(.init = 1506, .step = 60)

cv_trn <- cv_data |>
  group_by(.id) |>
  slice(1:(n() - 60)) |>
  ungroup()

cv_valid <- cv_data |>
  group_by(.id) |>
  slice_tail(n = 60) |>
  ungroup()
```

Modeling on all temp data with cross validation sets created from the first roughly 125 years of data (1850 - 1930) and rolling forward by 5 years (60 months). This creates 11 cv splits covering the range of available observations

### ETS

```{r temperature-ets}
#| message: false
#| warning: false

ets_fit <- cv_trn |>
  model(
    ets_auto = ETS(),
    additive = ETS(actual_temp ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(actual_temp ~ error("M") + trend("A") + season("M")),
    damped = ETS(actual_temp ~ error("A") + trend("Ad") + season("A")),
    mult_damp = ETS(actual_temp ~ error("M") + trend("Ad") + season("M")),
    log_auto = ETS(log(actual_temp)),
    box_cox_auto = ETS(box_cox(actual_temp, lambda = 1.5))
  )
```

```{r ets-training-metrics}
#| message: false
#| warning: false

ets_fit |>
  accuracy() |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4, align = "c")

# Smoothing parameters
ets_fit |>
  tidy() |>
  group_by(.model, term) |>
  summarise(across(.cols = estimate, .fns = mean)) |>
  pivot_wider(names_from = term, values_from = estimate) |>
  select(.model, alpha, beta, gamma, phi) |>
  kable(digits = 4, align = "c")

# Report on top performing model
ets_fit |>
  filter(.id == 10) |>
  select(log_auto) |>
  report()
```

All versions of ETS performed well on the training data with box_cox_auto performing the best. Auto selection chose an additive model with a damped trend. Looking the parameters, the `alpha` parameter values ranged from .2812 to .4469. This shows that the level is fairly stable and perhaps indicates that the recent warming trend is being masked by the long tail of relatively stable temperatures. This idea is bolstered by the the really small, almost 0 values of the `beta` parameter which indicates a stable trend. The `gamma` parameters are all very small as showing, suggesting that the ETS models are correctly reading that the seasonal pattern is relatively stable.

```{r ets-forecasts}

ets_fc <- ets_fit |>
  forecast(new_data = cv_valid)

ets_fc |>
  accuracy(cv_valid) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

The additive model performs the best on the validation data but the out performance is negligible to `ets_auto` and `box_cox_auto`

```{r ets-diagnostics}
#| fig-width: 10

ets_fc |>
  filter(.id %in% c(10, 11), .model %in% c('additive', 'box_cox_auto', 'ets_auto')) |>
  autoplot() +
  autolayer(converted_df |> filter(year(Date) > 2010), actual_temp) +
  facet_grid(.model ~ .id) +
  theme_minimal()
```

Visualizing the top 3 models illustrates how ETS models do just fine at predicting the fall and spring months, but consistently undershoot the seasonal peaks and exhibit wide prediction intervals around those areas. ETS models have trouble capturing the increasing trend

```{r ets-residuals}

ets_fit |>
  select(box_cox_auto) |>
  tail(n = 1) |>
  ggtime::gg_tsresiduals()
```

Using box cox and checking the residuals of the model fitted on the most data, the residuals show a mostly normal distribution. Slight skew to the right. There are autocorrelations at lag 3 and lag 24. We can confirm that the residuals aren't white noise with a `Ljung-Box` test. There are certainly factors that influence temperature that ETS isn't capturing

```{r ljung-box-ets}
ets_fit |>
  augment() |>
  filter(.model == "box_cox_auto") |>
  features(.innov, ljung_box, lag = 24) |>
  summarize(across(.cols = lb_pvalue, .fns = mean)) |>
  kable(digits = 4, align = "c")
```

### TSLM

```{r temperature-tslm}

tslm_fit <- cv_trn |>
  model(
    tslm_auto = TSLM(actual_temp),
    tslm_trend = TSLM(actual_temp ~ trend()),
    tslm_trend_season = TSLM(actual_temp ~ trend() + season()),
    tslm_fourier = TSLM(actual_temp ~ trend() + fourier(K = 2)),
    tslm_log = TSLM(log(actual_temp) ~ trend() + season()),
    tslm_box_cox = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season()),
    tslm_piecewise = TSLM(actual_temp ~ trend(knots = c(1920, 1975)) + season())
  )
```

Testing Fourier terms for a simpler model as opposed to using season dummy variables, and piece-wise to capture changes in the trend at those points in the data.

```{r tslm-training-metrics}

tslm_fit |>
  accuracy() |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)

tslm_fit |>
  select(tslm_trend_season) |>
  tail(n = 1) |>
  report()
```

The `trend_season`, `box_cox` and `piecewise` models all perform the same on the training data. Looking at the model report for the `trend_season` model we can see that all parameters are significant.

```{r tslm-forecasts}
#| warning: false

tslm_fc <- tslm_fit |>
  forecast(new_data = cv_valid)

tslm_fc |>
  accuracy(cv_valid) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(MAE) |>
  kable(digits = 4)
```

TSLM log performs the best on the validation data but significantly under-perform ETS models

```{r tslm-diagnostics}
#| fig-width: 10

tslm_fc |>
  filter(.id %in% c(10, 11), .model %in% c('tslm_log', 'tslm_piecewise', 'tslm_box_cox')) |>
  autoplot() +
  autolayer(converted_df |> filter(year(Date) > 2015), actual_temp) +
  facet_grid(.model ~ .id) +
  theme_minimal()
```

It's pretty clear from the visuals that TSLM models are failing even harder than the ETS models at capturing the warming trend.

### ARIMA

```{r arima-data}

trn_data <- converted_df |>
  slice(1:(n() - 60))

valid_data <- converted_df |>
  slice_tail(n = 60)
```

ARIMA models take too long to converge on CV splits and often time out, so testing them first on normal splits

```{r temperature-arima}
#| message: false
arima_fit <- trn_data |>
  model(
    arima_auto = ARIMA(actual_temp),
    arima_box = ARIMA(box_cox(actual_temp, lambda = 1.5)),
    arima_log = ARIMA(log(actual_temp))
  )

arima_fit |>
  accuracy() |>
  select(.model, RMSE, MAE, ME) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

ARIMA box_cox and auto performed the same on the training data

```{r arima-report}
arima_fit |>
  select(arima_box) |>
  report()

arima_fit |>
  select(arima_auto) |>
  report()
```

Surprisingly the ARIMA models only selected a seasonal differencing instead of both a seasonal and non-seasonal. 1 non-seasonal AR component was selected and 1 seasonal moving average. So the auto selected models are using the previous observation and the previous seasons forecast error to make its predictions.

```{r arima-forecasts}
#| warning: false

arima_fc <- arima_fit |>
  forecast(new_data = valid_data)

arima_fc |>
  accuracy(valid_data) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

Arima log does best on the validation data by a large margin.

```{r arima-diagnostics}

arima_fc |>
  autoplot() +
  autolayer(converted_df |> filter(year(Date) > 2015), actual_temp) +
  facet_wrap(~ .model, ncol = 1) +
  theme_minimal()
```

We see that the ARIMA log model does slightly better at capturing the peaks than the other two, but all still don't quite capture the increasing trend.

```{r arima-residuals}
#| message: false
#| warning: false

arima_fit |>
  select(arima_log) |>
  ggtime::gg_tsresiduals(plot_type = 'partial')
```

```{r ljung-box-arima}
arima_fit |>
  augment() |>
  filter(.model == "arima_box") |>
  features(.innov, ljung_box, lag = 24) |>
  kable(digits = 4, align = "c")
```

Residuals aren't even close to stationary. With ARIMA we can experiment with other parameter values

```{r arima-round-two}
arima_fit <- trn_data |>
  model(
    arima_diff = ARIMA(actual_temp ~ pdq(d = 1)),
    arima_box_diff = ARIMA(box_cox(actual_temp, lambda = 1.5) ~ pdq(d = 1)),
    arima_box_ar_two = ARIMA(box_cox(actual_temp, lambda = 1.5) ~ pdq(p = 2)),
    arima_custom = ARIMA(actual_temp ~ 0 + pdq(3, 1, 2) + PDQ(1, 1, 2))
  )

arima_fc <- arima_fit |>
  forecast(new_data = valid_data)

arima_fc |>
  accuracy(valid_data) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

Adding non-seasonal differencing, another AR term and a seasonal MA term to deal with the autocorrelation at lag 24 helped improved model results on the validation data

```{r ljung-box-arima-2}
arima_fit |>
  augment() |>
  filter(.model == "arima_custom") |>
  features(.innov, ljung_box, lag = 24) |>
  kable(digits = 4, align = "c")
```

A high p-value tells us that the residuals are now resembling white noise

### Compare Best Models

```{r initial-comparison}
#| warning: false

compare_fit <- cv_trn |>
  model(
    ETS = ETS(actual_temp ~ error("A") + trend("A") + season("A")),
    TSLM = TSLM(log(actual_temp) ~ trend() + season()),
    ARIMA = ARIMA(actual_temp ~ 0 + pdq(3, 1, 2) + PDQ(1, 1, 2))
  )

compare_fc <- compare_fit |>
  forecast(new_data = cv_valid)

compare_fc |>
  accuracy(cv_valid) |>
  filter(.id != 6) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)


compare_fc |>
  filter(.id %in% c(10, 11)) |>
  autoplot() +
  autolayer(converted_df |> filter(year(Date) > 2010), actual_temp) +
  facet_wrap(~ .model, ncol = 1) +
  theme_minimal()
```

The metrics based on RMSE show that ARIMA does slightly better than ETS and both do much better than TSLM. Comparing ARIMA to ETS visually, we see much more uncertainty in the ETS predictions. Both models still seem to under shoot the peaks which is a strong indicator that other factors are at play that will need to be considered. In addition, looking at the `mable` for the arima models shows a null result on split 6 so there may be some instability in the chosen parameters

```{r arima-mable}
compare_fit |>
  select(ARIMA)
```

### TSLM External Predictors

According to the Intergovernmental Panel on Climate Change (IPCC, 2021), the main factors that influence global temperature are greenhouse gas concentrations, natural weather and ocean-atmosphere patterns, and variations in solar irradiance. To see which drivers best serve our forecasting purpose, we can model these factors individually and compare their predictive performance. Of the below predictors, accurate monthly estimates are available across varying time periods. The latest are for CO2 (1958), CH4 and N2O (1977). CO2 is a more significant predictor than the others so the cutoff date for the training data used was 1958. Techniques used to simulate the monthly estimates for N2O and CH4 between 1958 and 1977 will be discussed below

<https://www.ipcc.ch/report/ar6/wg1/>

**TSI**

TSI is a measure of energy that the Earth receives from the Sun across all lengths of electromagnetic waves and plays a small but meaningful role in global temperatures The measurement is taken at the top of Earth's atmosphere and is reported in Watts per square meter. The data from this predictor comes from https://www.ncei.noaa.gov/products/climate-data-records/total-solar-irradiance and is available in monthly observations dating back to 1874.

**ENSO Index**

ENSO is an acronym for El Nino - Southern Oscillation. The index is the measurement of the difference in sea surface temperatures (SST) from the long-term average (long-term defined as the period between 1991 and 2000) in the equatorial Pacific Ocean (Bureau of Meteorology, n.d.). Warmer than usual temperatures are referred to as El Nino and cooler than average temperatures are referred to as La Nina events. Part of the challenge of sourcing this data is that there are several different indices reported from several different locations around the world. The mostly commonly used one comes from the Nino 3.4 region. For this project the Ensemble Oceanic Nino Index (ENS ONI) is used and can be read about here https://www.webberweather.com/ensemble-oceanic-nino-index.html. The data extends back to 1850. The main limitation is that the modern record estimates for the ENSO index began in roughly 1950, so various statistical techniques paired with historical data and documents have been used to record estimates going back further.

In addition to the index itself, various other feature engineering steps were performed. The enso index is a monthly measurement so there was concern about noise unduly influencing the models. Three new variables were created with different smoothing intervals (3-month, 6-month and 12-month) to test if smoothing the variables improved model performance. In addition, it’s unreasonable to assume that any differences in the ocean temperature today will immediately effect the global temperature estimates. Indeed, El Nino and La Nina events are defined as 5 consecutive overlapping 3-month periods of temperatures above or below a 0.5 degree threshold https://ggweather.com/enso/oni.html. So another variable was created that took the 3-month smooth ENSO Index and added a 4-month lag.

Lastly, instead of using the numerical index, dummy variables were created to indicate the presence of an El Nino or La Nina weather event for the relevant time period.

**CO2**

It’s well known that Greenhouse Gasses (GHG’s) are the main drivers of global warming, with CO2 making up the majority of emissions. The most reliable CO2 concentration observations are reported in parts per million (ppm) and are recorded at the https://gml.noaa.gov/ccgg/trends/. The Manua Loa Observatory did not begin recording CO2 concentration data until 1958. However, annual atmospheric CO2 estimates derived from drilled ice cores date back to one million years. To engineer monthly estimates of CO2 concentration, a spline regression was performed on the annual estimates from 1850-1957 to smooth out the step curve. The seasonal variation was then engineered by taking the average seasonal variation of the Mauna Loa observations, scaling by a factor to account for the lower historical concentration, then adding that back to the engineered monthly observation.

As atmospheric concentration of CO2 can take time to show up in global temperatures, in addition to the current observation of CO2 being used, lagged versions were also engineered for periods of 1, 3, 5 and 10 years.

**RF_CO2**

Radiative forcing is not a measurement but rather a scientific principle. It is used as a way to quantify how the Earth’s balance of energy (in this case measured as Watts per meter squared) changes when another variable changes. For this project, the radiative forcing of CO₂ was measured. It was decided to only use CO₂ as this variable is recognized as having the largest forcing effect of the various physical drivers of temperature change. The equation used to calculate the radiative forcing variable is as follows

RF = 5.25 x log(Cₜ / C₀)

Where C0 is defined as the reference CO₂ concentration level prior to the industrial revolution. 278 ppm is the most widely used value.

**Volcanic Activity**

Volcanic eruptions can have profound effects on the climate that last anywhere from 1 to 3 years. The National Oceanic and Atmospheric Administration (NOAA) keeps a record of all known volcanic eruptions and their various attributes dating back 1000’s of years. Volcanic eruptions are happening constantly. There are 613 recorded events dating back to 1850. The strength of an eruption is measured by a volcanic explosivity index (VEI). Similar to the Richter scale, this index ranges from 0 to 8 and is logarithmic. Measuring the effect size of a volcanic eruption on the overall climate is challenging and depends on several factors. For the purposes of this project it was decided to use only volcanic events with a VEI of 4 or above. Any lower and it’s unlikely the explosion would be large enough to have lasting effects. Then it was a question of how to include those events in the data. A thorough scientific exploration of the exact effect on the climate for each event was beyond the scope of this project. However 3 things are known about volcanic activity. 1) Large eruptions have an impact on the overall climate. 2) These eruptions have effects lasting 1 - 3 years. 3) These effects are not constant over that time period. So to include these events into the data, 2 pseudo-dummy variables were created. A value of 1 was used at the start of the event, and then one variable was subjected to a linear decay over a 3yr time period and another was subject to an exponential decay over the 3 year time period, to simulate the effect of ash and debris dissipating over time. In the event of overlapping eruptions, these effects were added together.

**CH4 & N2O**

A similar process was followed for these predictors as was used for CO2. The data was sourced from the same place, the only difference is that monthly estimates are not available until 1977

```{r regressor-data}

predictor_modeling <- read_rds("data/lagged_external_predictors.rds")

length(colnames(predictor_modeling))
```

After feature engineering, the dataset now has 30 columns. The code used to engineer the external predictors can be found [at the end](#external-predictor-feature-engineering-steps)

There are over 100 total combinations of external predictors, so instead of trying all of them, a handful of complementary ones are tested and reviewed to see if any other combinations are worth exploring. For example, current co2_ppm combined with la_nina and el_nino events are compared against the same but with a 10yr lag on the co2_ppm variable. If the lagged version of this model performs better, then lagged trends are worth looking further into

```{r modeling-with-el_nino}
#| message: false
#| warning: false

cv_data <- predictor_modeling |>
  filter(year(Date) > 1957) |>
  stretch_tsibble(.init = 573, .step = 60)

cv_trn <- cv_data |>
  group_by(.id) |>
  slice(1:(n() - 60)) |>
  ungroup()

cv_valid <- cv_data |>
  group_by(.id) |>
  slice_tail(n = 60) |>
  ungroup()

tslm_regressor_fit <- cv_trn |>
  model(
    tslm_trend_season = TSLM(actual_temp ~ trend() + season()),
    tslm_log = TSLM(log(actual_temp) ~ trend() + season()),
    tslm_box_cox = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season()),
    tslm_all = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_ppm + ch4_ppb + n2o_ppb + TSI + volcano_forcing),
    tslm_all_box = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + el_nino + la_nina + co2_ppm + ch4_ppb + n2o_ppb + TSI + volcano_forcing),
    tslm_ch4_co2_nino = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_ppm + ch4_ppb),
    tslm_co2_nino = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_ppm),
    tslm_nino = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina),
    tslm_enso = TSLM(actual_temp ~ trend() + season() + ENSO),
    tslm_enso_smooth = TSLM(actual_temp ~ trend() + season() + enso_smooth_12),
    tslm_co2_nino_lag = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_lag1)
  )

tslm_regressor_fit |>
  accuracy() |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

Training metrics show that all the predictors being added leads to the best model, whereas models without any predictors underperform

```{r tslm-glance}

tslm_regressor_fit |>
  glance() |>
  group_by(.model) |>
  summarize(across(.cols = c(AIC, AICc, BIC), .fns = mean)) |>
  arrange(AICc)
```

Interestingly, tslm_log is the best model by far on the AICc metric, so the gains may not be all that much by adding predictors

```{r check-model-report}

tslm_regressor_fit |>
  select(tslm_all_box) |>
  tail(n = 1) |>
  report()
```

All predictors on the best performing training model show as significant except for N2O

```{r regressor-validation}

tslm_fc <- tslm_regressor_fit |>
  forecast(new_data = cv_valid)

tslm_fc |>
  accuracy(cv_valid) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

TSLM with a box_cox transformation and all of the predictors performed the best on the validation data. The `enso_smooth` variable also performed better than either `ENSO` or the nino dummy variables. These results indicate that other model combinations are worth exploring.

Adding a year lag to `co2_ppm` provided a bit of a boost on the validation data so some of the lagged variables may have more predictive power and are worth looking into

Interesting to note that adding `ch4_ppb` actually reduced model performance compared to just using `co2_ppm`

```{r tslm-round-two}
tslm_second_fit <- cv_trn |>
  model(
    tslm_all_lag_box = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + el_nino + la_nina + co2_lag1 + n2o_lag3 + ch4_lag3),
    tslm_nino_five_lag = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_lag5),
    tslm_enso_co2_lag = TSLM(actual_temp ~ trend() + season() + ENSO + co2_lag1),
    tslm_enso_smooth_lag_all = TSLM(actual_temp ~ trend() + season() + enso_smooth_12 + co2_lag1 + n2o_lag3 + ch4_lag3),
    tslm_all_lag = TSLM(actual_temp ~ trend() + season() + el_nino + la_nina + co2_lag1 + n2o_lag3 + ch4_lag3),
    tslm_nino_10_lag = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + el_nino + la_nina + co2_lag10),
    tslm_enso_smooth_lag = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + enso_smooth_6 + co2_lag10)
  )

tslm_second_fc <- tslm_second_fit |>
  forecast(new_data = cv_valid)

tslm_second_fc |>
  accuracy(cv_valid) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

The target variable undergoing a box_cox transformation using the smoothed enso index over 6 months and a 10 year lag in CO2 concentration performs the best out of all the models by a fair margin

```{r round-two-report}
tslm_second_fit |>
  select(tslm_enso_smooth_lag) |>
  tail(n = 1) |>
  report()
```

And we can see from the report that all the predictors used are significant except trend. It could be that smoothing out the enso index and the 10 yr lag of co2_ppm captures most of the trend component

```{r tslm-best-model}
tslm_best_fit <- cv_trn |>
  model(
    tslm_enso_6 = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + enso_smooth_6 + co2_lag10),
    tslm_enso_12 = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + enso_smooth_12 + co2_lag10),
    tslm_no_trend = TSLM(box_cox(actual_temp, lambda = 1.5) ~ season() + enso_smooth_12 + co2_lag10),
    tslm_volcanic = TSLM(box_cox(actual_temp, lambda = 1.5) ~ season() + enso_smooth_12 + co2_lag10 + volcano_forcing)
  )

tslm_best_fc <- tslm_best_fit |>
  forecast(new_data = cv_valid)

tslm_best_fc |>
  accuracy(cv_valid) |>
  group_by(.model) |>
  summarise(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

```{r tslm-best-report}
tslm_best_fit |>
  select(tslm_no_trend) |>
  tail(n = 1) |>
  report()
```

```{r regressor-visual}
#| fig-width: 10

tslm_fc |>
  bind_rows(tslm_second_fc) |>
  bind_rows(tslm_best_fc) |>
  filter(.id %in% c(4, 5), .model %in% c("tslm_no_trend", "tslm_enso_smooth_lag", "tslm_nino_50_lag", "tslm_co2_nino_lag", "tslm_log")) |>
  autoplot() +
  autolayer(predictor_modeling |> filter(year(Date) >= 2015), actual_temp) +
  facet_grid(.model ~ .id) +
  theme_minimal()
```

Comparing best models to the under-performing ones, we see that adding predictors, especially ones that are lagged, closes the gap in the increasing trend that other models seem to miss. **Talk more in depth about the AICc scores**

```{r tslm diagnosis}

tslm_best_fit |>
  select(tslm_no_trend) |>
  tail(n = 1) |>
  ggtime::gg_tsresiduals()
```

The biggest issue is that there is significant autocorrelation in the residuals, indicating there is some pattern the TSLM model doesn't catch. We'll see if the ARIMA models are able to accurately account for these factors

### ARIMA External Predictors

Using a non-cv split initially to speed up fit time

```{r arima-data-regressors}

arima_trn <- predictor_modeling |>
  filter(year(Date) > 1957) |>
  slice(1:(n() - 60))

arima_valid <- predictor_modeling |>
  filter(year(Date) > 1957) |>
  slice_tail(n = 60)
```

```{r initial-arima}

arima_fit <- arima_trn |>
  model(
    arima_auto = ARIMA(actual_temp ~ co2_ppm + PDQ(D=1)),
    arima_box = ARIMA(box_cox(actual_temp, lambda = 1.5) ~ co2_ppm + PDQ(D=1)),
    arima_log = ARIMA(log(actual_temp) ~ co2_ppm + PDQ(D=1)),
    arima_all = ARIMA(actual_temp ~ co2_ppm + ch4_ppb + n2o_ppb + TSI + el_nino + la_nina + volcano_linear + PDQ(D=1)),
    arima_lag_one = ARIMA(actual_temp ~ co2_lag1 + PDQ(D=1)),
    arima_lag_one_nino = ARIMA(actual_temp ~ co2_lag1 + el_nino + la_nina + PDQ(D=1)),
    arima_enso = ARIMA(actual_temp ~ ENSO + PDQ(D=1)),
    arima_enso_smooth = ARIMA(actual_temp ~ enso_smooth_6 + PDQ(D=1)),
    arima_emissions = ARIMA(actual_temp ~ aggregate_emissions + PDQ(D=1)),
    arima_forcing = ARIMA(log(actual_temp) ~ rf_co2 + PDQ(D=1)),
    arima_enso_delay = ARIMA(actual_temp ~ enso_delay + PDQ(D=1)),
    arima_lag_3 = ARIMA(actual_temp ~ co2_lag3 + PDQ(D=1)),
    arima_physics_robust = ARIMA(log(actual_temp) ~ rf_co2 + volcano_linear + enso_delay + PDQ(D=1)),
    arima_volcano_basic = ARIMA(log(actual_temp) ~ ENSO + co2_ppm + volcano_forcing + PDQ(D = 1))
  )

arima_fit |>
  glance() |>
  arrange(AICc) |>
  select(.model, AICc, AIC, BIC) |>
  kable(align = "c")
```

Looking through the AICc scores, the log transformations seem to result in the best models

```{r arima-validation}
#| message: false
#| warning: false
arima_fc <- arima_fit |>
  forecast(new_data = arima_valid)

arima_fc |>
  accuracy(arima_valid) |>
  select(.model, RMSE, MAE, ME) |>
  arrange(RMSE) |>
  kable(digits = 4)
```

The model including features for volcanic activity and non-lagged variables for CO2 and ENSO is the best overall by any metric. **include a more robust explanation of why a delay in the enso index and why co2 radiative forcing on it's own aren't strong signals**

```{r initial-arima-report}
#| message: false
#| warning: false

# spot check models
arima_fit |>
  select(arima_volcano_basic) |>
  report()

arima_fit |>
  select(arima_volcano_basic) |>
  ggtime::gg_tsresiduals(plot_type = "partial")
```

The arima_volcano_basic model selected AR(1), MA(1) and sMA(2) with a seasonal difference. The ACF and PACF plots still contain spikes at lag 3. Interestingly, the arima_physics robust model contains a spike at lag 24 but not 3. There might be something in delaying the El Nino effect by a few months that explains the autocorrelation

```{r arima-param-check}

# parameter check
arima_fit |>
  tidy() |>
  group_by(term) |>
  summarize(across(.cols = c(estimate:`p.value`), .fns = mean)) |>
  arrange(p.value) |>
  kable(digits = 4)
```

Not necessarily the best way to ascertain how important variables are, but informative nonetheless. ENSO related variables and seasonal ARIMA parameters seem to be the most important to modeling this type of data

Now we'll check the best models from above as well as a few other promising predictor combinations across 5 CV splits and see how they hold up

```{r arima-second-fit}

arima_second_fit <- cv_trn |>
  model(
    arima_enso_log_co2 = ARIMA(log(actual_temp) ~ ENSO + co2_ppm + PDQ(D = 1)),
    arima_auto_10 = ARIMA(log(actual_temp) ~ ENSO + co2_lag10 + PDQ(D = 1)),
    arima_physics_robust = ARIMA(log(actual_temp) ~ rf_co2 + volcano_linear + enso_delay + PDQ(D = 1)),
    arima_forcing = ARIMA(log(actual_temp) ~ rf_co2 + enso_smooth_6 + PDQ(D = 1)),
    arima_auto_10_enso_6 = ARIMA(log(actual_temp) ~ enso_smooth_6 + co2_lag10 + PDQ(D = 1)),
    arima_robust_enso_6 = ARIMA(log(actual_temp) ~ rf_co2 + volcano_linear + enso_smooth_6 + PDQ(D = 1)),
    arima_co2_10_volcano = ARIMA(log(actual_temp) ~ co2_lag10 + volcano_linear + PDQ(D = 1)),
    arima_co2_3_nino = ARIMA(actual_temp ~ co2_lag3 + el_nino + la_nina + PDQ(D = 1)),
    arima_co2_3_nino_tsi = ARIMA(actual_temp ~ co2_lag3 + el_nino + la_nina + TSI + PDQ(D = 1)),
    arima_physics_robust_decay = ARIMA(log(actual_temp) ~ rf_co2 + volcano_forcing + enso_delay + PDQ(D = 1)),
    arima_physics_robust_decay_enso_3 = ARIMA(log(actual_temp) ~ rf_co2 + volcano_forcing + enso_smooth_3 + PDQ(D = 1)),
    arima_lag_3_enso = ARIMA(log(actual_temp) ~ co2_lag3 + ENSO + PDQ(D=1)),
    arima_physics_enso_3 = ARIMA(log(actual_temp) ~ rf_co2 + enso_smooth_3 + PDQ(D=1))
  )


arima_second_fc <- arima_second_fit |>
  forecast(new_data = cv_valid)

arima_second_fc |>
  accuracy(cv_valid) |>
  select(.model, RMSE, ME, MAE) |>
  group_by(.model) |>
  summarize(across(.cols = c(RMSE, MAE, ME), .fns = mean)) |>
  arrange(RMSE) |>
  kable(digits = 4)

```

Across multiple folds, the forcing models and models that smooth out the enso index and/or utilize a lagged CO2 concentration variable perform the best. All the results are fairly close and would probably trade places over different subsets of the data so choosing the best one will require consideration of other factors

```{r arima-glance}

arima_second_fit |>
  glance() |>
  select(.model, AICc, AIC, BIC) |>
  group_by(.model) |>
  summarize(across(.cols = c(AICc, AIC, BIC), .fns = mean)) |>
  arrange(AICc) |>
  kable(align = "c")
```

The top model by AICc was second best metric wise. It's also clear from the bottom models that a log transformation of the response variable drastically improves the models

```{r residual-check}

arima_second_fit |>
  select(arima_physics_robust_decay_enso_3) |>
  tail(n = 1) |>
  ggtime::gg_tsresiduals(plot_type = "partial")

arima_second_fit |>
  select(arima_physics_robust_decay_enso_3) |>
  tail(n = 1) |>
  augment() |>
  features(.innov, ljung_box, lag = 24) |>
  kable(digits = 3, align = "c")

```

No autocorrelation spikes in the residuals and the Ljung-Box test confirms that the residuals resemble white noise for the arima_physics_robust_decay_enso_3 model. Most other models have a spike at lag 3 or lag 24. The only other top model without any autocorrelation is arima_physics_enso_3 so a 3 month smoothing of the ENSO index seems to take care of it

```{r arima_cv_visual}
#| fig-width: 10

arima_second_fc |>
  filter(.id %in% c(4, 5), .model %in% c("arima_physics_robust_decay_enso_3", "arima_enso_log_co2", "arima_auto_10_enso_6")) |>
  autoplot() +
  autolayer(predictor_modeling |> filter(year(Date) >= 2015), actual_temp) +
  facet_wrap(~ .model, ncol = 1) +
  theme_minimal()
```

Retrain the top models on the same subset of data and compare

```{r final-fit-comparison}

final_fit <- cv_trn |>
  model(
    ets_auto = ETS(actual_temp),
    ets_additive = ETS(actual_temp ~ error("A") + trend("A") + season("A")),
    ets_box_auto = ETS(box_cox(actual_temp, lambda = 1.5)),
    tslm_no_trend = TSLM(box_cox(actual_temp, lambda = 1.5) ~ season() + enso_smooth_12 + co2_lag10),
    tslm_volcanic = TSLM(box_cox(actual_temp, lambda = 1.5) ~ season() + enso_smooth_12 + co2_lag10 + volcano_forcing),
    tslm_enso_smooth_lag = TSLM(box_cox(actual_temp, lambda = 1.5) ~ trend() + season() + enso_smooth_6 + co2_lag10),
    arima_auto_10 = ARIMA(log(actual_temp) ~ ENSO + co2_lag10 + PDQ(D = 1)),
    arima_physics_robust_decay_enso_3 = ARIMA(log(actual_temp) ~ rf_co2 + volcano_forcing + enso_smooth_3 + PDQ(D = 1)),
    arima_physics_enso_3 = ARIMA(log(actual_temp) ~ rf_co2 + enso_smooth_3 + PDQ(D = 1)),
    arima_enso_log_co2 = ARIMA(log(actual_temp) ~ ENSO + co2_ppm + PDQ(D = 1)),
  )

final_fc <- final_fit |>
  forecast(new_data = cv_valid)

final_fc |>
  accuracy(cv_valid) |>