No seasonal terms are included with seasonality options set to 'auto' with monthly data

Greykite documentation states that the seasonality "auto" option is meant to let the template decide, based on input data frequency and the amount of training data, whether to model that seasonality with the default Fourier order:
https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0300_seasonality.html?highlight=seasonality

However, with **monthly** data this option always resolves to False, for both `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY`, even when the amount of training data (`num_training_days`) exceeds the minimum required (`default_min_days`). Why? Read below.

These are the Silverkite default settings for minimum training data requirements, as defined in _\greykite\algo\forecast\silverkite\constants\silverkite_seasonality.py_
```
SilverkiteSeasonality(name='ct1', period=1.0, order=15, seas_names='yearly', default_min_days=548)
SilverkiteSeasonality(name='toq', period=1.0, order=5, seas_names='quarterly', default_min_days=180)
```

`num_training_days` is calculated in _\greykite\common\time_properties_forecast.py_, whereas the actual test is in _\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py_ (here, `num_days` is the `num_training_days` calculated above):

```
num_days >= seas.value.default_min_days
                    and seas.name in freq_auto_seas_names
```

The result of the test is always False for monthly data because `freq_auto_seas_names` is empty, so the condition `seas.name in freq_auto_seas_names` can never be met. The reason can be seen in _\greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py_: for weekly data, for example, `freq_auto_seas_names` is the following set:
```
auto_fourier_seas={SeasonalityEnum.MONTHLY_SEASONALITY.name,
                           SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                           SeasonalityEnum.YEARLY_SEASONALITY.name})
```
whereas for monthly, quarterly and yearly data `freq_auto_seas_names = {}`, e.g. for monthly data:
```
auto_fourier_seas={
            # QUARTERLY_SEASONALITY and YEARLY_SEASONALITY are excluded from defaults
            # It's better to use `C(month)` as a categorical feature indicating the month
        })
```
Therefore, "**based on input data frequency**" in the documentation quoted at the top of this issue really means: only if the data frequency is one of MINUTE, HOUR, DAY or WEEK, thereby excluding MONTH, QUARTER, YEAR and MULTIYEAR.
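To make the effect of the empty set concrete, here is a minimal re-implementation of the `num_days >= seas.value.default_min_days and seas.name in freq_auto_seas_names` test. It is a sketch, not Greykite's code: `Seas`, `AUTO_FOURIER_SEAS`, `auto_enabled` and the `"WEEK"`/`"MONTH"` keys are hypothetical stand-ins built from the values quoted above (`default_min_days` of 548 and 180, and the weekly and monthly `auto_fourier_seas` definitions), and `num_days=789.0` is taken from the CV log further down.

```python
from typing import Dict, NamedTuple, Set


class Seas(NamedTuple):
    """Hypothetical stand-in for a Greykite seasonality enum member."""
    name: str              # e.g. "YEARLY_SEASONALITY"
    default_min_days: int  # minimum training span before "auto" may switch it on


# default_min_days values copied from silverkite_seasonality.py (quoted above)
YEARLY = Seas("YEARLY_SEASONALITY", 548)
QUARTERLY = Seas("QUARTERLY_SEASONALITY", 180)

# Hypothetical mapping mirroring auto_fourier_seas in silverkite_time_frequency.py
AUTO_FOURIER_SEAS: Dict[str, Set[str]] = {
    "WEEK": {"MONTHLY_SEASONALITY", "QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"},
    "MONTH": set(),  # empty: quarterly and yearly are excluded from the defaults
}


def auto_enabled(seas: Seas, num_days: float, freq: str) -> bool:
    """Same two conditions as the test in forecast_simple_silverkite.py."""
    freq_auto_seas_names = AUTO_FOURIER_SEAS[freq]
    return num_days >= seas.default_min_days and seas.name in freq_auto_seas_names


num_days = 789.0  # num_training_days reported in the CV log below
for freq in ("WEEK", "MONTH"):
    print(freq, {s.name: auto_enabled(s, num_days, freq) for s in (YEARLY, QUARTERLY)})
```

For monthly data both flags stay False no matter how long the training period is, because the membership check can never succeed.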

The "better" option in _\greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py_ when using monthly data is thus to add an extra `C(month)` column as a categorical feature indicating the month.

Question: why is this a "better" option than the following definition?
```
auto_fourier_seas={SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                           SeasonalityEnum.YEARLY_SEASONALITY.name})
```
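One way to probe this question numerically is to compare the column spaces of the two designs on a monthly grid. The sketch below assumes evenly spaced monthly phases (`m / 12` for the year, `(m % 3) / 3` for the quarter), a series starting in January, and plain sin/cos Fourier pairs; Greykite's own feature builder uses the actual time of year of each timestamp, so its columns are only approximately aliased. The orders 15 and 5 come from the `SilverkiteSeasonality` defaults quoted above, and 26 is the number of training points in the CV log below. The 12 month indicators span the same space as an intercept plus the 11 treatment-coded columns that `C(month)` would generate.

```python
import numpy as np

n_obs = 26                      # monthly training points, as in the CV log below
month = np.arange(n_obs) % 12   # month index 0..11, assuming the series starts in January


def fourier(t: np.ndarray, order: int) -> np.ndarray:
    """sin/cos pairs of orders 1..order for a periodic phase t in [0, 1)."""
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k)
    return np.hstack([np.sin(angles), np.cos(angles)])


def rank(a: np.ndarray) -> int:
    # Explicit tolerance so that floating-point copies of aliased columns count once
    return int(np.linalg.matrix_rank(a, tol=1e-8))


intercept = np.ones((n_obs, 1))
yearly = np.hstack([intercept, fourier(month / 12.0, order=15)])  # yearly, order 15
quarterly = fourier((month % 3) / 3.0, order=5)                   # quarterly, order 5
month_dummies = np.eye(12)[month]                                  # one indicator per month

print("yearly Fourier + intercept:", yearly.shape[1], "columns, rank", rank(yearly))
print("quarterly Fourier:", quarterly.shape[1], "columns, rank", rank(quarterly))
print("month indicators:", month_dummies.shape[1], "columns, rank", rank(month_dummies))
print("all combined: rank", rank(np.hstack([yearly, quarterly, month_dummies])))
```

Under these assumptions the yearly Fourier terms (31 columns) and the month indicators (12 columns) describe exactly the same set of monthly profiles, and the quarterly terms add nothing beyond them; Fourier orders above 6 simply alias onto lower ones. This is one plausible reading of the "better" comment, not a statement of the Greykite developers' reasoning.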

I see the following alternatives when dealing with monthly data:

1. Add an extra `C(month)` column as a categorical feature indicating the month. The disadvantage is that this column should only be added when both `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY` are set to "auto", not to "True" or "False": when an option is "True", Greykite adds the corresponding quarterly or yearly seasonality term automatically, according to `valid_seas` as defined in _\greykite\common\enums.py_, and when it is "False" the term is not added.
2. Add `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY` (currently excluded from the defaults) to the empty `auto_fourier_seas` definition for monthly data; however, the Greykite developers seem to prefer option 1.
3. Stop asking the user to set the seasonality options ("auto", "True", "False") manually. This applies to all input data frequencies, not just monthly:

- [ ] Let the user configure the Fourier order and the minimum number of cycles for each seasonality
- [ ] Set each seasonality option to "True" or "False" automatically, following the rules of the current logic, i.e., input data frequency, `valid_seas` and `num_training_points >= default_min_points`
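The two checklist items of option 3 could look roughly like the sketch below. Everything in it is hypothetical: `SeasonalityConfig`, `resolve_seasonality`, the `min_cycles` field and the period lengths in days are illustrative names and values, not Greykite API. Only the ingredients (a set of valid seasonalities per data frequency, a minimum training length, a user-chosen Fourier order) come from the current logic described above, and the minimum length is expressed in days rather than points for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SeasonalityConfig:
    """Hypothetical user-facing setting: Fourier order and minimum number of cycles."""
    name: str           # e.g. "YEARLY_SEASONALITY"
    period_days: float  # length of one cycle in days (illustrative values below)
    order: int          # Fourier order chosen by the user
    min_cycles: float   # cycles of history required before the term is used


def resolve_seasonality(
    configs: List[SeasonalityConfig],
    valid_seas: Set[str],
    num_training_days: float,
) -> Dict[str, bool]:
    """Switch each seasonality on or off automatically instead of asking for "auto"/True/False."""
    resolved = {}
    for cfg in configs:
        long_enough = num_training_days >= cfg.min_cycles * cfg.period_days
        resolved[cfg.name] = cfg.order > 0 and cfg.name in valid_seas and long_enough
    return resolved


configs = [
    SeasonalityConfig("WEEKLY_SEASONALITY", period_days=7.0, order=3, min_cycles=2),
    SeasonalityConfig("QUARTERLY_SEASONALITY", period_days=91.3, order=5, min_cycles=2),
    SeasonalityConfig("YEARLY_SEASONALITY", period_days=365.25, order=15, min_cycles=1.5),
]
# valid_seas and num_training_days as reported for the monthly data in the CV log below
print(resolve_seasonality(configs, {"YEARLY_SEASONALITY", "QUARTERLY_SEASONALITY"}, 789.0))
```

With these illustrative values, 1.5 yearly cycles is 547.875 days, close to the 548 days of `default_min_days`; weekly seasonality is switched off only because it is not in `valid_seas` for monthly data.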

One may argue that `num_training_points` varies between training sets when using CV splits; however, the following example shows that both `num_training_points` and `num_training_days` are the same in every split, even with `cv_expanding_window=True`:
```
[CV 1/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
[CV 2/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
[CV 3/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
```
This means that the test `num_training_points >= default_min_points` needs to be applied only once, based on `train_end_date`, before entering the CV loop. (The current `Fitting 3 folds for each of 1 candidates, totalling 3 fits` stage apparently evaluates the seasonality terms at each split, but, as shown above, the values being tested do not change.)
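A minimal sketch of that idea: compute the training length once from the rows up to `train_end_date`, resolve the seasonality flags, and reuse the same flags in every fold. The names `decide_once` and `MIN_DAYS`, the `ts` column, the fold end points and the 30-month example series are hypothetical. In particular, the span is computed here as last timestamp minus first timestamp, which is an assumption and not necessarily how _time_properties_forecast.py_ computes `num_training_days`; the data-frequency check is left out to keep the focus on the length test.

```python
import pandas as pd

# default_min_days values quoted above from silverkite_seasonality.py
MIN_DAYS = {"YEARLY_SEASONALITY": 548, "QUARTERLY_SEASONALITY": 180}


def decide_once(df: pd.DataFrame, train_end_date: str) -> dict:
    """Evaluate the length test a single time, on all rows up to train_end_date."""
    ts = df.loc[df["ts"] <= pd.Timestamp(train_end_date), "ts"]
    num_days = (ts.max() - ts.min()).days  # assumed span formula, for illustration only
    return {name: num_days >= min_days for name, min_days in MIN_DAYS.items()}


df = pd.DataFrame({"ts": pd.date_range("2019-01-01", periods=30, freq="MS")})
flags = decide_once(df, train_end_date="2021-02-01")  # computed before the CV loop

# Expanding-window folds (hypothetical end points): every fold reuses the same flags
train = df[df["ts"] <= pd.Timestamp("2021-02-01")]
for fold, end in enumerate((20, 23, 26), start=1):
    fold_train = train.iloc[:end]
    print(f"fold {fold}: {len(fold_train)} training rows, seasonality flags {flags}")
```

Hoisting the test out of the loop makes the invariance explicit instead of recomputing identical values for each of the 3 fits.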
