design_info is needed to make predictions on new data #121

@samuelefiorini

According to the documentation:

The design info is not useful in general (to reproduce results, make predictions), and not dumping it will save a lot of time.

However, trying to make predictions on new data with a model restored via forecaster.load_forecast_result(path, load_design_info=False) fails with the following error:

File /var/lang/lib/python3.10/site-packages/greykite/algo/common/ml_models.py:715, in predict_ml(fut_df, trained_model)
    713 y_col = trained_model["y_col"]
    714 ml_model = trained_model["ml_model"]
--> 715 x_design_info = trained_model["x_design_info"]
    716 drop_intercept_col = trained_model["drop_intercept_col"]
    717 min_admissible_value = trained_model["min_admissible_value"]

KeyError: 'x_design_info'

This makes sense given greykite.algo.common.ml_models.predict_ml: patsy uses x_design_info to build the design matrix for the new data (see here), so prediction cannot proceed without it.
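To illustrate why the stored design info matters at prediction time, here is a minimal pure-Python sketch. It is not patsy's or greykite's implementation; fit_levels and one_hot are hypothetical helpers, and the level names are borrowed from the dow_hr feature above. The point is only that the categories learned during training must be reused to encode new data, otherwise the columns no longer line up with what the model was fitted on.

```python
def fit_levels(values):
    """Record the sorted distinct categories seen at training time."""
    return sorted(set(values))


def one_hot(values, levels):
    """Encode each value as one 0/1 column per stored level."""
    return [[1 if v == level else 0 for level in levels] for v in values]


# Training data defines three levels; this list plays the role of the design info.
train = ["1_00", "1_01", "2_00"]
levels = fit_levels(train)

# New data contains only one of the levels.
new = ["2_00"]

# Reusing the stored levels keeps the three training columns.
with_info = one_hot(new, levels)

# Re-deriving levels from the new data alone yields a single column,
# which no longer matches the fitted model's coefficients.
without_info = one_hot(new, fit_levels(new))
```

With the stored levels the new row has three columns; without them it has one, which is the kind of mismatch the design info prevents.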

On the other hand, dumping design_info not only produces a larger artifact, but may also be impossible because of system limits on the length of the generated filename.

For example, this is what happens in my case:

OSError: [Errno 36] File name too long: '/opt/ml/model/5f4cafc99b894af398c02013e13348e2/artifacts/forecast_result/grid_search/best_estimator_/steps/2_key/1_key/model_dict/x_design_info__value__/factor_infos/EvalFactor("C(Q(\'dow_hr\'), levels=[\'1_00\', \'1_01\', \'1_02\', \'1_03\', \'1_04\', \'1_05\', \'1_06\', \'1_07\', \'1_08\', \'1_09\', \'1_10\', \'1_11\', \'1_12\', \'1_13\', \'1_14\', \'1_15\', \'1_16\', \'1_17\', \'1_18\', \'1_19\', \'1_20\', \'1_21\', \'1_22\', \'1_23\', \'2_00\', \'2_01\', \'2_02\', \'2_03\', \'2_04\', \'2_05\', \'2_06\', \'2_07\', \'2_08\', \'2_09\', \'2_10\', \'2_11\', \'2_12\', \'2_13\', \'2_14\', \'2_15\', \'2_16\', \'2_17\', \'2_18\', \'2_19\', \'2_20\', \'2_21\', \'2_22\', \'2_23\', \'3_00\', \'3_01\', \'3_02\', \'3_03\', \'3_04\', \'3_05\', \'3_06\', \'3_07\', \'3_08\', \'3_09\', \'3_10\', \'3_11\', \'3_12\', \'3_13\', \'3_14\', \'3_15\', \'3_16\', \'3_17\', \'3_18\', \'3_19\', \'3_20\', \'3_21\', \'3_22\', \'3_23\', \'4_00\', \'4_01\', \'4_02\', \'4_03\', \'4_04\', \'4_05\', \'4_06\', \'4_07\', \'4_08\', \'4_09\', \'4_10\', \'4_11\', \'4_12\', \'4_13\', \'4_14\', \'4_15\', \'4_16\', \'4_17\', \'4_18\', \'4_19\', \'4_20\', \'4_21\', \'4_22\', \'4_23\', \'5_00\', \'5_01\', \'5_02\', \'5_03\', \'5_04\', \'5_05\', \'5_06\', \'5_07\', \'5_08\', \'5_09\', \'5_10\', \'5_11\', \'5_12\', \'5_13\', \'5_14\', \'5_15\', \'5_16\', \'5_17\', \'5_18\', \'5_19\', \'5_20\', \'5_21\', \'5_22\', \'5_23\', \'6_00\', \'6_01\', \'6_02\', \'6_03\', \'6_04\', \'6_05\', \'6_06\', \'6_07\', \'6_08\', \'6_09\', \'6_10\', \'6_11\', \'6_12\', \'6_13\', \'6_14\', \'6_15\', \'6_16\', \'6_17\', \'6_18\', \'6_19\', \'6_20\', \'6_21\', \'6_22\', \'6_23\', \'7_00\', \'7_01\', \'7_02\', \'7_03\', \'7_04\', \'7_05\', \'7_06\', \'7_07\', \'7_08\', \'7_09\', \'7_10\', \'7_11\', \'7_12\', \'7_13\', \'7_14\', \'7_15\', \'7_16\', \'7_17\', \'7_18\', \'7_19\', \'7_20\', \'7_21\', \'7_22\', \'7_23\'])")__key__.pkl'
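One possible direction, sketched below, is to shorten any over-long path component by replacing most of it with a hash. This is not an existing greykite option: short_component is a hypothetical helper, the 255-byte limit is an assumption (it is the usual per-component limit on common Linux filesystems), and using it would require the dump and load routines to agree on the same naming scheme.

```python
import hashlib

# Assumed per-component limit in bytes; typical for filesystems such as ext4.
MAX_COMPONENT_BYTES = 255


def short_component(name, max_bytes=MAX_COMPONENT_BYTES, suffix=".pkl"):
    """Return name + suffix, shortened with a SHA-256 digest if it is too long."""
    if len((name + suffix).encode("utf-8")) <= max_bytes:
        return name + suffix
    # The digest is 64 hex characters and identifies the original name.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Keep a readable prefix in whatever space is left.
    keep = max_bytes - len(digest) - len(suffix) - 1
    prefix = name.encode("utf-8")[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}{suffix}"
```

The mapping is deterministic, so the same key always yields the same filename, but it is one-way: the loader would need the original key stored elsewhere (for example inside the pickle) to rebuild the dictionary.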

Any ideas on how to work around this issue?
