Improve the error message when build_multivariate_dataframe has the list of stat_vars more than the batch_size #184

@sharadshriram

cc: @shifucun

I was using a script that calls build_multivariate_dataframe with a stat_vars list of more than 50 entries and got the following error:

Traceback (most recent call last):
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 88, in <module>
    save_statvar_to_csv(place, 'data.csv')
  File "/home/sharadshriram/accessible_charts/datasets/datacommons/get_data.py", line 67, in save_statvar_to_csv
    df = dpd.build_multivariate_dataframe([place], stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 314, in build_multivariate_dataframe
    df = pd.DataFrame.from_records(_multivariate_pd_input(places, stat_vars))
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 238, in _multivariate_pd_input
    rows_dict = _group_stat_all_by_obs_options(places,
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/df_builder.py", line 88, in _group_stat_all_by_obs_options
    stat_all = dc.get_stat_all(places, stat_vars)
  File "/home/sharadshriram/env/lib/python3.10/site-packages/datacommons_pandas/stat_vars.py", line 226, in get_stat_all
    batches = -(-len(places) // places_per_batch)
ZeroDivisionError: integer division or modulo by zero

However, the message "ZeroDivisionError: integer division or modulo by zero" did not help me understand the cause. After tracing back through the code, I found that the error was not caused by the place batching itself, but by the stat_vars list passed to dc.get_stat_all(places, stat_vars) having more than 50 entries.
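A minimal sketch of how the failure might arise. The traceback shows only the ceiling-division line, so the way places_per_batch is derived here is an assumption (a batch size of 50 integer-divided by the number of stat_vars), and QUERY_BATCH_SIZE and count_batches are hypothetical names, not the library's own:

```python
# Hypothetical reconstruction; only the ceiling-division line appears in the traceback.
QUERY_BATCH_SIZE = 50  # assumed limit, taken from the issue text


def count_batches(places, stat_vars):
    """Return how many place batches would be needed for one query."""
    # Assumption: fewer places fit in a batch as the stat_vars list grows.
    places_per_batch = QUERY_BATCH_SIZE // len(stat_vars)
    # Ceiling division, as in the traceback: -(-a // b) equals ceil(a / b).
    # When len(stat_vars) > 50, places_per_batch is 0 and this line raises.
    return -(-len(places) // places_per_batch)
```

Under that assumption, 50 stat_vars still work (one place per batch), while 51 make places_per_batch zero and trigger the ZeroDivisionError seen above.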

Could the error message state that the stat_vars list passed is longer than the batch_size limit of 50?
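Something along these lines might be enough. This is only a sketch: check_stat_vars and MAX_STAT_VARS are hypothetical names, and the limit of 50 is taken from my observation above rather than from the library source:

```python
MAX_STAT_VARS = 50  # assumed limit, taken from the issue text


def check_stat_vars(stat_vars, limit=MAX_STAT_VARS):
    """Raise a descriptive error before any batching arithmetic runs."""
    if not stat_vars:
        raise ValueError("stat_vars must contain at least one entry.")
    if len(stat_vars) > limit:
        raise ValueError(
            f"Got {len(stat_vars)} stat_vars, but at most {limit} can be "
            "queried per call; split the list into smaller chunks.")
```

Calling a check like this at the top of get_stat_all() would surface the real cause instead of the ZeroDivisionError.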

I also wonder whether we could extend get_stat_all() to split long stat_vars lists into chunks of at most 50 and query the API once per chunk. I would like to hear your thoughts.
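A rough sketch of the chunking idea, written as a wrapper so it does not depend on the library internals. The names chunked and get_stat_all_chunked are hypothetical, and the merge step assumes the fetch function returns a dict shaped like {place: {stat_var: data}}, which I have not verified:

```python
def chunked(items, size=50):
    """Yield consecutive slices of items with at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_stat_all_chunked(places, stat_vars, fetch, chunk_size=50):
    """Call fetch(places, chunk) once per chunk and merge results per place.

    Assumes fetch returns a dict shaped like {place: {stat_var: data}}.
    """
    merged = {}
    for chunk in chunked(stat_vars, chunk_size):
        for place, by_var in fetch(places, chunk).items():
            # Each chunk contributes a disjoint set of stat_vars per place.
            merged.setdefault(place, {}).update(by_var)
    return merged
```

With this, a call such as get_stat_all_chunked(places, stat_vars, dc.get_stat_all) would issue one query per 50 stat_vars, provided the assumed return shape holds.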
