messytables guesses wrong type for decimal number #190

@wrinklenose

Describe the bug
Messytables should guess decimals correctly, respecting the locale configuration.
For example: in Germany the comma (,) is used as the decimal separator, but the value 1,200 is guessed as type "text".

This issue was initially reported as ckan issue ckan/ckan#5769, which is where I first noticed it.

The type guessing seems to happen here: https://github.com/okfn/messytables/blob/51b736892a48e420ab313675f54901c77b446dec/messytables/types.py
and appears to be locale-specific. I think the magic happens in line 100:
value = locale.atof(value)
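
For reference, locale.atof roughly removes the thousands separator reported by locale.localeconv() and replaces the decimal point with "." before calling float. Below is a minimal sketch of that normalization with the separators passed explicitly, so it does not depend on which locales are installed. The helper name parse_decimal and its parameters are hypothetical; they are not part of messytables or the standard library.

```python
import locale


def parse_decimal(text, decimal_point=None, thousands_sep=None):
    """Parse a localized decimal string into a float.

    Hypothetical helper: separators default to whatever
    locale.localeconv() currently reports, but can be overridden.
    """
    conv = locale.localeconv()
    if decimal_point is None:
        decimal_point = conv["decimal_point"]
    if thousands_sep is None:
        thousands_sep = conv["thousands_sep"]

    cleaned = text.strip()
    # Drop grouping characters first, then normalize the decimal point.
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")
    if decimal_point and decimal_point != ".":
        cleaned = cleaned.replace(decimal_point, ".")
    return float(cleaned)


if __name__ == "__main__":
    # German-style separators passed explicitly.
    print(parse_decimal("1,200", decimal_point=",", thousands_sep="."))
```

With decimal_point="." and an empty thousands_sep, the same input raises ValueError, since "1,200" is then handed to float unchanged.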

Unfortunately, Python seems to treat the dot as the decimal point even when a German locale is set, which I could reproduce in my local environment:

>>> locale.getlocale()
('de_DE', 'cp1252')
>>> locale.atof('1,200')

Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    locale.atof('1,200')
  File "C:\Program Files\Python27\lib\locale.py", line 318, in atof
    return func(string)
ValueError: invalid literal for float(): 1,200
>>> locale.localeconv()
{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}
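
One possible explanation, offered as an assumption rather than a confirmed diagnosis: locale.getlocale() without an argument reports the LC_CTYPE category, while locale.atof depends on LC_NUMERIC. The localeconv() output above, with decimal_point '.' and an empty thousands_sep, looks like what the default "C" numeric locale returns. The following small check prints both categories side by side; the helper name numeric_locale_report is hypothetical.

```python
import locale


def numeric_locale_report():
    """Collect the settings that locale.atof actually depends on."""
    conv = locale.localeconv()
    return {
        # What locale.getlocale() reports by default.
        "LC_CTYPE": locale.getlocale(locale.LC_CTYPE),
        # The category that controls number formatting and parsing.
        "LC_NUMERIC": locale.getlocale(locale.LC_NUMERIC),
        "decimal_point": conv["decimal_point"],
        "thousands_sep": conv["thousands_sep"],
    }


if __name__ == "__main__":
    print(numeric_locale_report())
```

If LC_NUMERIC turns out to be unset, calling locale.setlocale(locale.LC_NUMERIC, "") or locale.setlocale(locale.LC_ALL, "") before parsing may change the result. Which locale names are accepted depends on the platform.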
