cf-checker fails to fetch data from CF Conventions webpage #122

@sfinkens

Currently cf-checker fails with the following error message:

cfchecks -v auto myfile.nc
CHECKING NetCDF FILE: myfile.nc
=====================
Traceback (most recent call last):
  File ".pixi/envs/dev/bin/cfchecks", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File ".pixi/envs/dev/lib/python3.13/site-packages/cfchecker/cfchecks.py", line 3402, in main
    inst.checker(file)
    ~~~~~~~~~~~~^^^^^^
  File ".pixi/envs/dev/lib/python3.13/site-packages/cfchecker/cfchecks.py", line 494, in checker
    parser.parse(self.standardNames)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File ".pixi/envs/dev/lib/python3.13/xml/sax/expatreader.py", line 99, in parse
    source = saxutils.prepare_input_source(source)
  File ".pixi/envs/dev/lib/python3.13/xml/sax/saxutils.py", line 365, in prepare_input_source
    f = urllib.request.urlopen(source.getSystemId())
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 189, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 495, in open
    response = meth(req, response)
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 604, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 533, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 466, in _call_chain
    result = func(*args)
  File ".pixi/envs/dev/lib/python3.13/urllib/request.py", line 613, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 504: Gateway Timeout

According to https://github.com/orgs/cf-convention/discussions/466, the http://cfconventions.org domain has expired. Apparently that domain was only redirecting to https://cf-convention.github.io, so here is a quick workaround until the domain is reactivated: point cfchecker at the GitHub Pages URLs directly. It might make sense to update the URLs in the source code permanently.

import sys
from unittest import mock

import cfchecker.cfchecks


def cfchecks(args):
    def exit_mock(rc=None):
        # Raise on a non-zero exit status instead of terminating the interpreter.
        if rc:
            raise RuntimeError("CF checker failed with errors, see logging above")

    # Point the checker at the GitHub Pages copies of the tables (all over HTTPS).
    cfchecker.cfchecks.STANDARDNAME = "https://cf-convention.github.io/Data/cf-standard-names/current/src/cf-standard-name-table.xml"
    cfchecker.cfchecks.AREATYPES = "https://cf-convention.github.io/Data/area-type-table/current/src/area-type-table.xml"
    cfchecker.cfchecks.REGIONNAMES = "https://cf-convention.github.io/Data/standardized-region-list/standardized-region-list.xml"

    # main() reads its options from sys.argv, so emulate the command line.
    sys.argv = ["cfchecks"] + args
    with mock.patch("cfchecker.cfchecks.sys.exit", side_effect=exit_mock):
        cfchecker.cfchecks.main()


args = ["-v", "auto", "file1.nc", "file2.nc"]
cfchecks(args)
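
If the URLs were updated permanently, the host swap could be done in one place rather than per constant. The following is only a rough sketch of that idea: `to_github_pages`, `OLD_HOSTS` and `NEW_HOST` are hypothetical names, not part of cfchecker, and it assumes the GitHub Pages site serves the tables under the same paths as the expired domain did.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: cf-convention.github.io mirrors the paths of cfconventions.org.
OLD_HOSTS = {"cfconventions.org", "www.cfconventions.org"}
NEW_HOST = "cf-convention.github.io"


def to_github_pages(url):
    """Swap an expired cfconventions.org host for the GitHub Pages host."""
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:
        return url  # leave unrelated URLs untouched
    # Keep path, query and fragment; force HTTPS on the new host.
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


print(to_github_pages("http://cfconventions.org/Data/area-type-table/current/src/area-type-table.xml"))
# → https://cf-convention.github.io/Data/area-type-table/current/src/area-type-table.xml
```

The same function could be applied to `STANDARDNAME`, `AREATYPES` and `REGIONNAMES` before calling `main()`, which would avoid hard-coding the three replacement URLs.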
