
Compare downloading data first vs. using remote datasets #39

@emlys

invest now accepts remote file paths as inputs, but I'm not sure that is always the best option. When a user provides a datastack, there are a few ways the server could access the data:

  • User provides all data in a tar.gz archive, which the server downloads at the start of the job (currently supported)
  • User provides a JSON-only datastack containing remote paths; server downloads the files at the start of the job and replaces the remote paths with local paths
  • User provides a JSON-only datastack containing remote paths; server passes it on to invest unchanged, and gdal/pandas read the remote files directly (this should work now, but is untested)
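
To make the second option concrete, here is a minimal sketch of the "download, then rewrite paths" step. It assumes the datastack's arguments are available as a plain dict (for example, the `args` object of the JSON file), and that a remote path is any string with an `http` or `https` scheme. The names `localize_args`, `is_remote`, and `download` are hypothetical helpers for illustration, not part of invest.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Assumption: only http(s) URLs count as remote; s3://, gs://, and GDAL
# /vsicurl/ style paths would need extra handling.
REMOTE_SCHEMES = ("http", "https")


def is_remote(value):
    """Return True if value is a string that looks like an http(s) URL."""
    return isinstance(value, str) and urlparse(value).scheme in REMOTE_SCHEMES


def download(url, dest_path):
    """Default fetcher: save url to dest_path using the standard library."""
    urllib.request.urlretrieve(url, dest_path)


def localize_args(args, download_dir, fetch=download):
    """Return a copy of args with each remote path replaced by a local path.

    Nested dicts and lists are walked recursively. Files are named after the
    last component of the URL path, so two URLs sharing a basename would
    overwrite each other; a real implementation would need to deduplicate.
    """
    if isinstance(args, dict):
        return {k: localize_args(v, download_dir, fetch) for k, v in args.items()}
    if isinstance(args, list):
        return [localize_args(v, download_dir, fetch) for v in args]
    if is_remote(args):
        filename = os.path.basename(urlparse(args).path) or "download"
        local_path = os.path.join(download_dir, filename)
        fetch(args, local_path)
        return local_path
    # Numbers, booleans, and local paths pass through unchanged.
    return args
```

This sketch only rewrites paths that appear directly in the args. Remote paths referenced from inside an input table (for example, a CSV column of raster paths) would not be found, and multi-file formats such as shapefiles would need their sidecar files fetched as well.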

Compare performance and user experience of these options.
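
For the performance half of the comparison, a small timing harness along these lines might be enough as a starting point. It is a sketch only: `time_strategies` is a hypothetical helper, and each strategy is assumed to be a zero-argument callable that runs one full job end to end (download included) using one of the three approaches.

```python
import statistics
import time


def time_strategies(strategies, repeats=3):
    """Time each strategy and return min and median wall-clock seconds.

    strategies: dict mapping a label to a zero-argument callable.
    repeats: number of runs per strategy.
    """
    results = {}
    for label, run in strategies.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        results[label] = {
            "min": min(timings),
            "median": statistics.median(timings),
            "runs": timings,
        }
    return results
```

Any caching layer between the server and the remote files could make later runs faster than the first, so the per-run timings are kept alongside the summary values. Network conditions will also vary between runs, which is a reason to look at the median rather than a single measurement.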
