Python package that creates an inventory for HuBMAP datasets.
The inventory is composed of three files:
- a TSV with all file level features
- a JSON file with basic metadata information and file manifest
- a compressed JSON file
Read this
- this package needs access to the file system.
- protected and public published datasets can be processed on HIVE by the `hive` user
- public published datasets can be processed on Bridges2 by any user (data is public)
- there is a bottleneck associated with the maximum number of files that can be processed at once. The magic number is `ncores = 25`.
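The bottleneck above amounts to a bounded worker pool. A minimal sketch, assuming a thread pool capped at 25 workers (the `file_size` helper and `process_files` name are hypothetical, not part of this package's API):

```python
import os
from concurrent.futures import ThreadPoolExecutor

NCORES = 25  # the magic number: maximum files processed at once


def file_size(fullpath):
    """Hypothetical per-file worker returning (path, size in bytes)."""
    return fullpath, os.path.getsize(fullpath)


def process_files(paths):
    """Fan out over a dataset's files, never more than NCORES at a time."""
    with ThreadPoolExecutor(max_workers=NCORES) as pool:
        return list(pool.map(file_size, paths))
```

A process pool would work the same way; a thread pool is shown because the per-file work (stat and checksum calls) is I/O bound.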
The JSON file is a dictionary style structure with dataset and file level information. The keys of this dictionary are
- `data_type` - CODEX, AF, etc.
- `directory` - directory path on HIVE
- `doi_url` - the DOI URL, if any
- `frequencies` - frequencies of file extensions in this dataset. Useful for building histograms
- `hubmap_id` - dataset HuBMAP ID
- `is_protected` - True if protected, False otherwise
- `manifest` - a dictionary with file level statistics for each file in this dataset
- `number_of_files`
- `pretty_size` - an easy to read string representing the size of the data directory
- `size` - size in bytes of the data directory
- `status` - Published, etc.
- `uuid` - dataset UUID
The manifest key in the dictionary above is itself a list of dictionaries. Each dictionary holds file level information about one file in the dataset, so the list has one entry per file. The keys of each dictionary in the list are
- `download_url` - Globus direct download URL. Does not apply for protected datasets.
- `extension`
- `filename`
- `filetype` - image, sequence or other
- `fullpath`
- `md5` - checksum
- `mime-type`
- `modification_time`
- `sha256` - checksum
- `size` - size in bytes
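Since each manifest entry carries both an `md5` and a `sha256` checksum, a downloaded or local copy of a file can be validated against its entry. A minimal sketch (the `verify_file` name is illustrative, not part of this package):

```python
import hashlib


def verify_file(fullpath, md5_expected, sha256_expected):
    """Recompute both checksums and compare with the manifest values.

    `md5_expected` and `sha256_expected` come from the `md5` and
    `sha256` keys of a manifest entry.
    """
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(fullpath, "rb") as f:
        # Stream in 1 MiB chunks so large image/sequence files fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return (md5.hexdigest() == md5_expected
            and sha256.hexdigest() == sha256_expected)
```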
See examples folder for Jupyter Notebooks and simple scripts.
Copyright © 2020-2023 HuBMAP.