Python Library for Loading and Manipulating lyDATA Tables

This repository provides a Python library for loading, manipulating, and validating the datasets available on lyDATA.

Warning

This Python library is still highly experimental!

Also, it has recently been spun off from the repository of datasets, lyDATA, and some things might still not work as expected.

Installation

1. Install from PyPI

You can install the library from PyPI using pip:

pip install lydata

2. Install from Source

If you want to install the library from source, you can clone the repository and install it using pip:

git clone https://github.com/lycosystem/lydata-package
cd lydata-package
pip install -e .

Usage

The first and most common use case would probably listing and loading the published datasets:

>>> import lydata
>>> for dataset_spec in lydata.available_datasets(
...     year=2023,              # show all datasets added in 2023
...     ref="61a17e",           # may be some specific hash/tag/branch
... ):
...     print(dataset_spec.name)
2023-clb-multisite
2023-isb-multisite

# return generator of datasets that include oropharyngeal tumor patients
>>> first_dataset = next(lydata.load_datasets(subsite="oropharynx"))
>>> print(first_dataset.head())
... # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
  patient                              ... positive_dissected
        #                              ...             contra
       id         institution     sex  ...                III   IV    V
0    P011  Centre Léon Bérard    male  ...                0.0  0.0  0.0
1    P012  Centre Léon Bérard  female  ...                0.0  0.0  0.0
2    P014  Centre Léon Bérard    male  ...                0.0  0.0  NaN
3    P015  Centre Léon Bérard    male  ...                0.0  0.0  NaN
4    P018  Centre Léon Bérard    male  ...                NaN  NaN  NaN
[5 rows x 82 columns]

And since the three-level header of the tables is a little unwieldy at times, we also provide some shortcodes via a custom pandas accessor. As soon as lydata is imported it can be used like this:

>>> print(first_dataset.ly.age)
... # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
0      67
1      62
      ...
261    60
262    60
Name: (patient, #, age), Length: 263, dtype: int64

And we have implemented Q and C objects inspired by Django that allow easier querying of the tables:

>>> from lydata import C

# select patients younger than 50 that are not HPV positive (includes NaNs)
>>> query_result = first_dataset.ly.query((C("age") < 50) & ~(C("hpv") == True))
>>> (query_result.ly.age < 50).all()
np.True_
>>> (query_result.ly.hpv == False).all()
np.True_

For more details and further examples or use-cases, have a look at the official documentation

Name		Name	Last commit message	Last commit date
Latest commit History 132 Commits
.github/workflows		.github/workflows
docs		docs
src/lydata		src/lydata
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
.readthedocs.yml		.readthedocs.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
conftest.py		conftest.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python Library for Loading and Manipulating lyDATA Tables

Installation

1. Install from PyPI

2. Install from Source

Usage

About

Uh oh!

Releases 5

Languages

License

lycosystem/lydata-package

Folders and files

Latest commit

History

Repository files navigation

Python Library for Loading and Manipulating lyDATA Tables

Installation

1. Install from PyPI

2. Install from Source

Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Languages