AnnData file integration user stories #143

@dosumis

Stories:

Story 1. As a taxonomy editor, I want my taxonomy to be linked to the cells being annotated, so that my annotations can be synchronised with a representation of data about these cells, e.g. a cell-by-gene matrix in a file (e.g. AnnData) or a matrix store DB. This would let me update the AnnData file for use in analysis, which will in turn inform edits to the taxonomy. Without a robust system of linking to cell IDs, we rely on names and/or cluster IDs to link the two, and there is a serious danger that name or ID changes will break these links.
TDT solutions:

  • CAS stores cell IDs for clusters. A link to an H5AD file supports initial population of these IDs to a taxonomy seeded from a spreadsheet.
  • TDT supports testing a linked AnnData file to see whether its annotations are in sync with a taxonomy and can be safely updated.
  • TDT supports updating cell annotations in an AnnData file from a linked taxonomy.
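
A minimal sketch of the sync test and update described above. It is not TDT's implementation and does not use the anndata library: the taxonomy is assumed to be a plain mapping from cluster label to cell IDs (as CAS might supply them), and the AnnData annotations are assumed to be a mapping from cell ID to label. All names (`SyncReport`, `check_sync`, `update_annotations`) and the example data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    # Cell IDs listed in the taxonomy but absent from the matrix.
    missing_from_matrix: list = field(default_factory=list)
    # Cell ID -> (label in matrix, label in taxonomy) where they differ.
    out_of_sync: dict = field(default_factory=dict)

    @property
    def safe_to_update(self):
        # Updating is only safe if every taxonomy cell ID resolves in the matrix.
        return not self.missing_from_matrix


def check_sync(taxonomy, obs_labels):
    """Compare taxonomy cell membership with per-cell labels, keyed by cell ID."""
    report = SyncReport()
    for label, cell_ids in taxonomy.items():
        for cell_id in cell_ids:
            if cell_id not in obs_labels:
                report.missing_from_matrix.append(cell_id)
            elif obs_labels[cell_id] != label:
                report.out_of_sync[cell_id] = (obs_labels[cell_id], label)
    return report


def update_annotations(taxonomy, obs_labels):
    """Return a copy of obs_labels with taxonomy labels applied."""
    report = check_sync(taxonomy, obs_labels)
    if not report.safe_to_update:
        raise ValueError(f"Unknown cell IDs: {report.missing_from_matrix}")
    updated = dict(obs_labels)
    for cell_id, (_, taxonomy_label) in report.out_of_sync.items():
        updated[cell_id] = taxonomy_label
    return updated


# Hypothetical example data: cell_2 carries a stale label in the matrix.
taxonomy = {"L2/3 IT": ["cell_1", "cell_2"], "Pvalb": ["cell_3"]}
obs_labels = {"cell_1": "L2/3 IT", "cell_2": "L2/3", "cell_3": "Pvalb"}

report = check_sync(taxonomy, obs_labels)
updated = update_annotations(taxonomy, obs_labels)
```

Because the comparison is keyed on cell IDs, a renamed cluster shows up as an out-of-sync label that can be updated, not as a broken link.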

Story 2. As a taxonomy editor planning to publish an AnnData file to CZ CELLxGENE, I want to generate an AnnData file for submission that is synchronised with the latest release of my taxonomy (resolvable via a Persistent URL). This means that:

  • annotations are synchronised with my taxonomy
  • appropriate Cell Ontology annotations are present in the cell_type field mandated by CZ CELLxGENE
  • other details of my taxonomy are stored in the AnnData file header
  • (Note: we also need mechanisms to synchronise other metadata with the fields mandated by CZ CELLxGENE, but this is probably outside the scope of TDT)
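
As a rough illustration of the first three points, the sketch below derives per-cell Cell Ontology term IDs from taxonomy labels and assembles header entries recording the taxonomy release. It is written in plain Python with no anndata dependency. The function name, the header keys (`taxonomy_purl`, `taxonomy_version`), the PURL and the cluster labels are all assumptions for illustration; in a real file the term IDs would presumably populate the cell type columns of `obs` and the header entries would go in `uns`.

```python
def build_cellxgene_fields(obs_labels, label_to_cl, taxonomy_purl, taxonomy_version):
    """Map each cell's taxonomy label to a Cell Ontology term and build header entries.

    obs_labels:  cell ID -> taxonomy label (already synchronised with the taxonomy)
    label_to_cl: taxonomy label -> Cell Ontology term ID
    """
    # Refuse to produce a submission file with unmapped labels.
    unmapped = sorted({lbl for lbl in obs_labels.values() if lbl not in label_to_cl})
    if unmapped:
        raise ValueError(f"No Cell Ontology term for labels: {unmapped}")

    term_ids = {cell_id: label_to_cl[lbl] for cell_id, lbl in obs_labels.items()}
    # Hypothetical header keys; the actual layout is not specified in this issue.
    header = {"taxonomy_purl": taxonomy_purl, "taxonomy_version": taxonomy_version}
    return term_ids, header


# Hypothetical example: two clusters mapped to astrocyte and neuron terms.
obs_labels = {"cell_1": "Astro_1", "cell_2": "Neuron_1"}
label_to_cl = {"Astro_1": "CL:0000127", "Neuron_1": "CL:0000540"}

term_ids, header = build_cellxgene_fields(
    obs_labels,
    label_to_cl,
    taxonomy_purl="https://example.org/taxonomy/demo",  # placeholder PURL
    taxonomy_version="1.0.0",                           # placeholder release
)
```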

Story 3. As a taxonomy editor, I wish to edit or validate a list of marker genes in TDT, ensuring that they are in sync with the genes in a reference cell-by-gene matrix.
TDT support requires:

  • Edit form list support #38
  • Populate gene reference table from linked CxG file.
  • Support for validating gene lists and generating reports, for any column with data type gene list.
  • Autosuggest for gene list fields.
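
The validation and autosuggest items could look something like the sketch below. It assumes the gene reference table has already been reduced to a list of gene symbols taken from the linked matrix; the function names, the report shape, and the exact-match, case-sensitive comparison are illustrative choices, not TDT behaviour.

```python
from bisect import bisect_left


def validate_gene_list(genes, reference_genes):
    """Split a marker gene list into genes found in the reference and genes not found."""
    reference = set(reference_genes)
    return {
        "valid": [g for g in genes if g in reference],
        "unknown": [g for g in genes if g not in reference],
    }


def suggest(prefix, reference_genes, limit=5):
    """Return up to `limit` reference genes starting with `prefix`, in sorted order."""
    ordered = sorted(reference_genes)
    matches = []
    # Binary search to the first candidate, then scan while the prefix matches.
    for gene in ordered[bisect_left(ordered, prefix):]:
        if not gene.startswith(prefix) or len(matches) == limit:
            break
        matches.append(gene)
    return matches


# Hypothetical reference genes and a marker list containing one invalid entry.
reference_genes = ["GAD1", "GAD2", "PVALB", "SST", "VIP"]
gene_report = validate_gene_list(["GAD1", "PVALB", "GADX"], reference_genes)
suggestions = suggest("GA", reference_genes)
```

Here `gene_report["unknown"]` holds the entries a validation report would flag, and `suggest` is the kind of lookup an autosuggest field could call as the editor types.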
