
Review logic & use cases for reconciling CAS and AnnData files. #19

@dosumis

TDT workflows to support:

  1. Starting point: spreadsheet + AnnData file + config. Workflow steps:
    • Ingest the spreadsheet to generate a CAS file without cell_ids.
    • Before populating cell_ids, run checks to confirm that the labels and values of all labelsets in the config are present in both CAS and AnnData.
      • If the cluster labelset (rank 0) is not in AnnData, fail.
      • If other labelsets in the config are present in AnnData, check that their keys and values match and warn if they do not. If they are not present, pass.
    • Populate cell_ids.
    • If labelsets other than cluster are present, check that the hierarchy matches (compare the cell sets of all labelsets). If they do not match, warn.
    • If everything passes up to this point:
      • Record the validation status in the schema?
      • Generate a checksum for the whole AnnData file and store it.
    • Editing, then saving changes to AnnData: no further checks are needed. However, saving changes to AnnData should flush and replace the labelset (flush = remove the column from the dataframe), and the warning shown to the user should make this clear.
    • Generate a checksum for the whole AnnData file for later use in validation.
  2. Starting point: an existing TDT repo with config, a CAS file with cell_ids, and a linked AnnData file.
    • Run the checks from workflow 1 on startup ().
  3. Starting point: CAS files with cell_ids, shared by PURL.
    • Download the AnnData file (overridden if a local copy exists).
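
The pre-population checks and the cell_id population step of workflow 1 could look roughly like the sketch below. It is a minimal, stdlib-only sketch under stated assumptions: `obs` is a plain dict standing in for `adata.obs` (column name to per-cell labels), `cell_names` stands in for `adata.obs_names`, annotations are dicts with `labelset` and `cell_label` keys loosely modelled on CAS, and `check_labelsets`, `populate_cell_ids` and `ValidationError` are hypothetical names, not CAS tools functions. Treating a mismatch in cluster values as fatal is also an assumption; the workflow above only specifies failing when the cluster labelset is missing from AnnData.

```python
from typing import Dict, List, Sequence

# Assumed, simplified data model (hypothetical, not the CAS tools API):
#   obs         -> {obs column name: per-cell labels}, like adata.obs as plain lists
#   cell_names  -> cell ids in the same order, like adata.obs_names
#   annotations -> CAS-style dicts with "labelset" and "cell_label" keys


class ValidationError(Exception):
    """Raised when a check that must pass (the cluster labelset) fails."""


def check_labelsets(annotations: List[dict], obs: Dict[str, Sequence[str]],
                    labelsets: List[str], cluster: str) -> List[str]:
    """Run the pre-population checks; return warnings, raise on hard failures."""
    if cluster not in obs:
        raise ValidationError(f"cluster labelset {cluster!r} is not in AnnData")
    warnings: List[str] = []
    # Always check the cluster labelset first, then the rest of the config (deduplicated).
    for name in dict.fromkeys([cluster, *labelsets]):
        if name not in obs:
            continue  # non-cluster labelset absent from AnnData: pass
        cas_labels = {a["cell_label"] for a in annotations if a["labelset"] == name}
        anndata_labels = set(obs[name])
        if cas_labels == anndata_labels:
            continue
        msg = (f"{name}: only in CAS {sorted(cas_labels - anndata_labels)}, "
               f"only in AnnData {sorted(anndata_labels - cas_labels)}")
        if name == cluster:
            raise ValidationError(msg)  # assumption: cluster value mismatch is fatal
        warnings.append(msg)
    return warnings


def populate_cell_ids(annotations: List[dict], obs: Dict[str, Sequence[str]],
                      cell_names: Sequence[str]) -> List[dict]:
    """Fill each annotation's cell_ids from the matching AnnData obs column, in place."""
    for a in annotations:
        column = obs.get(a["labelset"])
        if column is None:
            continue  # labelset not in AnnData: leave the annotation untouched
        a["cell_ids"] = [cell for cell, label in zip(cell_names, column)
                         if label == a["cell_label"]]
    return annotations
```

With a real AnnData object, the inputs could be built as `obs = {col: adata.obs[col].astype(str).tolist() for col in adata.obs.columns}` and `cell_names = adata.obs_names.tolist()`; the returned warning strings would then be surfaced to the user before populating cell_ids.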

Review CAS tools functionality in light of these workflows.
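
For the later steps of workflow 1 (the hierarchy check after cell_ids are populated, and the whole-file AnnData checksum), a possible sketch follows. It assumes each annotation carries `cell_set_accession`, an optional `parent_cell_set_accession` and populated `cell_ids`; these field names are modelled on CAS and should be treated as assumptions. "The hierarchy matches" is interpreted here as "every child cell set is a subset of its parent's cell set". `check_hierarchy` and `file_checksum` are hypothetical names, and SHA-256 is an assumed choice, since the workflow does not name a hash algorithm.

```python
import hashlib
from typing import List


def check_hierarchy(annotations: List[dict]) -> List[str]:
    """Warn wherever a child's cell set is not a subset of its parent's cell set."""
    by_accession = {a["cell_set_accession"]: a for a in annotations}
    warnings: List[str] = []
    for a in annotations:
        parent = by_accession.get(a.get("parent_cell_set_accession"))
        if parent is None:
            continue  # root annotation, or parent not in this CAS file
        if "cell_ids" not in a or "cell_ids" not in parent:
            continue  # labelset not present in AnnData: nothing to compare
        outside = set(a["cell_ids"]) - set(parent["cell_ids"])
        if outside:
            warnings.append(f"{a['cell_label']} has {len(outside)} cell(s) "
                            f"outside parent {parent['cell_label']}")
    return warnings


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of the whole file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

One way to use the stored checksum in workflow 2 would be to recompute `file_checksum` for the linked AnnData file on startup and rerun the full checks only when it differs from the stored value.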