Review logic & use cases for reconciling CAS and AnnData files. #19

TDT workflows to support:

1. Starting point: spreadsheet + AnnData file + config. Workflow steps:
   - Ingest the spreadsheet to generate CAS without cell_ids.
   - Run checks before populating cell_ids: check that the labels and values of all labelsets in the config are in both CAS and AnnData.
      - If the cluster labelset (rank 0) is not in AnnData: fail.
      - If other labelsets in the config are present in AnnData: check that keys and values match, and warn if they do not. If they are not present: pass.
   - Populate cell_ids.
   - If labelsets other than cluster are present, check that the hierarchy matches (compare cell sets for all). If they don't match, warn.
   - If everything passes to this point:
        - Record validation status in schema?
        - Generate a checksum for the whole AnnData file and store it.
   - Editing --> save changes to AnnData: no further checks are needed. BUT saving changes to AnnData should flush and replace the labelset (flush = remove the column from the dataframe). The warning should make this clear.
   - ~Generate checksum for whole AnnData file for later use in validation.~
2. Starting point: existing TDT repo with config, CAS file + cell_ids, and linked AnnData file.
   - Run the checks from workflow 1 on startup ().
3. Starting point: CAS files with cell_ids shared by PURL.
   - Download AnnData (overridden if a local copy exists)
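
To make the check logic in workflow 1 easier to discuss, here is a rough sketch in plain Python. It is not the existing CAS tools code: `CheckResult`, `check_labelsets` and `compare_cell_sets` are hypothetical names, and plain dicts and lists stand in for the CAS annotations and for the columns and index of `adata.obs`. It also assumes that a value mismatch on the cluster labelset is only a warning, which the list above does not state.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def passed(self):
        # Warnings do not block the workflow; only errors do.
        return not self.errors


def check_labelsets(config_labelsets, cas_labels, obs_columns):
    """Checks run before cell_ids are populated.

    config_labelsets: list of (labelset_name, rank) taken from the config
    cas_labels:       dict labelset_name -> set of labels found in CAS
    obs_columns:      dict column_name -> per-cell values (stand-in for adata.obs)
    """
    result = CheckResult()
    for name, rank in config_labelsets:
        if name not in obs_columns:
            if rank == 0:
                result.errors.append(
                    f"cluster labelset '{name}' (rank 0) not in AnnData"
                )
            continue  # other labelsets may be absent from AnnData: pass
        cas_values = set(cas_labels.get(name, ()))
        obs_values = set(obs_columns[name])
        if cas_values != obs_values:
            # Assumption: any value mismatch is a warning, including rank 0.
            result.warnings.append(
                f"labelset '{name}': only in CAS {sorted(cas_values - obs_values)}, "
                f"only in AnnData {sorted(obs_values - cas_values)}"
            )
    return result


def compare_cell_sets(cas_cell_sets, obs_index, obs_column):
    """Hierarchy check run after cell_ids are populated.

    cas_cell_sets: dict label -> set of cell_ids recorded in CAS
    obs_index:     cell ids, in AnnData order
    obs_column:    labels for one labelset, aligned with obs_index
    Returns the sorted labels whose cell sets differ (empty list = match).
    """
    anndata_sets = {}
    for cell_id, label in zip(obs_index, obs_column):
        anndata_sets.setdefault(label, set()).add(cell_id)
    labels = set(cas_cell_sets) | set(anndata_sets)
    return sorted(
        label
        for label in labels
        if set(cas_cell_sets.get(label, ())) != anndata_sets.get(label, set())
    )
```

In this sketch a missing rank 0 column is the only hard failure; every other discrepancy is collected as a warning so that all of them can be reported together.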

Review CAS tools functionality in light of these workflows.
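
For the whole-file checksum mentioned in workflow 1, a minimal sketch using only the standard library might look like the following. The function names are hypothetical, SHA-256 is an assumption (the issue does not name an algorithm), and where the stored checksum lives is left open.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so a large AnnData file is never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def anndata_unchanged(path, stored_checksum):
    """True if the file on disk still matches the checksum stored earlier."""
    return file_checksum(path) == stored_checksum
```

One consequence worth noting: saving changes to AnnData alters the file, so the stored checksum would need to be regenerated after each save.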
