TDT Workflows to support.
- Starting point = spreadsheet + AnnData file + config. Workflow steps:
  - Ingest spreadsheet to generate CAS without cell_ids.
  - Run checks before populating cell_ids: check that the labels and values of all labelsets in the config are in the CAS and the AnnData.
    - If cluster (rank 0) is not in the AnnData: fail.
    - If other labelsets in the config are present in the AnnData: check that keys and values match, and warn if not. If they are not present: pass.
  - Populate cell_ids.
    - If labelsets other than cluster are present: check that the hierarchy matches (compare cell sets for all). If they don't match, warn.
  - If everything passes to this point:
    - Record validation status in schema?
    - Generate checksum for whole AnnData file and store.
  - Editing --> save changes to AnnData: no further checks are needed. BUT saving changes to AnnData should flush and replace the labelset (flush = remove column in dataframe). The warning should make this clear.
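
The checks in this workflow could be sketched roughly as follows. This is a minimal illustration, not the TDT or CAS tools implementation: the function names and argument shapes are hypothetical, and plain dicts and lists stand in for the config, the CAS and the AnnData obs dataframe. The notes do not say what should happen when a config labelset is missing from the CAS, so raising there is an assumption.

```python
from collections import defaultdict


def check_labelsets(config_labelsets, cas_labels, obs_columns):
    """Pre-population checks. Returns warning messages; raises on failure.

    Argument shapes are assumptions made for this sketch:
      config_labelsets: list of (labelset name, rank) pairs from the config
      cas_labels: dict of labelset name -> set of cell labels in the CAS
      obs_columns: dict of column name -> per-cell values (stand-in for obs)
    """
    messages = []
    for name, rank in config_labelsets:
        if name not in cas_labels:
            # Assumption: a config labelset missing from the CAS is a failure.
            raise ValueError(f"labelset {name!r} is in the config but not in the CAS")
        if name not in obs_columns:
            if rank == 0:
                raise ValueError(f"cluster labelset {name!r} (rank 0) is not in the AnnData")
            continue  # other labelsets may be absent from the AnnData: pass
        differing = cas_labels[name] ^ set(obs_columns[name])
        if differing:
            messages.append(
                f"{name}: values differ between CAS and AnnData: {sorted(differing)}"
            )
    return messages


def check_hierarchy(cluster_to_parent, obs_cluster, obs_parent):
    """Compare cell sets per parent label; return the labels that do not match.

    cluster_to_parent: dict of cluster label -> parent label, per the CAS hierarchy
    obs_cluster, obs_parent: per-cell labels in the same cell order (from obs)
    Assumes every cluster in obs_cluster is a key of cluster_to_parent.
    """
    cas_sets = defaultdict(set)  # parent label -> cells implied by the CAS hierarchy
    obs_sets = defaultdict(set)  # parent label -> cells labelled so in the AnnData
    for cell_index, (cluster, parent) in enumerate(zip(obs_cluster, obs_parent)):
        cas_sets[cluster_to_parent[cluster]].add(cell_index)
        obs_sets[parent].add(cell_index)
    labels = sorted(set(cas_sets) | set(obs_sets))
    return [label for label in labels if cas_sets.get(label) != obs_sets.get(label)]
```

Both functions return their findings instead of printing them, so the caller can decide how to surface warnings to the user.
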
Generate checksum for whole AnnData file for later use in validation.
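
One way to compute such a checksum is to hash the file in chunks, so that a large AnnData file never has to be loaded into memory. This is a sketch only: the function name `file_checksum` is hypothetical, and the choice of SHA-256 is an assumption, since these notes do not name an algorithm.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The stored digest can later be compared with a freshly computed one to detect whether the AnnData file has changed since it was validated.
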
- Starting point: existing TDT repo with config, CAS file + cell_ids and linked AnnData file.
  - Run the checks from the first workflow (spreadsheet + AnnData + config) on startup.
- Starting point: CAS files with cell_ids shared by PURL.
  - Download AnnData (skipped if a local copy is present).
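
The local-copy override might look something like the sketch below. The function name `resolve_anndata` and both parameters are hypothetical; how the download URL is derived from the CAS file or PURL is not covered here.

```python
import shutil
import urllib.request
from pathlib import Path


def resolve_anndata(local_path, url):
    """Return a path to the AnnData file, downloading it only if no local copy exists."""
    local_path = Path(local_path)
    if local_path.exists():
        return local_path  # a local copy overrides the download
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding the whole file in memory.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```
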
Review CAS tools functionality in light of these workflows.