
How to Add New Markers

To contribute new marker data, please submit a pull request with the appropriate files added to this folder. This will automatically trigger a set of GitHub Actions to validate the data and report any issues.

1. Prepare Your Input Data

Place your marker data file(s) in the src/markers/input directory. Each file must include the following required columns:

  • clusterName
  • f_score
  • NSForest_markers

You may include additional columns if needed. See the NS-Forest SOP here.
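Before opening a pull request, you can run a quick local check that a marker file has the required columns. This is a minimal sketch, assuming the input files are comma-delimited CSVs; the helper name missing_required_columns is hypothetical and is not part of the repository. Because additional columns are allowed, it only reports columns that are missing.

```python
import csv
import io

# Required columns for files in src/markers/input (from this section).
# Extra columns are allowed, so only missing ones are reported.
REQUIRED_COLUMNS = {"clusterName", "f_score", "NSForest_markers"}


def missing_required_columns(csv_text: str) -> set:
    """Return the required columns absent from the CSV header row (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return REQUIRED_COLUMNS - {name.strip() for name in header}


# Hypothetical header with one extra column; prints set() because nothing is missing.
print(missing_required_columns("clusterName,f_score,NSForest_markers,my_extra_column\n"))
```

To check a real file, pass its contents, for example `missing_required_columns(Path("src/markers/input/your_file.csv").read_text())` with `from pathlib import Path`.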

2. Add Metadata

Alongside the input data, include a corresponding metadata entry in the src/markers/input/metadata.csv file. Each row should describe one input file and should include the following fields:

  • file_name
  • Species
  • Species_abbreviation
  • Organ_region
  • Parent
  • Marker_set_xref
  • CxG_collection
  • CxG_dataset
  • software_version
  • cluster_header

Example metadata:

file_name Species Species_abbreviation Organ_region Parent Marker_set_xref CxG_collection CxG_dataset software_version cluster_header
HLCA_CellRef_MarkerPerformance_forDOS.csv NCBITaxon:9606 Human UBERON:0002048 SO:0001260 https://doi.org/10.5281/zenodo.11165918 https://cellxgene.cziscience.com/collections/6f6d381a-7701-4781-935c-db10d30de293 An integrated cell atlas of the human lung in health and disease (core)
nsforest_human_neocortex_global_cluster_combinatorial_results.csv NCBITaxon:9606 Human UBERON:0001950 SO:0001260 https://doi.org/10.5281/zenodo.11165918 https://cellxgene.cziscience.com/collections/d17249d2-0e6e-4500-abb8-e6c93fa1ac6f cluster
nsforest_human_neocortex_global_subclass_results.csv NCBITaxon:9606 Human UBERON:0001950 SO:0001260 https://doi.org/10.5281/zenodo.11165918 https://cellxgene.cziscience.com/collections/d17249d2-0e6e-4500-abb8-e6c93fa1ac6f subclass
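If you prefer to generate the metadata row programmatically, the sketch below serializes one row in the field order listed above. It assumes metadata.csv is comma-delimited and uses the header order shown in the example; the function name metadata_row_csv is hypothetical. The values come from the third example row, with their assignment to columns inferred from that header order; fields that are not supplied (such as the optional CxG_dataset) become empty cells.

```python
import csv
import io

# Column order of src/markers/input/metadata.csv, as listed in this section.
METADATA_FIELDS = [
    "file_name", "Species", "Species_abbreviation", "Organ_region", "Parent",
    "Marker_set_xref", "CxG_collection", "CxG_dataset", "software_version",
    "cluster_header",
]


def metadata_row_csv(row: dict) -> str:
    """Serialize one metadata row as a CSV line (hypothetical helper).

    Fields not supplied are written as empty cells; an unknown field name
    makes csv.DictWriter raise ValueError.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=METADATA_FIELDS, restval="", lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


# Values from the third example row; column assignment inferred from the header order.
line = metadata_row_csv({
    "file_name": "nsforest_human_neocortex_global_subclass_results.csv",
    "Species": "NCBITaxon:9606",
    "Species_abbreviation": "Human",
    "Organ_region": "UBERON:0001950",
    "Parent": "SO:0001260",
    "Marker_set_xref": "https://doi.org/10.5281/zenodo.11165918",
    "CxG_collection": "https://cellxgene.cziscience.com/collections/d17249d2-0e6e-4500-abb8-e6c93fa1ac6f",
    "cluster_header": "subclass",
})
print(line, end="")
```

Using csv.DictWriter rather than joining strings by hand keeps the column order fixed and quotes any value that happens to contain a comma.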

Notes:

  • CxG_dataset and CxG_collection are optional. If provided, the pipeline will use them to query the CL_KG.
  • If CxG_dataset is omitted, the pipeline will default to the cxg_dataset_title in the input file.
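  • software_version: the version of NS-Forest used to generate the markers.
  • cluster_header: the obs key (the cell annotation column of the AnnData object) used in the NS-Forest analysis. It usually corresponds to the annotation level, for example cluster or subclass.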
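The CxG_dataset fallback described in these notes can be pictured as follows. This is a hypothetical sketch, not the pipeline's actual code: it assumes the input file has a cxg_dataset_title column and that the first non-blank value is the one used. The function name resolve_dataset_title is illustrative only.

```python
from typing import Optional


def resolve_dataset_title(metadata_row: dict, input_rows: list) -> Optional[str]:
    """Hypothetical sketch of the CxG_dataset fallback described in the notes."""
    # Prefer the CxG_dataset value from metadata.csv when it is present and non-blank.
    dataset = (metadata_row.get("CxG_dataset") or "").strip()
    if dataset:
        return dataset
    # Otherwise use the first non-blank cxg_dataset_title found in the input file rows
    # (assumption: the input file carries this as a column).
    for row in input_rows:
        title = (row.get("cxg_dataset_title") or "").strip()
        if title:
            return title
    return None
```

Here input_rows would typically be the rows produced by `csv.DictReader` over the marker file; `None` signals that neither source provides a dataset title.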

3. GitHub Action: Validate Input

After adding your files and metadata, create a pull request. This will trigger an automated GitHub Action that validates the metadata and input files. The action will check for:

  • Correct column names and types in the input files.
  • Consistency between the input files and the metadata.
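You can approximate these checks locally before pushing. The sketch below is not the GitHub Action's implementation; it assumes marker files are .csv files in src/markers/input next to metadata.csv, and it treats "types" as f_score parsing as a number, which is an assumption. The function names check_consistency and non_numeric_f_scores are hypothetical.

```python
import csv
import io
from pathlib import Path


def check_consistency(metadata_csv: str, input_files: list) -> list:
    """Return readable problems between metadata.csv and the input files (empty list = consistent)."""
    listed = [(row.get("file_name") or "").strip() for row in csv.DictReader(io.StringIO(metadata_csv))]
    listed = [name for name in listed if name]
    # Only .csv files count as marker files; metadata.csv itself is excluded.
    present = {Path(p).name for p in input_files if p.endswith(".csv") and Path(p).name != "metadata.csv"}
    problems = [f"listed in metadata but missing: {name}" for name in listed if name not in present]
    problems += [f"file has no metadata row: {name}" for name in sorted(present - set(listed))]
    duplicates = sorted({name for name in listed if listed.count(name) > 1})
    problems += [f"duplicate metadata row: {name}" for name in duplicates]
    return problems


def non_numeric_f_scores(input_csv: str) -> list:
    """Return 1-based data-row numbers whose f_score does not parse as a float."""
    bad = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(input_csv)), start=1):
        try:
            float(row.get("f_score") or "")
        except ValueError:
            bad.append(row_no)
    return bad
```

For a real check, pass `Path("src/markers/input/metadata.csv").read_text()` and the names from `Path("src/markers/input").iterdir()`.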
