
Handle missing IMPC effect_size values as NaN instead of 0 in TSUMUGI records #165

@akikuno

📋 Description

@kinari-labwork found cases in TSUMUGI where significant == true but effect_size == 0.

For example:

zcat datas/genewise_phenotype_annotations.jsonl.gz | grep 'increased circulating creatinine level' | head -n 2
{"mp_term_name": "increased circulating creatinine level", "marker_symbol": "1500009L16Rik", "sexual_dimorphism": "None", "mp_term_id": "MP:0005553", "life_stage": "Early", "effect_size": 0.0, "zygosity": "Homo", "marker_accession_id": "MGI:1917034", "disease_annotation": [], "significant": true}

{"mp_term_name": "increased circulating creatinine level", "marker_symbol": "AI467606", "sexual_dimorphism": "None", "mp_term_id": "MP:0005553", "life_stage": "Early", "effect_size": 43.5261880802177, "zygosity": "Homo", "marker_accession_id": "MGI:2141979", "disease_annotation": [], "significant": true}

The first record (1500009L16Rik) is an example where significant == true but effect_size == 0.
The second record (AI467606) has the same phenotype, is also significant == true, and has a non-zero effect_size.
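The same check can be done across the whole file rather than one phenotype at a time. The sketch below assumes only what the zcat output shows: the file is gzip-compressed JSON Lines with significant, effect_size, marker_symbol and mp_term_name keys. The function name find_zero_effect_significant is hypothetical and not part of TSUMUGI.

```python
import gzip
import json
from pathlib import Path


def find_zero_effect_significant(path):
    """Return (marker_symbol, mp_term_name) pairs for records that are
    significant but have effect_size == 0 (hypothetical helper)."""
    hits = []
    # The file is gzip-compressed JSON Lines: one JSON object per line.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # JSON true becomes Python True; 0.0 == 0 is True in Python.
            if record.get("significant") is True and record.get("effect_size") == 0:
                hits.append((record["marker_symbol"], record["mp_term_name"]))
    return hits


if __name__ == "__main__":
    path = Path("datas/genewise_phenotype_annotations.jsonl.gz")
    if path.exists():
        for symbol, term in find_zero_effect_significant(path):
            print(symbol, term)
    else:
        print(f"{path} not found")
```

Once missing values are stored as NaN, such records would no longer match, because NaN == 0 is False.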

Checking the raw IMPC data suggests that the issue is caused by effect_size values that are missing from the source data.

  • For 1500009L16Rik, the IMPC statistical result does not contain effect_size.
  • For AI467606, the IMPC statistical result does contain effect_size.

At the moment, TSUMUGI appears to convert missing effect_size values to 0.
This may be misleading, because 0 suggests a true zero effect rather than a missing value.
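To illustrate why zero-filling is misleading, the sketch below takes the AI467606 value from above and a missing value (represented here as None, an assumption about how the gap would appear after parsing) and compares a zero-filled mean with a NaN-aware mean. The "current behaviour" line mirrors what this issue describes, not TSUMUGI's actual code.

```python
import math

# Two significant records: one observed effect size, one missing in IMPC.
raw_values = [43.5261880802177, None]

# Current behaviour as described in this issue: missing -> 0.
zero_filled = [v if v is not None else 0.0 for v in raw_values]

# Suggested behaviour: missing -> NaN.
nan_filled = [v if v is not None else float("nan") for v in raw_values]


def mean_ignoring_nan(values):
    """Mean of the non-NaN values, or NaN if there are none."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else float("nan")


# The zero-filled mean is halved by a value that was never measured.
print(sum(zero_filled) / len(zero_filled))
# The NaN-aware mean uses only the observed value.
print(mean_ignoring_nan(nan_filled))
```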

Suggested fix:

  • Treat missing effect_size values as float("nan") instead of 0.
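A minimal sketch of the suggested fix, assuming missing values arrive either as an absent key (None via dict.get) or as an empty string; how they actually appear in the raw IMPC files has not been checked here. parse_effect_size is a hypothetical helper, not an existing TSUMUGI function.

```python
import json


def parse_effect_size(raw):
    """Hypothetical helper: convert a raw effect_size to float, using NaN
    (instead of 0) when the value is missing.

    None and empty or whitespace-only strings are treated as missing; this is
    an assumption about the raw data format.
    """
    if raw is None:
        return float("nan")
    if isinstance(raw, str) and not raw.strip():
        return float("nan")
    return float(raw)


# A record whose source has no effect_size, like 1500009L16Rik above.
record = {"marker_symbol": "1500009L16Rik", "significant": True}
record["effect_size"] = parse_effect_size(record.get("effect_size"))

# Python's json module writes NaN as the bare token NaN by default.
print(json.dumps(record))
```

Note that json.dumps writes NaN, which json.loads reads back, but NaN is not valid strict JSON and parsers such as JavaScript's JSON.parse reject it. If the JSONL output is read outside Python, writing null for missing values and converting it to NaN on load may be the safer choice.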

🔖 TSUMUGI Version

1.0.2

📎 Anything else?

No response
