📋 Description
@kinari-labwork found cases in TSUMUGI where significant == true but effect_size == 0.
For example:
zcat datas/genewise_phenotype_annotations.jsonl.gz | grep 'increased circulating creatinine level' | head -n 2
{"mp_term_name": "increased circulating creatinine level", "marker_symbol": "1500009L16Rik", "sexual_dimorphism": "None", "mp_term_id": "MP:0005553", "life_stage": "Early", "effect_size": 0.0, "zygosity": "Homo", "marker_accession_id": "MGI:1917034", "disease_annotation": [], "significant": true}
{"mp_term_name": "increased circulating creatinine level", "marker_symbol": "AI467606", "sexual_dimorphism": "None", "mp_term_id": "MP:0005553", "life_stage": "Early", "effect_size": 43.5261880802177, "zygosity": "Homo", "marker_accession_id": "MGI:2141979", "disease_annotation": [], "significant": true}
The first record is an example where significant == true but effect_size == 0.
The second record has the same phenotype, is also significant == true, and has a non-zero effect_size.
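To see how widespread this is, the same file can be scanned for every record that matches the pattern in the first example. The following is a minimal sketch, not TSUMUGI code: the function name `iter_zero_effect_significant` is hypothetical, and it assumes the file is gzip-compressed JSON Lines with the `significant` and `effect_size` keys shown above.

```python
import gzip
import json
from typing import Iterator


def iter_zero_effect_significant(path: str) -> Iterator[dict]:
    """Yield records where significant is true but effect_size is exactly 0.

    Hypothetical helper for illustration; assumes one JSON object per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if record.get("significant") is True and record.get("effect_size") == 0:
                yield record


if __name__ == "__main__":
    # Path taken from the command in this issue.
    path = "datas/genewise_phenotype_annotations.jsonl.gz"
    for rec in iter_zero_effect_significant(path):
        print(rec["marker_symbol"], rec["mp_term_name"])
```

A scan like this cannot distinguish a genuine zero from a missing value that was converted to 0, so the hits still need to be checked against the raw IMPC data.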
After checking the raw IMPC data, it seems that the issue is caused by missing effect_size values in the source data.
- For 1500009L16Rik, the IMPC statistical result does not contain effect_size.
- For AI467606, the IMPC statistical result does contain effect_size.
At the moment, TSUMUGI appears to convert missing effect_size values to 0.
This may be misleading, because 0 suggests a true zero effect rather than a missing value.
Suggested fix:
- Treat missing effect_size values as float("nan") instead of 0.
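A minimal sketch of what this could look like is shown below. It is not taken from the TSUMUGI code base: the function name `parse_effect_size` is hypothetical, and it assumes a missing value arrives as `None` or an empty string when the IMPC statistical results are read.

```python
import json
import math
from typing import Any


def parse_effect_size(raw: Any) -> float:
    """Convert a raw effect_size field to float, keeping missing values as NaN.

    Hypothetical helper; assumes missing values appear as None or "".
    """
    if raw is None or raw == "":
        return float("nan")  # missing in the source data, not a true zero
    return float(raw)


if __name__ == "__main__":
    missing = parse_effect_size(None)
    present = parse_effect_size("43.5261880802177")
    print(math.isnan(missing), present)

    # Caveat: Python's json module writes NaN as the bare token NaN,
    # which is not valid strict JSON and may be rejected by other parsers.
    print(json.dumps({"effect_size": missing}))
```

Because `NaN` is not part of strict JSON, writing `null` to the output file and converting it to `float("nan")` on load may be the more portable option if the JSONL files are read by tools other than Python.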
🔖 TSUMUGI Version
1.0.2
📎 Anything else?
No response