Description
Input configuration
Vanillix on TCGA data in this config:
RUN_006001_config.yaml.txt
Handling large numbers of labels in Visualization
In the 2D UMAP plots of the latent space, it is hard to associate the colored classes with their actual class labels once the number of label classes exceeds about 20.
See these examples:
latent2D_AJCC_PATHOLOGIC_TUMOR_STAGE.png

Suggestion:
- Attach a small text label to each centroid star, naming its class.
- Experiment with other color maps that still work with more than 20 classes, for example 100 classes.
- Allow a user to specify hierarchical label classes, with hierarchical color maps along this hierarchy.
For example, for AJCC_PATHOLOGIC_TUMOR_STAGE:
STAGE 0 - greenish (least severe)
STAGE I - green-yellow
STAGE II - yellow
STAGE III - red
STAGE IV - dark red (most severe)
(or a blue-yellow scale for red-green-blind users)
Within each stage, the sub-stages could then use graded shades of their "stage color". Being able to distinguish the sub-stages from each other visually is then much less important.
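The hierarchical coloring idea could be sketched roughly as follows. This is only an illustration, not existing project code: the names `STAGE_HIERARCHY`, `BASE_COLORS`, `graded_shades`, `build_palette` and `color_for` are hypothetical, the label strings and RGB values are placeholders, and sub-stage shades are derived by varying only the lightness of the stage color with Python's standard `colorsys` module.

```python
import colorsys

# Hypothetical hierarchy: top-level stage -> fine-grained labels (placeholders).
STAGE_HIERARCHY = {
    "STAGE 0": ["STAGE 0"],
    "STAGE I": ["STAGE I", "STAGE IA", "STAGE IB"],
    "STAGE II": ["STAGE II", "STAGE IIA", "STAGE IIB", "STAGE IIC"],
    "STAGE III": ["STAGE III", "STAGE IIIA", "STAGE IIIB", "STAGE IIIC"],
    "STAGE IV": ["STAGE IV", "STAGE IVA", "STAGE IVB"],
}

# Assumed base colors (RGB in 0..1), roughly green to dark red.
BASE_COLORS = {
    "STAGE 0": (0.2, 0.7, 0.3),
    "STAGE I": (0.6, 0.8, 0.2),
    "STAGE II": (0.95, 0.85, 0.1),
    "STAGE III": (0.9, 0.2, 0.1),
    "STAGE IV": (0.5, 0.0, 0.0),
}

UNKNOWN_COLOR = (0.83, 0.83, 0.83)  # light gray for labels outside the hierarchy


def graded_shades(base_rgb, n, spread=0.3):
    """Return n RGB shades of base_rgb that differ only in lightness."""
    if n == 1:
        return [base_rgb]
    h, l, s = colorsys.rgb_to_hls(*base_rgb)
    shades = []
    for i in range(n):
        # Offsets run evenly from -spread/2 to +spread/2 around the base lightness.
        offset = spread * (i / (n - 1) - 0.5)
        new_l = min(0.9, max(0.1, l + offset))
        shades.append(colorsys.hls_to_rgb(h, new_l, s))
    return shades


def build_palette(hierarchy, base_colors):
    """Map every fine-grained label to a shade of its parent stage color."""
    palette = {}
    for parent, children in hierarchy.items():
        shades = graded_shades(base_colors[parent], len(children))
        for child, rgb in zip(children, shades):
            palette[child] = rgb
    return palette


def color_for(label, palette):
    """Look up a label's color, falling back to light gray."""
    return palette.get(label, UNKNOWN_COLOR)
```

The resulting dictionary could serve as a per-label color lookup when drawing the scatter plot. For the centroid labels, matplotlib's `Axes.annotate` would be one way to place a short text next to each star.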
Handling large numbers of labels in loss evaluation
Such a domain-specific hierarchy can also be introduced for cancer subtypes. The same hierarchy should also be used when evaluating losses such as the classification error. An error within the same stage (for example predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE Ia) is far less severe than an error across stages (for example predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE IVa).
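A hierarchy-aware classification error could look roughly like the sketch below. It is not the project's evaluation code: `PARENT_STAGE`, `hierarchical_cost` and `mean_hierarchical_error` are hypothetical names, the label strings are placeholders, and the within-stage cost of 0.2 is an arbitrary assumption chosen only to show that same-stage mistakes weigh less than cross-stage ones.

```python
# Hypothetical mapping from fine-grained label to its parent stage.
# Labels missing from the mapping are treated as their own parent.
PARENT_STAGE = {
    "STAGE IA": "STAGE I",
    "STAGE IB": "STAGE I",
    "STAGE IIA": "STAGE II",
    "STAGE IIB": "STAGE II",
    "STAGE IVA": "STAGE IV",
    "STAGE IVB": "STAGE IV",
}


def hierarchical_cost(predicted, actual, within_stage_cost=0.2):
    """Cost 0 for a correct label, a small cost inside the same stage, 1 otherwise."""
    if predicted == actual:
        return 0.0
    same_stage = PARENT_STAGE.get(predicted, predicted) == PARENT_STAGE.get(actual, actual)
    return within_stage_cost if same_stage else 1.0


def mean_hierarchical_error(predictions, targets, within_stage_cost=0.2):
    """Average hierarchical cost over paired predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    total = sum(
        hierarchical_cost(p, t, within_stage_cost)
        for p, t in zip(predictions, targets)
    )
    return total / len(predictions)
```

With this weighting, predicting Ib for an actual Ia costs 0.2, while predicting Ib for an actual IVa costs the full 1.0. A graded variant could additionally scale the cross-stage cost by the ordinal distance between stages.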
As soon as there are more than about 20 class labels for a certain property (target variable or clinical variable), these class labels are typically arranged in a hierarchy. Such label hierarchies could be specified in a JSON or YAML input file, possibly together with a coloring proposal.
An "unknown" class label should generally be visualized in something like light gray.
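One possible shape for such an input file is sketched here as JSON, parsed with Python's standard `json` module. The key names (`variable`, `unknown_color`, `groups`, `name`, `color`, `labels`), the hex colors and the helper `load_label_hierarchy` are all assumptions made for illustration. None of them is an existing config schema.

```python
import json

# Hypothetical file content; the schema and the values are illustrative only.
HIERARCHY_JSON = """
{
  "variable": "AJCC_PATHOLOGIC_TUMOR_STAGE",
  "unknown_color": "#d3d3d3",
  "groups": [
    {"name": "STAGE I", "color": "#9acd32",
     "labels": ["STAGE I", "STAGE IA", "STAGE IB"]},
    {"name": "STAGE IV", "color": "#8b0000",
     "labels": ["STAGE IV", "STAGE IVA", "STAGE IVB"]}
  ]
}
"""


def load_label_hierarchy(text):
    """Parse the hierarchy and return (label -> group, group -> color, unknown color)."""
    spec = json.loads(text)
    label_to_group = {}
    group_colors = {}
    for group in spec["groups"]:
        group_colors[group["name"]] = group["color"]
        for label in group["labels"]:
            if label in label_to_group:
                raise ValueError(f"label {label!r} appears in more than one group")
            label_to_group[label] = group["name"]
    # Fall back to light gray when the file does not propose a color for "unknown".
    unknown_color = spec.get("unknown_color", "#d3d3d3")
    return label_to_group, group_colors, unknown_color


label_to_group, group_colors, unknown_color = load_label_hierarchy(HIERARCHY_JSON)
```

The same structure maps directly onto YAML. The label-to-group mapping could feed both the color assignment and a hierarchy-aware error measure.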

