
Handling a large number of labels (more than ~20) for a property in visualization and loss evaluation #24

@ralf-koenig

Description


Input configuration

Vanillix on TCGA data with this config:
RUN_006001_config.yaml.txt

Handling large numbers of labels in Visualization

In the 2D UMAP plots of the latent space, it is hard to associate the colored classes with their actual class labels once there are more than about 20 label classes.

See these examples:
latent2D_AJCC_PATHOLOGIC_TUMOR_STAGE.png

latent2D_CANCER_TYPE.png

latent2D_PATH_N_STAGE.png

Suggestions:

  • Attach a small text label with the class name to each class's centroid star.

  • Experiment with other color maps that still work for more than 20 classes, e.g. 100 classes.

  • Allow users to specify a hierarchy of label classes, with a color map that follows this hierarchy.
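The first two suggestions can be sketched without any plotting library. The sketch below is an illustration, not AUTOENCODIX code: `distinct_colors` and `centroid_annotations` are hypothetical helpers, and spreading hues by the golden ratio is one swapped-in technique for generating an arbitrary number of colors (it avoids repeats, but does not guarantee that neighboring hues stay perceptually distinct at 100 classes).

```python
import colorsys
from collections import defaultdict

# Stepping the hue by the golden-ratio conjugate spreads successive hues
# evenly around the color wheel for any number of classes.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def distinct_colors(n, saturation=0.65, value=0.9):
    """Return n hex colors whose hues are spaced by the golden ratio (hypothetical helper)."""
    colors = []
    hue = 0.0
    for _ in range(n):
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
        hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return colors


def centroid_annotations(points, labels):
    """Compute each class centroid in a 2D embedding and pair it with its label and a color.

    points: list of (x, y) tuples, e.g. a 2D UMAP of the latent space.
    labels: class label per point, same length as points.
    Returns a list of (label, (cx, cy), hex_color), sorted by label.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    classes = sorted(sums)
    colors = distinct_colors(len(classes))
    result = []
    for label, color in zip(classes, colors):
        sx, sy, count = sums[label]
        result.append((label, (sx / count, sy / count), color))
    return result
```

With matplotlib, each returned centroid could then be drawn as a star and labeled with `ax.annotate(label, (cx, cy))`, so the class name sits next to its star instead of only in a 20+ entry legend.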

For example, for AJCC_PATHOLOGIC_TUMOR_STAGE:
STAGE 0 - greenish (least severe)
STAGE I - green-yellow
STAGE II - yellow
STAGE III - red
STAGE IV - dark red (most severe)
(or a blue-yellow scale for red-green color-blind users)

Within each stage, the sub-stages could then use graded shades of their stage color. It is then less important to be able to tell the sub-stages apart visually.
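A minimal sketch of such a two-level color map, assuming stage labels of the form `Stage <numeral><optional letter>` (e.g. `Stage IIIA`). The RGB values in `STAGE_BASE_COLORS` are hypothetical picks that follow the green to dark red scheme above; each sub-stage gets a tint of its main stage color, from the base color for the first sub-stage to a lighter tint for the last.

```python
# Hypothetical base colors (RGB in [0, 1]) per main stage, least to most severe.
STAGE_BASE_COLORS = {
    "Stage 0": (0.2, 0.7, 0.3),      # greenish
    "Stage I": (0.6, 0.8, 0.2),      # green-yellow
    "Stage II": (0.95, 0.85, 0.2),   # yellow
    "Stage III": (0.85, 0.2, 0.15),  # red
    "Stage IV": (0.5, 0.0, 0.05),    # dark red
}


def parent_stage(label):
    """Map a sub-stage such as 'Stage IIIA' to its main stage 'Stage III'.

    Assumes the sub-stage suffix is a single letter A, B or C.
    """
    prefix, _, rest = label.partition(" ")
    return f"{prefix} {rest.rstrip('ABCabc')}"


def shade(rgb, factor):
    """Blend an RGB color toward white by `factor` in [0, 1]."""
    return tuple(c + (1.0 - c) * factor for c in rgb)


def hierarchical_colors(labels, max_lighten=0.5):
    """Assign each label a tint of its main stage color.

    Labels whose main stage is missing from STAGE_BASE_COLORS raise KeyError.
    """
    groups = {}
    for label in sorted(set(labels)):
        groups.setdefault(parent_stage(label), []).append(label)
    colors = {}
    for stage, members in groups.items():
        base = STAGE_BASE_COLORS[stage]
        for i, label in enumerate(members):
            # Spread sub-stages evenly between the base color and the lightest tint.
            factor = max_lighten * i / (len(members) - 1) if len(members) > 1 else 0.0
            colors[label] = shade(base, factor)
    return colors
```

Lightening toward white keeps every sub-stage recognizably "red" or "yellow", which matches the point above that the main stage matters more than telling Stage IIIA from Stage IIIB.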

Handling large numbers of labels in loss evaluation

Such a domain-specific hierarchy can also be introduced for cancer subtypes. It should also be used when evaluating losses such as the classification error: an error within the same stage (e.g. predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE Ia) is far less severe than an error such as predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE IVa.
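One way to express this is a cost per (predicted, actual) pair. The sketch below is hypothetical: the `PARENT` table, the 0.25 within-stage cost, and the "number of stages apart" penalty are illustrative choices, not an existing AUTOENCODIX loss. It shows that the Ib/Ia error from the text costs far less than the Ib/IVa error.

```python
# Hypothetical two-level hierarchy: sub-stage -> main stage.
PARENT = {
    "Stage IA": "Stage I", "Stage IB": "Stage I",
    "Stage IIA": "Stage II", "Stage IIB": "Stage II",
    "Stage IIIA": "Stage III", "Stage IIIB": "Stage III",
    "Stage IVA": "Stage IV", "Stage IVB": "Stage IV",
}
# Main stages ordered by severity; the distance between them drives the cost.
STAGE_ORDER = ["Stage I", "Stage II", "Stage III", "Stage IV"]


def misclassification_cost(predicted, actual, within_stage_cost=0.25):
    """0 for a hit, a small cost within the same main stage, else the stage distance."""
    if predicted == actual:
        return 0.0
    p, a = PARENT[predicted], PARENT[actual]
    if p == a:
        return within_stage_cost
    return float(abs(STAGE_ORDER.index(p) - STAGE_ORDER.index(a)))


def expected_cost(probabilities, actual):
    """Expected cost of a predicted distribution {label: probability} for one sample.

    Probability mass on a neighboring sub-stage is penalized much less than
    mass on a distant stage, unlike plain cross-entropy.
    """
    return sum(prob * misclassification_cost(label, actual) for label, prob in probabilities.items())
```

For training, the same pairwise costs could be arranged as a cost matrix and multiplied with the softmax output to obtain a differentiable expected-cost term; for evaluation, averaging `misclassification_cost` over samples gives a hierarchy-aware error instead of plain accuracy.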

Once a property/target variable/clinical variable has more than about 20 class labels, these labels are typically arranged in a hierarchy. Such label hierarchies could be specified as input in a JSON or YAML file, possibly together with a proposed coloring.
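A sketch of what such an input file and its loader might look like. The schema (`property`, `groups`, `name`, `color`, `members`) is a hypothetical proposal, shown in JSON so the example needs only the standard library; a YAML file with the same structure, read with PyYAML's `yaml.safe_load`, would yield the same nested dicts and lists.

```python
import json

# Hypothetical hierarchy spec: each group is a main class with a proposed color.
EXAMPLE_SPEC = """
{
  "property": "AJCC_PATHOLOGIC_TUMOR_STAGE",
  "groups": [
    {"name": "Stage I",  "color": "#a6d96a", "members": ["Stage IA", "Stage IB"]},
    {"name": "Stage IV", "color": "#67001f", "members": ["Stage IVA", "Stage IVB"]}
  ]
}
"""


def load_hierarchy(text):
    """Parse a label-hierarchy spec and return (parent_of, color_of).

    parent_of maps each member label to its group name; color_of maps each
    group name to its proposed color (None if no color is given).
    A label listed in more than one group raises ValueError.
    """
    spec = json.loads(text)
    parent_of, color_of = {}, {}
    for group in spec["groups"]:
        name = group["name"]
        color_of[name] = group.get("color")
        for member in group["members"]:
            if member in parent_of:
                raise ValueError(f"label {member!r} appears in more than one group")
            parent_of[member] = name
    return parent_of, color_of
```

The resulting `parent_of` mapping is exactly what both the hierarchical color map and a hierarchy-aware loss need, so one file could drive visualization and evaluation consistently.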

An "unknown" class label should generally be visualized in something like light gray.
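A minimal sketch of that rule. Which raw values count as "unknown" is an assumption here: the set below includes bracketed placeholders of the kind found in TCGA clinical tables and the string form of a pandas NaN, and should be checked against the actual data. `color_for` and `UNKNOWN_VALUES` are hypothetical names.

```python
# Assumed placeholder values meaning "unknown" (compared case-insensitively).
UNKNOWN_VALUES = {"", "nan", "unknown", "[unknown]", "[not available]", "[discrepancy]"}
LIGHT_GRAY = "#d3d3d3"  # CSS named color "lightgray"


def is_unknown(label):
    """True for None, NaN, or any placeholder listed in UNKNOWN_VALUES."""
    return label is None or str(label).strip().lower() in UNKNOWN_VALUES


def color_for(label, palette, fallback=LIGHT_GRAY):
    """Return the palette color of a label, or light gray if the label is unknown."""
    if is_unknown(label):
        return fallback
    return palette[label]
```

Checking for unknown values before the palette lookup keeps placeholder labels out of the class hierarchy entirely, so they neither consume one of the distinct colors nor need a parent group.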
