Handling a large number of labels (e.g. >20) for a given property in visualization and loss evaluation

**Input configuration**

Vanillix on TCGA data with this config:
[RUN_006001_config.yaml.txt](https://github.com/user-attachments/files/20944688/RUN_006001_config.yaml.txt)

**Handling large numbers of labels in Visualization**

In the 2D latent-space UMAPs it is hard to associate the colored classes with their actual class labels once the number of label classes exceeds about 20.

See these examples:
latent2D_AJCC_PATHOLOGIC_TUMOR_STAGE.png
![Image](https://github.com/user-attachments/assets/b59634a7-3aaa-4549-9fa9-60e8007e7d2f)

latent2D_CANCER_TYPE.png
![Image](https://github.com/user-attachments/assets/9a32e076-f3bb-4ddb-8104-cd218488d53e)

latent2D_PATH_N_STAGE.png
![Image](https://github.com/user-attachments/assets/966509d1-5355-427a-af8d-2b0415f4c25c)

**Suggestion:**
* Attach a small text label to each class's centroid star.
* Experiment with other color maps that still work with more than 20 classes, e.g. 100 classes.
* Allow the user to specify hierarchical label classes, with color maps that follow this hierarchy.

For example, for AJCC_PATHOLOGIC_TUMOR_STAGE:
* STAGE 0 - greenish (least severe)
* STAGE I - green-yellow
* STAGE II - yellow
* STAGE III - red
* STAGE IV - dark red (most severe)

(or a blue-yellow scale for red-green-blind users)

Within a stage, the substages could then use graded shades of their "stage color". Telling the substages apart visually is then much less important than telling the stages apart.
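To make the graded-shades idea concrete, here is a minimal sketch using only the Python standard library. The hex values in `STAGE_BASE_COLORS`, the substage label names, and the helpers `graded_colors` and `build_palette` are illustrative assumptions, not part of the existing code base. The lightness spread of ±0.12 is likewise an arbitrary choice.

```python
import colorsys

# Hypothetical base colours per top-level stage (hex RGB), loosely following
# the green-to-dark-red scale proposed above.
STAGE_BASE_COLORS = {
    "STAGE 0": "#2ca02c",
    "STAGE I": "#9acd32",
    "STAGE II": "#ffd700",
    "STAGE III": "#e31a1c",
    "STAGE IV": "#8b0000",
}
UNKNOWN_COLOR = "#d3d3d3"  # light gray fallback for stages without a base colour


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def graded_colors(base_hex, n):
    """Return n shades of base_hex, keeping hue and varying lightness."""
    if n == 1:
        return [base_hex]
    h, l, s = colorsys.rgb_to_hls(*hex_to_rgb(base_hex))
    shades = []
    for i in range(n):
        # Spread lightness over +/- 0.12 around the base, clipped to [0.15, 0.85].
        offset = -0.12 + 0.24 * i / (n - 1)
        lightness = min(0.85, max(0.15, l + offset))
        shades.append(rgb_to_hex(colorsys.hls_to_rgb(h, lightness, s)))
    return shades


def build_palette(hierarchy):
    """Map every substage label to a shade of its parent stage colour."""
    palette = {}
    for stage, substages in hierarchy.items():
        base = STAGE_BASE_COLORS.get(stage, UNKNOWN_COLOR)
        for label, color in zip(substages, graded_colors(base, len(substages))):
            palette[label] = color
    return palette


palette = build_palette({
    "STAGE I": ["STAGE IA", "STAGE IB"],
    "STAGE IV": ["STAGE IV"],
})
```

The resulting `palette` dictionary could be passed to whatever plotting call draws the UMAP, so that substages of one stage stay visually grouped.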

**Handling large numbers of labels in loss evaluation**

Such a domain-specific hierarchy can also be introduced for cancer subtypes. It should also be used when **evaluating losses such as the classification error**. An error within the same stage (e.g. predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE Ia) is far less severe than an error across stages (e.g. predicted AJCC_PATHOLOGIC_TUMOR_STAGE Ib, actual AJCC_PATHOLOGIC_TUMOR_STAGE IVa).
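One possible way to express this in an evaluation metric is a hierarchy-aware misclassification cost. The sketch below is only an assumption about how such a cost could look: the label names, the `PARENT` mapping, the `within_stage_cost` of 0.25, and the linear scaling with the distance between stages are all hypothetical choices.

```python
# Hypothetical mapping from fine-grained label to its parent stage.
PARENT = {
    "STAGE IA": "STAGE I",
    "STAGE IB": "STAGE I",
    "STAGE IIA": "STAGE II",
    "STAGE IIB": "STAGE II",
    "STAGE IVA": "STAGE IV",
    "STAGE IVB": "STAGE IV",
}
# Assumed ordinal order of the top-level stages (least to most severe).
STAGE_ORDER = ["STAGE 0", "STAGE I", "STAGE II", "STAGE III", "STAGE IV"]


def hierarchical_cost(predicted, actual, within_stage_cost=0.25):
    """Cost in [0, 1]: 0 for a match, small within a stage, larger across stages."""
    if predicted == actual:
        return 0.0
    # Labels without a parent entry are treated as top-level stages.
    pred_stage = PARENT.get(predicted, predicted)
    actual_stage = PARENT.get(actual, actual)
    if pred_stage == actual_stage:
        return within_stage_cost
    if pred_stage not in STAGE_ORDER or actual_stage not in STAGE_ORDER:
        # Labels outside the known hierarchy (e.g. "unknown") get the full cost.
        return 1.0
    gap = abs(STAGE_ORDER.index(pred_stage) - STAGE_ORDER.index(actual_stage))
    # Scale linearly with the number of stages between prediction and truth.
    return within_stage_cost + (1.0 - within_stage_cost) * gap / (len(STAGE_ORDER) - 1)


def mean_hierarchical_error(predictions, actuals):
    """Average hierarchical cost over paired predictions and true labels."""
    costs = [hierarchical_cost(p, a) for p, a in zip(predictions, actuals)]
    return sum(costs) / len(costs)


same_stage = hierarchical_cost("STAGE IB", "STAGE IA")
cross_stage = hierarchical_cost("STAGE IB", "STAGE IVA")
```

With these settings a confusion between Ib and Ia costs 0.25, while a confusion between Ib and IVa costs 0.8125, so the metric reflects the difference in severity that a plain 0/1 classification error would hide.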

As soon as there are more than about 20 class labels for a given property (target variable, clinical variable), these labels are typically arranged in a hierarchy. Such label hierarchies could be specified in a JSON or YAML input file, possibly together with a proposed coloring.
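As a rough illustration of what such an input file might contain, the sketch below parses a JSON hierarchy with the standard library. The schema (keys `property`, `unknown_color`, `classes`, `name`, `color`, `children`) and the function `load_hierarchy` are purely hypothetical; no such format exists in the project yet.

```python
import json

# Hypothetical hierarchy specification; in practice this would live in a file.
HIERARCHY_JSON = """
{
  "property": "AJCC_PATHOLOGIC_TUMOR_STAGE",
  "unknown_color": "#d3d3d3",
  "classes": [
    {"name": "STAGE I", "color": "#9acd32", "children": ["STAGE IA", "STAGE IB"]},
    {"name": "STAGE IV", "color": "#8b0000", "children": ["STAGE IVA", "STAGE IVB"]}
  ]
}
"""


def load_hierarchy(text):
    """Parse the spec into (property name, label -> parent, parent -> colour)."""
    spec = json.loads(text)
    parent_of = {}
    color_of = {}
    for entry in spec["classes"]:
        name = entry["name"]
        # A top-level class is its own parent.
        parent_of[name] = name
        # Fall back to the "unknown" colour if no colour is proposed.
        color_of[name] = entry.get("color", spec.get("unknown_color"))
        for child in entry.get("children", []):
            parent_of[child] = name
    return spec["property"], parent_of, color_of


prop, parent_of, color_of = load_hierarchy(HIERARCHY_JSON)
```

The same `parent_of` mapping could then serve both the plotting code (to pick a color family) and the loss evaluation (to decide whether an error stays within a stage).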

An "unknown" class label should generally be shown in a neutral color such as light gray.
