[Question] Ensemble approach using multiple runs from checkpoint files #19

# Background
I'm currently using scTab for cell type annotation on my research dataset, with the checkpoint file `cap-sctab-service-ckpts.tar.gz` downloaded from [here](https://pklab.med.harvard.edu/felix/data/cap-sctab-service-ckpts.tar.gz).

# Current Approach
The checkpoint appears to contain 5 runs with different random seeds. I'm implementing an ensemble approach as follows:

1. Prediction aggregation: Use all 5 runs to predict cell types for each cell
2. Label assignment: Assign the most frequent (mode) predicted label across the 5 runs
3. Tie-breaking: If several labels tie for the mode, compute the average prediction probability of each tied label and assign the label with the highest average
4. Confidence scoring: Use the probability of the assigned label, averaged over the runs that predicted it, as the final confidence score
5. Filtering: Plan to filter cells based on this confidence score
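
To make the steps above concrete, here is a minimal NumPy sketch of what I have in mind. It is not taken from the scTab code base: the function name `ensemble_predict` and the input layout (an array of per-run class probabilities with shape `(n_runs, n_cells, n_classes)`) are my own assumptions, and for tie-breaking I assume the average is taken over all runs, not only the runs that voted for the tied label.

```python
import numpy as np


def ensemble_predict(probs):
    """Majority-vote ensemble over several runs (hypothetical helper).

    probs: array-like of shape (n_runs, n_cells, n_classes) holding the
    class probabilities predicted by each run (assumed input layout).
    Returns (labels, confidence), each of shape (n_cells,).
    """
    probs = np.asarray(probs, dtype=float)
    n_runs, n_cells, n_classes = probs.shape
    cells = np.arange(n_cells)

    # Steps 1-2: each run votes for its argmax label; count votes per class.
    votes = probs.argmax(axis=2)  # (n_runs, n_cells)
    counts = np.zeros((n_cells, n_classes), dtype=int)
    for run_votes in votes:
        counts[cells, run_votes] += 1

    # Step 3: among the labels tied for the most votes, pick the one with
    # the highest probability averaged over all runs (assumption).
    mean_probs = probs.mean(axis=0)  # (n_cells, n_classes)
    tied = counts == counts.max(axis=1, keepdims=True)
    labels = np.where(tied, mean_probs, -np.inf).argmax(axis=1)

    # Step 4: confidence is the probability of the assigned label,
    # averaged over the runs that actually voted for that label.
    voted = votes == labels[None, :]        # (n_runs, n_cells)
    label_probs = probs[:, cells, labels]   # (n_runs, n_cells)
    confidence = (label_probs * voted).sum(axis=0) / voted.sum(axis=0)
    return labels, confidence
```

Step 5 would then be a simple mask such as `keep = confidence >= threshold`, where `threshold` is the value I am asking about in the questions.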

# Questions

1. Is this ensemble approach appropriate and recommended? Or would it be better to use a specific single run from the checkpoint?
2. Are there any best practices or recommendations for handling multiple runs in scTab checkpoints?
3. Is the tie-breaking method reasonable, or should I consider alternative approaches (e.g., using prediction entropy, maximum probability across runs, etc.)?
4. For confidence-based filtering, what threshold ranges have you found effective in practice?
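
For reference, the two alternatives mentioned in question 3 could be computed roughly as follows. This is only a sketch under the same assumed input layout `(n_runs, n_cells, n_classes)`; the helper name `ensemble_uncertainty` is hypothetical, and using the entropy of the run-averaged probabilities is one possible reading of "prediction entropy".

```python
import numpy as np


def ensemble_uncertainty(probs, eps=1e-12):
    """Per-cell uncertainty summaries across runs (hypothetical helper).

    probs: array-like of shape (n_runs, n_cells, n_classes).
    Returns (entropy, max_prob), each of shape (n_cells,).
    """
    probs = np.asarray(probs, dtype=float)

    # Entropy (in nats) of the probabilities averaged over runs:
    # low when the runs agree confidently, high when mass is spread out.
    mean_probs = probs.mean(axis=0)  # (n_cells, n_classes)
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)

    # Largest probability assigned to any class by any single run.
    max_prob = probs.max(axis=(0, 2))
    return entropy, max_prob
```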

Any guidance on the optimal strategy for utilizing multiple runs would be greatly appreciated!
