feat: fragmented random sampler #212
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #212 +/- ##
==========================================
- Coverage 93.33% 91.86% -1.48%
==========================================
Files 14 15 +1
Lines 1111 1193 +82
==========================================
+ Hits 1037 1096 +59
- Misses 74 97 +23
This is an interesting design! It allows us to reuse the this What happens here when we have a lot of categories for perfect random sampling? We have a lot of classes and a Also, could you highlight why not make
Hmm, interesting. We can't actually put FragmentedRandomSampler because it's shuffled and with replacement atm, but a FragmentedSampler could work if that's what you mean. We can do all these, but I'd like to know what you are ok with regarding the DistributedSampler. Do you think we should extend this
About the performance: yeah I agree if we fixed
The idea is to be able to sample from a list of masks. When the categorical sampler is implemented, each category will have a FragmentedRandomSampler inside, and the categorical sampler will process their splits and chunks outputs to provide categorical sampling.
I wrote `_fragmented_random_sampler.py` myself but not the unit tests. I will write them soon.

One thing that needs assessment is the `mask` attribute in the `Sampler` API. `FragmentedRandomSampler` inherits `Sampler` but can't support `mask`. `DistributedSampler` thinks all `Sampler`s have a `mask` property.

I would just go with `MaskSampler` for our old assumptions, which takes `mask: slice`, and let `DistributedSampler` accept that. Only the signature would change here, so I am not sure if it would be a breaking change from the `DistributedSampler` side. If `DistributedSampler` didn't exist, I could've just silently generalized `Sampler.mask: list[slice] | slice` and let the child classes overload the mask with `ChunkSampler.mask: slice`.
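A rough sketch of the `MaskSampler` split, to make the proposal easier to discuss. The class names come from this thread, but every body and constructor signature below is an assumption for illustration, not the current code in the repository.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class Sampler(ABC):
    """Base class in this sketch: it no longer promises a `mask` attribute."""

    @abstractmethod
    def __iter__(self) -> Iterator[int]:
        ...


class MaskSampler(Sampler):
    """Samplers that keep the old assumption of a single `mask: slice`."""

    def __init__(self, mask: slice) -> None:
        self.mask = mask


class ChunkSampler(MaskSampler):
    # Placeholder iteration: walk the mask in order.
    def __iter__(self) -> Iterator[int]:
        return iter(range(self.mask.start, self.mask.stop))


class FragmentedRandomSampler(Sampler):
    """Holds several masks, so it cannot expose one `mask: slice`."""

    def __init__(self, masks: list[slice]) -> None:
        self.masks = masks

    # Placeholder iteration: the sampler in this PR is shuffled and
    # with replacement; here we just walk the fragments in order.
    def __iter__(self) -> Iterator[int]:
        for m in self.masks:
            yield from range(m.start, m.stop)


class DistributedSampler:
    """Only the accepted type changes: MaskSampler instead of Sampler."""

    def __init__(self, base: MaskSampler) -> None:
        if not isinstance(base, MaskSampler):
            raise TypeError("DistributedSampler needs a sampler with a single slice mask")
        self.base = base
        self.mask = base.mask
```

With this split, `DistributedSampler(ChunkSampler(slice(0, 4)))` works as before, while passing a `FragmentedRandomSampler` fails early with a `TypeError` instead of failing later on a missing `mask`.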