Related to #99
Hi, I don't think LABLoader is behaved as expected. If I call it directly, e.g.
loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')
It works smoothly.
But when I use it with a protocol pipeline, with database.yml as following:
Protocols
AMI:
SpeakerDiarization:
only_words:
train:
uri: ../lists/train.meetings.txt
annotation: ../only_words/rttms/train/{uri}.rttm
annotated: ../uems/train/{uri}.uem
lab: ../only_words/labs/train/{uri}.lab
development:
uri: ../lists/dev.meetings.txt
annotation: ../only_words/rttms/dev/{uri}.rttm
annotated: ../uems/dev/{uri}.uem
lab: ../only_words/labs/dev/{uri}.lab
and something like
for file in only_words.development():
lab=file['lab']
It will throw the error ValueError: path must contain the {uri} placeholder.
By debugging, I notice that in LABLoader, you explicitly check if the path contains the {uri} placeholder.
|
_, placeholders, _, _ = zip(*string.Formatter().parse(self.path)) |
|
self.placeholders_ = set(placeholders) - set([None]) |
|
if "uri" not in self.placeholders_: |
|
raise ValueError("`path` must contain the {uri} placeholder.") |
While in load function in Template class, this loader is called after the resolve_path.
|
def load(current_file: ProtocolFile): |
|
path = resolve_path(Path(template.format(**abs(current_file))), database_yml) |
|
|
|
# check if file exists |
|
if not path.is_file(): |
|
msg = f"No such file or directory: '{path}' (via '{template}' template)." |
|
raise FileNotFoundError(msg) |
|
|
|
loader = Loader(path) |
|
return loader(current_file) |
Hence, when the LABLoader is called during the protocol pipeline, the placeholder in the template has already been resolved to certain uri. And when you check over the resolved path, it will always raise the error of no placeholder.
So, I'm doubting the necessity of doing this sanity check in the LABLoader function, since in gather_loader function, you have already done such sanity check.
|
# check whether value (path) contains placeholders such as {uri} or {subset} |
|
_, placeholders, _, _ = zip(*string.Formatter().parse(value)) |
|
is_template = len(set(placeholders) - set([None])) > 0 |
Unless you are assuming a use case that the LABLoader is called solely, outside the protocol pipeline. In that case I think it better to simply use load_lab.
I expect your opinion.
Related to #99
Hi, I don't think
LABLoaderis behaved as expected. If I call it directly, e.g.loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')It works smoothly.
But when I use it with a protocol pipeline, with
database.ymlas following:and something like
It will throw the error
ValueError: path must contain the {uri} placeholder.By debugging, I notice that in
LABLoader, you explicitly check if the path contains the{uri}placeholder.pyannote-database/pyannote/database/loader.py
Lines 258 to 261 in 6816228
While in
loadfunction inTemplateclass, this loader is called after theresolve_path.pyannote-database/pyannote/database/custom.py
Lines 105 to 114 in 6816228
Hence, when the LABLoader is called during the protocol pipeline, the placeholder in the template has already been resolved to certain uri. And when you check over the resolved path, it will always raise the error of no placeholder.
So, I'm doubting the necessity of doing this sanity check in the LABLoader function, since in
gather_loaderfunction, you have already done such sanity check.pyannote-database/pyannote/database/custom.py
Lines 225 to 227 in 6816228
Unless you are assuming a use case that the LABLoader is called solely, outside the protocol pipeline. In that case I think it better to simply use
load_lab.I expect your opinion.