Skip to content

Necessity of placeholder sanity check inside LABLoader #102

@alephpi

Description

@alephpi

Related to #99

Hi, I don't think LABLoader is behaved as expected. If I call it directly, e.g.
loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')
It works smoothly.

But when I use it with a protocol pipeline, with database.yml as following:

Protocols
  AMI:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab

and something like

for file in only_words.development():
    lab=file['lab']

It will throw the error ValueError: path must contain the {uri} placeholder.

By debugging, I notice that in LABLoader, you explicitly check if the path contains the {uri} placeholder.

_, placeholders, _, _ = zip(*string.Formatter().parse(self.path))
self.placeholders_ = set(placeholders) - set([None])
if "uri" not in self.placeholders_:
raise ValueError("`path` must contain the {uri} placeholder.")

While in load function in Template class, this loader is called after the resolve_path.

def load(current_file: ProtocolFile):
path = resolve_path(Path(template.format(**abs(current_file))), database_yml)
# check if file exists
if not path.is_file():
msg = f"No such file or directory: '{path}' (via '{template}' template)."
raise FileNotFoundError(msg)
loader = Loader(path)
return loader(current_file)

Hence, when the LABLoader is called during the protocol pipeline, the placeholder in the template has already been resolved to certain uri. And when you check over the resolved path, it will always raise the error of no placeholder.

So, I'm doubting the necessity of doing this sanity check in the LABLoader function, since in gather_loader function, you have already done such sanity check.

# check whether value (path) contains placeholders such as {uri} or {subset}
_, placeholders, _, _ = zip(*string.Formatter().parse(value))
is_template = len(set(placeholders) - set([None])) > 0

Unless you are assuming a use case that the LABLoader is called solely, outside the protocol pipeline. In that case I think it better to simply use load_lab.

I expect your opinion.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions