This repository was archived by the owner on Oct 5, 2023. It is now read-only.

Metadata: More flexible specification of metadata type #873

@neubig

Description

Currently, string-based metadata is treated as nominal if it has 20 or fewer unique values, and as text if it has 21 or more:

zeno/zeno/util.py

Lines 46 to 47 in 808f4b2

    if len(unique) < 21:
        return MetadataType.NOMINAL

This can be limiting. For example, I have a use case where I want to run slice finder on clusters produced by a text clustering algorithm. As-is, I am limited to 20 or fewer clusters, which is probably not granular enough for the clusters to be meaningful.

One possible design would let distill functions optionally specify the metadata type, like this:

return DistillReturn(distill_output=document_clusters, metadata_type=MetadataType.NOMINAL)

If no type is specified, we could fall back to the current behavior (which should probably also be documented somewhere).
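
To make the proposal concrete, here is a rough sketch of how the override and the fallback could fit together. It is not Zeno's actual code: the `MetadataType` and `DistillReturn` classes below are simplified stand-ins, the `TEXT` member name and the `resolve_metadata_type` and `infer_string_type` helpers are hypothetical, and the `metadata_type` field is the proposed addition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class MetadataType(Enum):
    # Simplified stand-in for Zeno's enum; member names are assumptions.
    NOMINAL = "nominal"
    TEXT = "text"


@dataclass
class DistillReturn:
    # Stand-in for Zeno's DistillReturn with the proposed optional field.
    distill_output: Sequence[str]
    metadata_type: Optional[MetadataType] = None


NOMINAL_LIMIT = 21  # mirrors the `len(unique) < 21` check quoted above


def infer_string_type(values: Sequence[str]) -> MetadataType:
    """Current behavior: decide from the number of unique values."""
    unique = set(values)
    if len(unique) < NOMINAL_LIMIT:
        return MetadataType.NOMINAL
    return MetadataType.TEXT


def resolve_metadata_type(ret: DistillReturn) -> MetadataType:
    """Hypothetical helper: an explicit type wins, otherwise infer it."""
    if ret.metadata_type is not None:
        return ret.metadata_type
    return infer_string_type(ret.distill_output)


# 50 distinct cluster labels, more than the heuristic allows for nominal.
document_clusters = [f"cluster_{i % 50}" for i in range(200)]

inferred = resolve_metadata_type(DistillReturn(distill_output=document_clusters))
explicit = resolve_metadata_type(
    DistillReturn(
        distill_output=document_clusters,
        metadata_type=MetadataType.NOMINAL,
    )
)
```

With no type given, the 50 clusters fall through to the heuristic and are treated as text; with the explicit type they stay nominal, so existing distill functions would keep working unchanged.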
