feat: Add dataset store for Novel metrics#1

Draft
Ramlaoui wants to merge 3 commits into main from feat/dataset-store

@Ramlaoui
Collaborator

@Ramlaoui Ramlaoui commented May 6, 2025

This adds an example of how to create a dataset store from a reference dataset (here, LeMat-Bulk). A novelty metric is then implemented on top of that store.

The goal is to build on this novelty metric: first choose a better store and equivalence checker, then refine the metric itself.
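
As a rough illustration of the idea (a sketch, not the code in this PR): a store could hold the fingerprints of the reference dataset, and novelty could be the fraction of generated structures whose fingerprint is absent from the store. The names `DatasetStore` and `novelty` are hypothetical, fingerprints are plain strings standing in for whatever the reference dataset provides, and the equivalence check is assumed to be exact string equality.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class DatasetStore:
    """Hypothetical store: the set of fingerprints of a reference dataset."""

    fingerprints: set[str] = field(default_factory=set)

    @classmethod
    def from_fingerprints(cls, fps: Iterable[str]) -> "DatasetStore":
        # Deduplicate the reference fingerprints into a set for O(1) lookups.
        return cls(set(fps))

    def contains(self, fp: str) -> bool:
        # Assumed equivalence check: exact match on the fingerprint string.
        return fp in self.fingerprints


def novelty(generated: Iterable[str], store: DatasetStore) -> float:
    """Fraction of generated fingerprints that are not in the store."""
    gen = list(generated)
    if not gen:
        return 0.0  # Convention chosen for this sketch: empty input scores 0.
    novel = sum(1 for fp in gen if not store.contains(fp))
    return novel / len(gen)


if __name__ == "__main__":
    store = DatasetStore.from_fingerprints(["a", "b"])
    print(novelty(["a", "c", "d", "b"], store))
```

Keeping the equivalence check inside `contains` means a better checker could later replace the exact match without changing `novelty`.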

@Ramlaoui
Collaborator Author

HOLD: This will be useful when we want to change the fingerprint computation method, but for now it is more efficient to extract the precomputed fingerprints from LeMat-Bulk on Hugging Face.

The idea is that fingerprints are sometimes very tedious to compute efficiently, and swapping methods is not straightforward because it requires modifying the metrics themselves, among other things. By using a fingerprint store and a fingerprint extraction abstraction, we avoid going through that route.
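
To make the abstraction concrete, here is a minimal sketch under stated assumptions; it is not the PR's implementation. `FingerprintExtractor`, `FingerprintStore`, `precomputed_extractor`, and `composition_extractor` are hypothetical names, and the record keys `"fingerprint"` and `"elements"` are placeholders rather than actual LeMat-Bulk column names. The point is that a metric only calls `store.is_known(...)`, so changing the fingerprint method means passing a different extractor, not editing the metric.

```python
from typing import Any, Iterable, Mapping, Protocol

Record = Mapping[str, Any]


class FingerprintExtractor(Protocol):
    """Anything that maps a record to a fingerprint string."""

    def __call__(self, record: Record) -> str: ...


def precomputed_extractor(record: Record) -> str:
    # Reads a fingerprint already shipped with the record.
    # The key "fingerprint" is a placeholder, not a real column name.
    return str(record["fingerprint"])


def composition_extractor(record: Record) -> str:
    # Toy alternative method: order-independent key built from element symbols.
    return "-".join(sorted(record["elements"]))


class FingerprintStore:
    """Hypothetical store that owns both the fingerprints and the extractor."""

    def __init__(self, extractor: FingerprintExtractor, records: Iterable[Record]):
        self._extract = extractor
        self._fingerprints = {extractor(r) for r in records}

    def is_known(self, record: Record) -> bool:
        # The same extractor is applied to queries and to reference records.
        return self._extract(record) in self._fingerprints


if __name__ == "__main__":
    reference = [{"fingerprint": "x1", "elements": ["Na", "Cl"]}]
    query = {"fingerprint": "other", "elements": ["Cl", "Na"]}
    print(FingerprintStore(precomputed_extractor, reference).is_known(query))
    print(FingerprintStore(composition_extractor, reference).is_known(query))
```

In this sketch the two stores disagree on the same query because they use different fingerprint methods, while the calling code stays identical.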
