feat: add 3D mesh support and MeshFolder builder #8055
A test I conducted:
from datasets import Features, Value, Sequence, Image, Audio, Mesh, load_dataset
# Define features.
features = Features({
'id': Value('string'),
'objaverse_uid': Value('string'),
'text': Value('string'),
'image': Image(),
'audio': Audio(),
'mesh': Mesh(), # NEW automatically handles struct<bytes, path>
'metadata': {
'image_score': Value('double'),
'audio_score': Value('double'),
'tags': Sequence(Value('string'))
}
})
# Load a Parquet.
dataset = load_dataset(
"parquet",
data_files={"train": "train-00001.parquet"},
features=features,
streaming=True
)["train"]
# Push.
dataset.push_to_hub("VINAY-UMRETHE/Vividha-Test")
This can be viewed at VINAY-UMRETHE/Vividha-test, although the dataset viewer does not show anything, since the site is not configured to display a Mesh column as either a rendered image (heavy) or a simple placeholder icon. That is up to the devs.
@lhoestq review
Looks really cool! Is there a Python lib that can be used to load the data instead of returning bytes/path? And sorry for the delay!
Yes, I updated:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```py
>>> from datasets import Dataset, Features, Mesh
>>> dataset = Dataset.from_dict({"mesh": ["path/to/model.glb"]}, features=Features({"mesh": Mesh()}))
```
Let's update the docs once with a cool mesh dataset on HF. Do you have an idea?
I've run a test, which you can now find in the VINAY-UMRETHE/My-Mesh-Dataset dataset repo; it uses the Mesh() feature.
However, while testing I noticed an error with embed_external_files. It is fixed now but pending a merge; the fix is in #8224.
Before you merge that, we can update the docs in that PR as well. This would finalize the whole Mesh support.
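For readers unfamiliar with what embedding external files means for a bytes/path feature, here is a minimal sketch of the idea. It is not the code from this PR or from #8224; the helper name `embed_mesh_file` is hypothetical, and the behavior (read the file, keep only the base name as `path`) is modeled on how the existing Image and Audio features embed their storage.

```python
import os
import tempfile


def embed_mesh_file(example: dict) -> dict:
    """Hypothetical helper: inline a mesh file's contents into a {"bytes", "path"} dict."""
    if example.get("bytes") is not None or example.get("path") is None:
        # Already embedded, or there is no file to read from.
        return dict(example)
    with open(example["path"], "rb") as f:
        data = f.read()
    # Keep only the file name so the stored row does not depend on a local directory.
    return {"bytes": data, "path": os.path.basename(example["path"])}


# Usage with a throwaway file that only carries the GLB magic bytes.
with tempfile.TemporaryDirectory() as tmp:
    mesh_path = os.path.join(tmp, "model.glb")
    with open(mesh_path, "wb") as f:
        f.write(b"glTF\x02\x00\x00\x00")
    embedded = embed_mesh_file({"bytes": None, "path": mesh_path})
```

After this step the row is self-contained, which is what allows a pushed Parquet file to be read on another machine.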
Commits:
This PR introduces 3D mesh support to the `datasets` library, mirroring the existing paradigms for the Image, Audio, and Video modalities.

- New `Mesh` feature class, which manages 3D data via a PyArrow `struct` containing both raw bytes and file paths. Support is intentionally focused on self-contained binary formats such as GLB, PLY, and STL; these seem like the sweet spot to me, because others such as `.obj` and `.gltf` typically rely on external sub-files.
- New `MeshFolder` builder module. This packaged module lets users load datasets directly from structured or unstructured directories of mesh files.

The implementation has been integrated into the library's core, including registration in the main features module and support for 3D data within `WebDataset`. Tests were conducted using some new files too.
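To make the bytes/path struct described above concrete, the following is a rough sketch of how a value could be normalized into that shape and how the three self-contained formats can be recognized from their leading bytes. The names `encode_mesh`, `sniff_format`, and `SUPPORTED_EXTENSIONS` are illustrative assumptions, not the PR's API; the format checks rely only on the published file layouts (GLB magic `glTF`, PLY header `ply`, binary STL with an 80-byte header and a triangle count).

```python
import os
import struct
from typing import Optional, Union

# Assumption: the self-contained formats named in the PR description.
SUPPORTED_EXTENSIONS = {".glb", ".ply", ".stl"}


def encode_mesh(value: Union[str, bytes, dict]) -> dict:
    """Hypothetical encoder: normalize a path, raw bytes, or dict to {"bytes", "path"}."""
    if isinstance(value, str):
        ext = os.path.splitext(value)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"Unsupported mesh extension: {ext!r}")
        return {"bytes": None, "path": value}
    if isinstance(value, (bytes, bytearray)):
        return {"bytes": bytes(value), "path": None}
    if isinstance(value, dict):
        return {"bytes": value.get("bytes"), "path": value.get("path")}
    raise TypeError(f"Cannot encode a mesh from {type(value).__name__}")


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the container format from raw bytes; returns None when unsure."""
    if data[:4] == b"glTF":
        return "glb"
    if data[:3] == b"ply":
        return "ply"
    # Binary STL: 80-byte header, little-endian uint32 triangle count,
    # then 50 bytes per triangle. ASCII STL is not handled in this sketch.
    if len(data) >= 84:
        (n_triangles,) = struct.unpack_from("<I", data, 80)
        if len(data) == 84 + 50 * n_triangles:
            return "stl"
    return None


encoded = encode_mesh("path/to/model.glb")
one_triangle_stl = bytes(80) + struct.pack("<I", 1) + bytes(50)
```

Storing both fields lets a row either point at a file on disk or carry the data inline, which is the same trade-off the Image and Audio features make.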
TESTS CONDUCTED :
Output:
Some test files were also added in the
`tests/features/data` folder, as I saw for the other modalities.
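As an illustration of what loading from a structured directory could look like, here is a small standard-library sketch. It assumes `MeshFolder` follows the same convention as `ImageFolder`, where the parent directory name serves as the label; that assumption is not confirmed by this PR, and `scan_mesh_folder` is a hypothetical helper rather than part of the builder.

```python
import tempfile
from pathlib import Path

# Assumption: the same extensions the PR description lists as supported.
MESH_EXTENSIONS = {".glb", ".ply", ".stl"}


def scan_mesh_folder(root) -> list:
    """Hypothetical scan: collect mesh files and use the parent directory as a label."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in MESH_EXTENSIONS:
            rel = path.relative_to(root)
            # Files directly under the root have no label (unstructured layout).
            label = rel.parts[-2] if len(rel.parts) > 1 else None
            rows.append({"mesh": str(path), "label": label})
    return rows


# Usage on a throwaway directory tree with two mesh files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("chair/a.glb", "table/b.stl", "notes.txt"):
        file_path = Path(tmp) / rel
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_bytes(b"")
    rows = scan_mesh_folder(tmp)
    labels = [row["label"] for row in rows]
```

Non-mesh files such as `notes.txt` are skipped, so the resulting rows contain only the two meshes with labels taken from their directories.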