
Access to simulated data #58

@zhangqi3552


Hello,

Yes, it is a known issue.

The simulated data is not available since the data was transferred to HuggingFace. We are working on this. Sorry for the inconvenience!

Igor


From: luoj21 @.***>

Sent: Wednesday, February 19, 2025 8:50 PM

To: microsoft/NOTSOFAR1-Challenge @.***>

Cc: Subscribed @.***>

Subject: [microsoft/NOTSOFAR1-Challenge] Access to Simulated Data (Issue #57)

Hello,

Do we still have access to the full simulated training data (both 200hr and 1000hr variants)? When I run:

train_set_path = download_simulated_subset(
    version=ver,
    volume='200hrs',
    subset_name='train',
    destination_dir=os.path.join(project_dir, 'train'),
)

I get:

RuntimeError: Failed to list files in directory css-datasets/v1.5/200hrs/train in the Hugging Face repository: Failed to list directory css-datasets/v1.5/200hrs/train in the Hugging Face repository: 404 Client Error. (Request ID: Root=1-67b6b17c-33b2cee92c21e8c75237416e;a44d6604-d391-4b11-ae74-0a543a8c90f2)

Entry Not Found for url: https://huggingface.co/api/datasets/microsoft/NOTSOFAR/tree/main/css-datasets%2Fv1.5%2F200hrs%2Ftrain?recursive=True&expand=False. css-datasets/v1.5/200hrs/train does not exist on "main"


Originally posted by @igor0304 in #57

Hello, is the simulated data available on Hugging Face yet? There is no css-datasets directory at
https://huggingface.co/datasets/microsoft/NOTSOFAR/tree/main
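
In case it helps anyone else check before running the download, here is a minimal sketch of how I would test whether a directory is present in the dataset repo. It assumes the huggingface_hub package is installed and that you have access to the dataset (a login or token may be needed). The helper names has_directory and list_dataset_files are my own, not part of the NOTSOFAR code.

```python
from typing import Iterable, List


def has_directory(repo_files: Iterable[str], directory: str) -> bool:
    """Return True if any file path in the listing lives under `directory`."""
    # Append a trailing slash so "css-datasets" does not match "css-datasets-old/...".
    prefix = directory.strip("/") + "/"
    return any(path.startswith(prefix) for path in repo_files)


def list_dataset_files(repo_id: str, revision: str = "main") -> List[str]:
    """List every file path in a Hugging Face dataset repo (requires network access)."""
    # Imported lazily so has_directory can be used without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset", revision=revision)


# Example (hypothetical usage, needs network and access to the dataset):
#   files = list_dataset_files("microsoft/NOTSOFAR")
#   print(has_directory(files, "css-datasets/v1.5/200hrs/train"))
```

If the check returns False for css-datasets/v1.5/200hrs/train, the 404 above is expected and the download call will fail until the data is uploaded.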
