
Memory usage linearly increasing while iterating on tfds datasets #6

@amanzotti

Description

Hi! I am trying to load and play with the dgs_corpus dataset.
I load it, which downloads the data locally. Then I try to loop through the examples, but even if I just sleep in the first iteration of the loop, or do nothing at all, memory usage increases linearly.
This is the code I am using:

```python
import time

import tensorflow as tf
import tensorflow_datasets as tfds
import sign_language_datasets.datasets

config = sign_language_datasets.datasets.dgs_corpus.DgsCorpusConfig(
    name="holistic_m", include_video=False, include_pose="holistic"
)
dgs_corpus = tfds.load(name="dgs_corpus", builder_kwargs=dict(config=config))

with tf.io.TFRecordWriter("data.tfrecord") as writer:
    for datum in dgs_corpus["train"]:
        time.sleep(3000)
```

As you can see from the memory profile output:

[memory profile plot "memory1": memory usage increasing linearly over time]

The final goal is to either save the examples in npy format or load them in PyTorch, because that is what our pipeline currently accepts.
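For the npy part of that goal, a minimal sketch of what I have in mind is below. It assumes each example has already been converted to a NumPy array (e.g. by wrapping the dataset with `tfds.as_numpy`); the helper name `save_examples_as_npy` and the output layout are just illustrative, not an existing API.

```python
# Hypothetical sketch: write each example's array to its own .npy file.
# Assumes the iterable already yields NumPy arrays (e.g. via tfds.as_numpy);
# the function name and file naming scheme are illustrative only.
from pathlib import Path

import numpy as np


def save_examples_as_npy(examples, out_dir="poses"):
    """Save each array in `examples` as example_<i>.npy and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, arr in enumerate(examples):
        path = out / f"example_{i:05d}.npy"
        np.save(path, np.asarray(arr))
        paths.append(path)
    return paths
```

PyTorch could then consume the files later with `torch.from_numpy(np.load(path))`, so the TensorFlow dependency is confined to the export step.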

Any help or pointers would be great!

Thanks
