Dataset Reuse Indicators

This repo provides the data and the resulting model for the paper:

Koesten, Laura and Vougiouklis, Pavlos and Simperl, Elena and Groth, Paul, Dataset Reuse: Translating Principles to Practice. Available at SSRN: https://ssrn.com/abstract=3589836 or http://dx.doi.org/10.2139/ssrn.3589836

This includes:

Data for all GitHub repos containing datasets (download_github_dataset.sh).
- This is a Python pickle file containing a list of repos. Each repo is represented as a hash (Python dictionary) holding metadata about a repo that contains a dataset. To unpickle it, use:
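```
import pickle

# Open the file in binary mode; the with-block closes it after loading.
# encoding='latin1' lets Python 3 decode byte strings from pickles written by Python 2.
with open("dataset.pickle", 'rb') as file:
    obj = pickle.load(file, encoding='latin1')

obj[0]  # gets the first repo in the list
```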
Data used for training models (download_processed_datasets.sh).
- Code to work with this data is in the source code directory (reuse_predictor).
Source code for model training (under reuse_predictor). Our model uses PyTorch.
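As a starting point for exploring the GitHub repo data, the sketch below loads the pickle and counts how many repos carry each metadata key. It assumes each repo is a Python dict (the "hash" described above); the helper names load_repos and summarize_keys are hypothetical and not part of this repo.

```python
import pickle
from collections import Counter
from typing import Any, Dict, List


def load_repos(path: str = "dataset.pickle") -> List[Dict[str, Any]]:
    """Load the pickled list of repos (assumption: one dict per repo)."""
    with open(path, "rb") as f:
        # encoding='latin1' mirrors the unpickling instructions above
        return pickle.load(f, encoding="latin1")


def summarize_keys(repos: List[Dict[str, Any]]) -> Counter:
    """Count how many repos contain each metadata key."""
    counts: Counter = Counter()
    for repo in repos:
        # Each key is counted once per repo that has it
        counts.update(repo.keys())
    return counts
```

For example, calling `summarize_keys(load_repos())` on the downloaded dataset.pickle and printing `.most_common()` shows which metadata fields are available, and how consistently, before you work with individual repos.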

The shell scripts above make it easy to fetch the data for use with this repo. The data can also be downloaded from Zenodo:

Koesten, Laura, Vougiouklis, Pavlos, Groth, Paul, & Simperl, Elena. (2020). Dataset Reuse Indicators Datasets (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4015954

For more information, contact Laura Koesten.
