SCORE cloud research reproducibility files

This repository contains simple supporting files for a paper on Sustainable Cloud Operations for Research (SCORE). The paper describes a practical framework for choosing cloud-based data ingestion approaches in research settings where cost, technical capacity, and sustainability matter.

The repository is intentionally small. It is meant to help a reader understand and reproduce the main evaluation steps, not to provide a production cloud deployment.

What is included

data/
  paper_results.csv                Main results reported in the paper
  dataset_manifest_template.csv    Template for listing source dataset files
docs/
  reproduction_notes.md            Plain-language notes on how the experiment was run
  gui_approaches.md                Notes for the GUI/no-code parts of the work
notebooks/
  01_code_ingestion_example.ipynb  Example notebook for the Python/code approach
  02_summarise_results.ipynb       Simple summary of the reported results
scripts/
  summarise_results.py             Command-line version of the results summary
requirements.txt
LICENSE
CITATION.cff

Evaluation described in the paper

The paper compared three ways of ingesting the same large public dataset into cloud storage:

Synapse Pipelines: a no-code or GUI-based approach.
Synapse Notebook: a Python notebook/code-based approach.
Azure Functions: a serverless approach.

The comparison focused on:

pipeline cost,
execution time,
estimated carbon emissions,
storage/write costs, and
geo-replication transfer costs.

The reported values are stored in data/paper_results.csv.
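As a rough illustration of how a results file like this could be summarised, the sketch below groups rows by one column and averages another. The column names (`approach`, `execution_time_s`) and the sample rows are placeholders invented for this example; they are not the schema or the values of data/paper_results.csv, so check the real header before adapting it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarise(csv_text, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}


# Placeholder rows only: these are NOT the values reported in the paper,
# and the column names are assumptions made for this sketch.
example = """approach,execution_time_s
approach_a,10
approach_a,20
approach_b,5
"""

print(summarise(example, "approach", "execution_time_s"))
```

To try it on the real file, pass the text of data/paper_results.csv together with whichever column names that file actually uses.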

Dataset

The paper used the NIH Chest X-ray dataset as a large public data ingestion workload. The raw dataset is not included in this repository because it is large and should be downloaded from the official source by each user.

Use data/dataset_manifest_template.csv as a simple template for recording the files used in a reproduction run.
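One possible way to fill in such a manifest is to walk the local download folder and record each file's relative path, size, and checksum. This is only a sketch: the column names `file_name`, `size_bytes`, and `sha256` are assumptions and may not match the columns in the template, and `build_manifest` is a hypothetical helper that is not part of this repository.

```python
import csv
import hashlib
from pathlib import Path


def build_manifest(source_dir, manifest_path):
    """Write one row per file under source_dir: relative path, size, SHA-256."""
    source_dir = Path(source_dir)
    rows = []
    for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large image archives are not loaded whole.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        rows.append({
            "file_name": path.relative_to(source_dir).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": digest.hexdigest(),
        })
    # Column names are assumed for illustration; align them with the template.
    fieldnames = ["file_name", "size_bytes", "sha256"]
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Recording a checksum alongside each file name makes it easier to confirm later that a reproduction run used the same source files.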

How to use this repository

Install the small Python requirements:

pip install -r requirements.txt

Summarise the reported results:

python scripts/summarise_results.py

Open the notebooks:

notebooks/01_code_ingestion_example.ipynb
notebooks/02_summarise_results.ipynb

The first notebook shows the structure of the Python/code approach. By default it runs as a dry run: it does not download or upload the full dataset.
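The dry-run idea can be sketched roughly as follows. This is not the notebook's code: `plan_ingestion`, its parameters, and the `upload` callable are hypothetical names chosen for illustration, and no cloud SDK is called. In dry-run mode the function only lists what it would upload.

```python
from pathlib import Path


def plan_ingestion(source_dir, container, dry_run=True, upload=None):
    """List the uploads a run would perform; only upload when dry_run is False."""
    planned = []
    for path in sorted(p for p in Path(source_dir).rglob("*") if p.is_file()):
        # Target name = container prefix + path relative to the source folder.
        target = f"{container}/{path.relative_to(source_dir).as_posix()}"
        planned.append((str(path), target))
        if dry_run:
            print(f"[dry-run] would upload {path} -> {target}")
        else:
            if upload is None:
                raise ValueError("an upload callable is required when dry_run is False")
            # `upload` is supplied by the caller (for example, a thin wrapper
            # around a storage client); it is an assumption of this sketch.
            upload(path, target)
    return planned
```

Keeping the real upload behind a caller-supplied function means the default path never touches cloud storage, so the example can be run without credentials.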

Notes for reuse

Do not commit raw medical images or cloud credentials.
Keep cloud cost exports out of the repository unless they have been reviewed for sharing.
If you rerun the experiment, record the cloud region, date, resource type, and any changes to the dataset or replication settings.
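A small structured record is one way to capture the rerun details listed above. The sketch below is an assumption about how that could look, not a format defined by the paper; `RunRecord`, its field names, and `save_run_record` are hypothetical, and the angle-bracket values are placeholders to replace.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class RunRecord:
    """Details worth recording for each rerun of the experiment."""
    cloud_region: str
    run_date: str
    resource_type: str
    changes: list = field(default_factory=list)


def save_run_record(record, path):
    """Write the record as indented JSON so it is easy to diff and review."""
    with open(path, "w") as handle:
        json.dump(asdict(record), handle, indent=2)


# Placeholder values: replace them with the details of your own run.
record = RunRecord(
    cloud_region="<region>",
    run_date=date.today().isoformat(),
    resource_type="<resource type>",
    changes=["<describe any dataset or replication changes>"],
)
print(json.dumps(asdict(record), indent=2))
```

Storing one such file per run next to the filled-in manifest keeps the context for each set of cost and timing figures in one place.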
