UZIMA-DS/SCORE

SCORE cloud research reproducibility files

This repository contains simple supporting files for a paper on Sustainable Cloud Operations for Research (SCORE). The paper describes a practical framework for choosing cloud-based data ingestion approaches in research settings where cost, technical capacity, and sustainability matter.

The repository is intentionally small. It is meant to help a reader understand and reproduce the main evaluation steps, not to provide a production cloud deployment.

What is included

data/
  paper_results.csv                Main results reported in the paper
  dataset_manifest_template.csv    Template for listing source dataset files
docs/
  reproduction_notes.md            Plain-language notes on how the experiment was run
  gui_approaches.md                Notes for the GUI/no-code parts of the work
notebooks/
  01_code_ingestion_example.ipynb  Example notebook for the Python/code approach
  02_summarise_results.ipynb       Simple summary of the reported results
scripts/
  summarise_results.py             Command-line version of the results summary
requirements.txt
LICENSE
CITATION.cff

Evaluation described in the paper

The paper compared three ways of ingesting the same large public dataset into cloud storage:

  1. Synapse Pipelines: a no-code or GUI-based approach.
  2. Synapse Notebook: a Python notebook/code-based approach.
  3. Azure Functions: a serverless approach.

The comparison focused on:

  • pipeline cost,
  • execution time,
  • estimated carbon emissions,
  • storage/write costs, and
  • geo-replication transfer costs.

The reported values are stored in data/paper_results.csv.
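As a rough illustration of how such a results file could be read, the sketch below groups a long-format table by ingestion approach using only the standard library. The layout of data/paper_results.csv is not shown in this README, so the column names `approach`, `metric`, and `value` are assumptions, and the sample rows are placeholders rather than values from the paper.

```python
import csv
import io
from collections import defaultdict


def summarise(csv_file, approach_col="approach", metric_col="metric", value_col="value"):
    """Group a long-format results table as {approach: {metric: value}}.

    The three column names are assumptions, not the repository's confirmed schema.
    """
    summary = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        summary[row[approach_col]][row[metric_col]] = float(row[value_col])
    return dict(summary)


# Placeholder rows for illustration only; these are NOT values from the paper.
SAMPLE = """approach,metric,value
approach_a,execution_time_s,1.0
approach_a,pipeline_cost,2.0
approach_b,execution_time_s,3.0
"""

if __name__ == "__main__":
    for approach, metrics in summarise(io.StringIO(SAMPLE)).items():
        print(approach, metrics)
```

To try it on the real file, pass an open file handle instead of the sample, for example `open("data/paper_results.csv", newline="")`, and adjust the column arguments to match the actual header.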

Dataset

The paper used the NIH Chest X-ray dataset as a large public data ingestion workload. The raw dataset is not included in this repository because it is large; each user should download it from the official source.

Use data/dataset_manifest_template.csv as a simple template for recording the files used in a reproduction run.
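One possible way to fill in such a manifest is to walk the local download folder and record each file's name, size, and checksum. This is a minimal sketch: the column names `file_name`, `size_bytes`, and `sha256` are hypothetical and may not match the template's actual header.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column names; align them with the template before use.
FIELDS = ["file_name", "size_bytes", "sha256"]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are never loaded fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir):
    """Return one row per file under source_dir, sorted for stable output."""
    root = Path(source_dir)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "file_name": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows, out_path):
    """Write the manifest rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Recording a checksum alongside each file name lets a later reproduction run confirm that it used the same source files.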

How to use this repository

Install the small Python requirements:

pip install -r requirements.txt

Summarise the reported results:

python scripts/summarise_results.py

Open the notebooks:

notebooks/01_code_ingestion_example.ipynb
notebooks/02_summarise_results.ipynb

The first notebook shows the structure of the Python/code approach. By default it runs as a dry run and does not download or upload the full dataset.
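The dry-run idea can be sketched as follows. This is not the notebook's code: the function names, the `upload` callable, and the example URL are all hypothetical, and no cloud SDK is called. The point is only that the default path plans transfers without performing them.

```python
from dataclasses import dataclass


@dataclass
class PlannedTransfer:
    source_url: str
    destination_path: str


def plan_ingestion(source_urls, destination_prefix):
    """Map each source URL to a destination path under the given prefix."""
    prefix = destination_prefix.rstrip("/")
    plans = []
    for url in source_urls:
        # Use the last path segment of the URL as the destination file name.
        file_name = url.rstrip("/").rsplit("/", 1)[-1]
        plans.append(PlannedTransfer(url, f"{prefix}/{file_name}"))
    return plans


def run_ingestion(source_urls, destination_prefix, dry_run=True, upload=None):
    """Return the transfer plan; call `upload` per file only when dry_run is False."""
    plans = plan_ingestion(source_urls, destination_prefix)
    if dry_run:
        return plans
    if upload is None:
        raise ValueError("A real run needs an upload callable")
    for plan in plans:
        upload(plan.source_url, plan.destination_path)
    return plans


if __name__ == "__main__":
    # Placeholder URL for illustration; not the dataset's real location.
    for plan in run_ingestion(["https://example.org/data/images_001.tar.gz"], "raw/nih"):
        print(f"[dry run] {plan.source_url} -> {plan.destination_path}")
```

Keeping `dry_run=True` as the default means a reader has to opt in explicitly before any data is moved or any cloud cost is incurred.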

Notes for reuse

  • Do not commit raw medical images or cloud credentials.
  • Keep cloud cost exports separate unless they have been reviewed for sharing.
  • If you rerun the experiment, record the cloud region, date, resource type, and any changes to the dataset or replication settings.
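The rerun details listed above could be captured in a small structured record saved next to the results. The sketch below is one possible shape; the class name, field names, and the angle-bracket placeholders are hypothetical and should be replaced with the values of the actual run.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class RunRecord:
    """Metadata for one reproduction run; field names are illustrative."""
    cloud_region: str
    run_date: str
    resource_type: str
    dataset_changes: list = field(default_factory=list)
    replication_changes: list = field(default_factory=list)


def save_run_record(record, out_path):
    """Write the record as indented JSON so it diffs cleanly in version control."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(asdict(record), handle, indent=2)


# Placeholders only; fill in the real region and resource type for your run.
record = RunRecord(
    cloud_region="<region>",
    run_date=date.today().isoformat(),
    resource_type="<resource type>",
)

if __name__ == "__main__":
    print(json.dumps(asdict(record), indent=2))
```

A record like this contains no credentials or cost exports, so it is safe to commit alongside the reported results.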
