StanfordDataScience/dssg_covidcast

DSSG Covidcast: Forecasting Aids for Delphi

Introduction

This GitHub repository was completed by a team of fellows as part of Stanford's Data Science for Social Good program in the summer of 2021, working on a project originated by the DELPHI research group at Carnegie Mellon University. The end goal of our project is to develop tools for comparing and evaluating COVID-19 forecasters for deaths, cases, and hospitalizations in the United States. These tools are meant to help epidemiological researchers gain insight into the performance of their forecasts and, ultimately, to support more accurate forecasting of future epidemics. You can read our final report in this directory and view the slides from our final presentation here.

Our outputs are:

  1. An interactive parameterized report that evaluates and compares the performance (Average Error, Mean Weighted Interval Score, 80% Coverage) of several COVID-19 forecasters for cases, deaths, and hospitalizations.
  2. Templated R Markdown files for each outcome of interest (cases, deaths, hospitalizations), along with auxiliary R scripts for manipulating the R Markdown files and generating reports.

Through the evaluation report, users can assess their forecaster’s performance compared to others, assess new forecasters before they are deployed, and identify periods of time or areas of improvement they need to focus on.
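
The three metrics named above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the repository itself is in R) and is not the code the templates use. It assumes that "Average Error" means the mean absolute error of the point forecast and uses the standard definition of the weighted interval score for central prediction intervals; all function names and numbers are hypothetical.

```python
def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    width = upper - lower
    penalty_below = (2 / alpha) * max(lower - y, 0.0)  # truth under the interval
    penalty_above = (2 / alpha) * max(y - upper, 0.0)  # truth over the interval
    return width + penalty_below + penalty_above


def weighted_interval_score(y, median, intervals):
    """WIS for one forecast; `intervals` maps alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


def mean_absolute_error(observations, point_forecasts):
    """Assumed reading of 'Average Error': mean of |truth - point forecast|."""
    errors = [abs(y - f) for y, f in zip(observations, point_forecasts)]
    return sum(errors) / len(errors)


def coverage(observations, lowers, uppers):
    """Share of observations inside their interval (e.g. the 80% interval)."""
    hits = [lo <= y <= hi for y, lo, hi in zip(observations, lowers, uppers)]
    return sum(hits) / len(hits)


# Hypothetical numbers: three observed values and matching forecasts.
truth = [105, 120, 80]
medians = [100, 108, 90]
lower_80 = [90, 100, 85]
upper_80 = [110, 115, 95]

mae = mean_absolute_error(truth, medians)
cov_80 = coverage(truth, lower_80, upper_80)
# One forecast with an 80% interval (alpha = 0.2) and a 50% interval (alpha = 0.5).
wis = weighted_interval_score(105, 100, {0.2: (90, 110), 0.5: (95, 104)})
```

A lower weighted interval score is better, and a well-calibrated forecaster should have 80% coverage close to 0.8. Averaging the score over dates and locations gives the "mean" weighted interval score.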

Getting Started

Each report contains all of the scripts and information needed to evaluate forecasts for the user's chosen outcome of interest. After the user inputs a set of parameters (forecasters, weeks ahead, color-blind safe palettes for the plots), the template yields a comprehensive report on the predictions of COVID forecasters as well as their performance compared to the ground truth. The visualizations generated by the template offer an intuitive way to compare the accuracy of forecasters across all US states.

The user can customize the following parameters:

  • Forecasters: A multiple-choice option for selecting the forecasters to compare.
  • Weeks: The number of epi-weeks ahead for which the forecasts are made (1, 2, 3, or 4). The default is all four choices.
  • Colorblind_palette: Whether to use a colorblind-safe palette for generating the plots. The default is to use a colorblind-safe palette.

The code is split across several files, each of which is described below:

  • cases_template.Rmd: A template evaluating the performance of COVID-19 forecasters for predictions of COVID-19 cases in the U.S.

  • deaths_template.Rmd: A template evaluating the performance of COVID-19 forecasters for predictions of COVID-19 deaths in the U.S.

  • hospitalization_template.Rmd: A template evaluating the performance of COVID-19 forecasters for predictions of COVID-19 hospitalizations in the U.S.

  • helper_functions.R: A function to change the available choices for the markdown parameters (e.g., include all available forecasters).

  • Example reports: A directory containing pre-knitted HTML files of each template, as examples.

Outcomes of Interest & Data Sources

We examined three main outcomes of interest for the evaluation report.

  • COVID-19 Cases: This data shows the number of confirmed COVID-19 cases newly reported each day. It reflects only cases reported by state and local health authorities. It is based on case counts compiled and made public by a team at Johns Hopkins University (for Puerto Rico) and by USAFacts (all other locations). Note that the signal may not be directly comparable across regions with vastly different testing capacity or reporting criteria.

  • COVID-19 Deaths: This data shows the number of COVID-19 related deaths newly reported each day. It reflects official figures reported by state and local health authorities, and may not include excess deaths not confirmed by health authorities to be due to COVID-19. The signal is based on death counts compiled and made public by a team at Johns Hopkins University (for Puerto Rico) and by USAFacts (all other locations). Some signals are normalized by population (e.g., the cumulative number of confirmed COVID-19 cases per 100,000 population).

  • COVID-19 Hospitalizations: Delphi receives from its health system partners aggregated statistics on COVID-related hospital admissions, derived from ICD codes found in insurance claims. Using this data, Delphi estimates the percentage of new hospital admissions each day that are related to COVID-19. Note that these estimates are based only on admissions by patients whose data is accessible to Delphi's partners, and address new hospital admissions each day, not all currently hospitalized patients who have COVID-related diagnoses.

More information on data sources can be found on the COVIDcast Epidata API website.
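
Two of the quantities described above are simple ratios, and a small worked example may make the units concrete. The sketch below is written in Python for illustration only, with hypothetical function names and made-up numbers. It shows just the raw arithmetic; the signals published by Delphi may involve additional processing that is not reproduced here.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def covid_admission_percentage(covid_admissions, total_admissions):
    """Percentage of new hospital admissions that are COVID-related."""
    if total_admissions <= 0:
        raise ValueError("total_admissions must be positive")
    return 100.0 * covid_admissions / total_admissions


# Made-up numbers: 2,500 cumulative cases in a population of 1,000,000.
rate = per_100k(2_500, 1_000_000)  # 250.0 cases per 100,000 population

# Made-up numbers: 30 COVID-related admissions out of 1,200 new admissions.
share = covid_admission_percentage(30, 1_200)  # 2.5 percent
```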

Download Data

To make the report easy to replicate, the data used in it can be downloaded as a CSV file using the ‘download’ button in the report. With this file, users can generate customized plots or even include their own forecaster.
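
As a rough illustration of working with the downloaded data, the sketch below reads CSV rows, appends rows for a user's own forecaster, and summarizes the error per forecaster. It is written in Python for illustration only. The column names (`forecaster`, `geo_value`, `ahead`, `abs_error`), the forecaster names, and all values are hypothetical stand-ins, since the actual layout of the downloaded file is not described here; adjust them to match the real file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the downloaded file. Column names and values are hypothetical.
SAMPLE_CSV = """forecaster,geo_value,ahead,abs_error
ForecasterA,ca,1,12.0
ForecasterA,ny,1,8.0
ForecasterB,ca,1,20.0
ForecasterB,ny,1,10.0
"""


def mean_error_by_forecaster(rows):
    """Average the (hypothetical) abs_error column for each forecaster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["forecaster"]] += float(row["abs_error"])
        counts[row["forecaster"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Add rows for a user's own forecaster, scored on the same locations.
rows.append({"forecaster": "MyForecaster", "geo_value": "ca", "ahead": "1", "abs_error": "9.0"})
rows.append({"forecaster": "MyForecaster", "geo_value": "ny", "ahead": "1", "abs_error": "7.0"})

summary = mean_error_by_forecaster(rows)
for name, value in sorted(summary.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.1f}")
```

To use a real download instead of the inline sample, open the file with `open(path, newline="")` and pass the file object to `csv.DictReader`.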

Fellows & Mentors

  • Taha Bouhoun recently graduated with a Bachelor's in Computational Sciences from Minerva University.
  • Michelle Lee recently graduated with a Master’s in Public Health (Population Health & Biostatistics) from Columbia University and can be reached at mjl2241@columbia.edu.
  • Shilaan Alzahawi is a PhD student in Organizational Behavior at Stanford University Graduate School of Business. Shilaan was the technical mentor for this project.
  • Balasubramanian Narasimhan (Stanford University) and Daniel McDonald (University of British Columbia, Canada) were the faculty mentors on this project.

Acknowledgements

We would like to thank the DELPHI Research Group for making this project possible and the Stanford Data Science Institute for organizing and operating the Stanford Data Science for Social Good program. The fellows learned to adapt and be flexible throughout the course of the project. It has been a very exciting and one-of-a-kind learning experience for us.

An enormous amount of credit and thanks is due to our mentors, Shilaan (technical mentor), Naras (faculty mentor), and Daniel (faculty mentor) for their continuous support, enthusiasm, and immense knowledge throughout this project. We are also grateful to the post-doctoral and graduate mentors supporting the DSSG program for their insightful advice, continuous support, and passion for social impact: Armin Thomas, Faidra Monachou, Lijing Wang, Qian Zhao, and Kiran Shiragur. Lastly, we would like to thank the Summer 2021 Fellows for their support and laughs: Daniel Chen, Evelyn Fitzgerald, Gabriel Agostini, Juan Langlois, Riya Berry, and TingYan Deng.
