Data With Python

Dataset Selection & Justification

Dataset: COVID-19 Global Data (Full Version)

Source: Our World in Data GitHub Repository – Dataset Link

Description:
This dataset contains country-level daily COVID-19 statistics, including cases, deaths, vaccinations, testing, and demographic/economic indicators. Key columns include:

  • location: Country or region name
  • date: Date of observation
  • total_cases: Cumulative confirmed COVID-19 cases
  • total_deaths: Cumulative deaths
  • people_vaccinated: Cumulative number of people who have received at least one vaccine dose
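
As a minimal sketch of how the key columns listed above might be loaded with pandas: the snippet below reads only those five columns and parses `date` as a datetime. The inline `SAMPLE_CSV` text, its place names, and its numbers are made-up placeholders so the example runs without a download; they are not real OWID values. The helper name `load_key_columns` is also hypothetical. In a notebook, the same function could be given the path of the downloaded CSV instead.

```python
import io

import pandas as pd

# The five key columns described in this README.
KEY_COLUMNS = ["location", "date", "total_cases", "total_deaths", "people_vaccinated"]

# Placeholder rows with made-up values (NOT real data), standing in for the full CSV.
SAMPLE_CSV = """location,date,total_cases,total_deaths,people_vaccinated,extra_column
Exampleland,2021-01-01,100,2,,x
Exampleland,2021-01-02,150,3,10,y
Samplestan,2021-01-01,20,,,z
"""


def load_key_columns(source):
    """Read only the key columns and parse `date` as a datetime column."""
    return pd.read_csv(source, usecols=KEY_COLUMNS, parse_dates=["date"])


df = load_key_columns(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df.head())
```

Restricting the read with `usecols` keeps memory use down, which can help when the full file has dozens of columns that an analysis does not need.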

Size: Approximately 430,000 rows and 67 columns.

Suitability & Relevance:

  • Real-world data with numeric, categorical, and date variables.
  • Contains missing values, outliers, and inconsistencies, which makes it well suited to demonstrating data cleaning techniques.
  • Large enough to support meaningful analysis, yet small enough to work with in a Jupyter Notebook.
  • Relevant to public health, statistics, and data analysis assignments, so insights and visualizations are easy to motivate.
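
The data-quality points above can be checked with a few lines of pandas. The sketch below computes the share of missing values per column and counts places where a cumulative column decreases within a location, which is one simple kind of inconsistency. The small DataFrame uses made-up placeholder values, not real OWID figures, and `count_decreases` is a hypothetical helper name introduced only for this illustration.

```python
import pandas as pd

# Placeholder rows with made-up values (NOT real data).
df = pd.DataFrame(
    {
        "location": ["Exampleland"] * 3 + ["Samplestan"] * 2,
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"]
        ),
        "total_cases": [100.0, 150.0, 140.0, 20.0, None],
    }
)

# Share of missing values in each column (0.0 means no gaps).
missing_share = df.isna().mean()


def count_decreases(frame, column):
    """Count day-to-day drops in a cumulative column, per location."""
    ordered = frame.sort_values(["location", "date"])
    diffs = ordered.groupby("location")[column].diff()
    return int((diffs < 0).sum())


print(missing_share)
print("Decreases in total_cases:", count_decreases(df, "total_cases"))
```

A cumulative count should never go down, so any decrease flagged here would be a candidate for closer inspection during cleaning rather than an automatic error.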

⚠ Note: This GitHub version has not been updated since August 19, 2024. For the latest data, use the updated CSVs that OWID provides through its data catalog.