KANH12/Air-Quality-Analysis


Global Air Quality Dashboard (Python and Power BI Project)


📘 Overview

This project analyzes and visualizes global air quality. It combines raw pollutant data, Python preprocessing scripts, SQL analysis in PostgreSQL, and interactive Power BI dashboards to surface global air pollution trends.


📂 Project Structure

├── data/
│   ├── global_air_pollution_data.csv    # Original raw data (pollutant readings, AQI, etc.)
│   └── clean_air_quality.xlsx           # Cleaned dataset ready for Power BI import
│
├── notebooks/
│   └── data_processing.ipynb            # Jupyter notebook for data cleaning and transformation
│
├── dashboard/
│   ├── Project.pbix                     # Power BI dashboard file
│   ├── overview.png                     # Screenshot of the Overview dashboard
│   └── pollutant.png                    # Screenshot of the Pollutant Impact dashboard
│
├── sql_scripts/
│   ├── air_pollutant_share_by_type.sql
│   ├── countries_and_city_larger_zero.sql
│   ├── global_AQI_value_distribution.sql
│   └── pollutants_with_the_greatest_impact_on_global_average_AQI.sql
│
└── README.md                            # Project documentation

⚙️ Setup Environment

Before running the project, make sure you have the following installed:

  • Python (>=3.10 recommended) – Download

  • Jupyter Notebook:

    • Option 1 (Recommended): Open the project folder in VS Code, then open the notebook data_processing.ipynb.

    • Option 2 (Optional): Jupyter Notebook (web interface via CMD/Terminal)

      • Open cmd/terminal and run: pip install jupyter
      • After installing, move to the project folder: cd path/to/Air-Quality-Analysis
      • Open the notebook: jupyter notebook notebooks/data_processing.ipynb
      • Run each cell with Shift + Enter to preprocess the data
  • Required Python libraries:

    • All libraries are listed in requirements.txt, including
      • pandas
      • sqlalchemy
      • openpyxl
    • They are installed automatically when you run the notebook
    • Or you can install them manually using: pip install -r requirements.txt
  • Power BI Desktop (for dashboards) – Download

  • (Optional) PostgreSQL: Needed only if you want to run SQL scripts in sql_scripts/.
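
Before opening the notebook, it can help to confirm that the required libraries are importable. The sketch below is not part of the repository; the helper name missing_packages is hypothetical, and the package list is copied from the bullet list above rather than read from requirements.txt.

```python
from importlib.util import find_spec

# Package names taken from the list above; adjust if requirements.txt differs.
REQUIRED = ["pandas", "sqlalchemy", "openpyxl"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported missing, pip install -r requirements.txt should bring the environment in line.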


How to Use

I. Terminal

  1. Clone the repository: git clone https://github.com/KANH12/Air-Quality-Analysis.git

  2. Navigate to the project directory: cd Air-Quality-Analysis

  3. Check that the raw data file data/global_air_pollution_data.csv is present

II. Open the Notebook

🗺️ Open the notebook file in one of the following ways:

  • Option A: Jupyter Notebook (Web Interface):

    • jupyter notebook notebooks/data_processing.ipynb → This opens the notebook in your default web browser.
    • Run each cell (Shift + Enter) to preprocess and clean the data.
  • Option B: Visual Studio Code

    • Open the folder in VS Code
    • Open notebooks/data_processing.ipynb
    • Run the notebook using the "Run All" button or Shift + Enter per cell.


🧩 Data Sources

  • The dataset global_air_pollution_data.csv includes pollutant concentration data (PM2.5, Ozone, NO₂, CO) and computed AQI for major global cities.

  • Data fields include:

    • Country, City

    • Pollutant

      • PM2.5 value, PM2.5 category
      • Ozone value, Ozone category
      • NO₂ value, NO₂ category
      • CO value, CO category
    • AQI value and AQI category
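
A quick way to inspect these fields before cleaning is to load the CSV with pandas and list each column's type and null count. This is a minimal sketch: the in-memory sample stands in for data/global_air_pollution_data.csv, and the header names in it are assumed for illustration, so check them against the real file.

```python
import io

import pandas as pd

# Tiny stand-in for data/global_air_pollution_data.csv.
# Header names are assumed for illustration; the real file may differ.
SAMPLE_CSV = """Country,City,AQI Value,AQI Category,PM2.5 Value,PM2.5 Category
Italy,Rome,51,Moderate,51,Moderate
,Unknown City,80,Moderate,80,Moderate
"""


def summarize_columns(source):
    """Read a CSV and report the dtype and null count of each column."""
    df = pd.read_csv(source)
    return pd.DataFrame({"dtype": df.dtypes.astype(str), "nulls": df.isna().sum()})


summary = summarize_columns(io.StringIO(SAMPLE_CSV))
print(summary)
```

With the real data, pass the file path instead of the StringIO object. A non-zero null count in the Country column is what the cleaning step described below has to deal with.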


⛮ Data Processing

Performed in data_processing.ipynb using Python libraries:

  1. Data Cleaning
    • Renamed columns to standardized and readable names.

    • Checked the City column for duplicate values.

    • Handled missing values by removing the affected records, in particular those with a null Country field.

      • Records with null Country values were removed because, although other columns (including City) had data, each city appeared only once in the raw dataset. Without national reference data or repeated city entries, it was impossible to determine the corresponding country, so these records were excluded.
    • Filtered out invalid or inconsistent data points to ensure data quality.

  2. Data Transformation: No additional transformation was applied as each city record was unique.
  3. Output
    • Exported the cleaned dataset (clean_air_quality.xlsx).
    • Loaded the same dataset into PostgreSQL for SQL-based analysis.
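
The cleaning steps above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the notebook's actual code: the raw and renamed column names, the function name clean_air_quality, and the rule that a valid AQI is non-negative are all hypothetical.

```python
import pandas as pd

# Hypothetical raw-to-clean column mapping; the notebook's actual names may differ.
RENAME_MAP = {
    "country_name": "Country",
    "city_name": "City",
    "aqi_value": "AQI Value",
    "aqi_category": "AQI Category",
}


def clean_air_quality(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, drop rows without a country, and filter invalid AQI values."""
    df = raw.rename(columns=RENAME_MAP)
    # Report duplicated city names rather than dropping them silently.
    duplicate_cities = int(df["City"].duplicated().sum())
    print(f"Duplicate city rows: {duplicate_cities}")
    # The country cannot be inferred from the city alone, so these rows are removed.
    df = df.dropna(subset=["Country"])
    # Assumed validity rule: an AQI value must be non-negative.
    df = df[df["AQI Value"] >= 0]
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "country_name": ["Italy", None, "Brazil"],
    "city_name": ["Rome", "Nowhere", "Recife"],
    "aqi_value": [51, 80, -1],
    "aqi_category": ["Moderate", "Moderate", "Good"],
})
clean = clean_air_quality(raw)
print(clean)
# Export step (requires openpyxl):
# clean.to_excel("data/clean_air_quality.xlsx", index=False)
```

The export line is left commented out so the sketch does not overwrite the existing Excel file.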

🗄️ Database Integration (PostgreSQL)

The project integrates with PostgreSQL to execute analytical queries for deeper air quality exploration.

The sql_scripts/ folder contains queries for data exploration and analysis.

💡 All SQL scripts operate on the cleaned dataset loaded into PostgreSQL from the ETL pipeline.
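
Loading the cleaned data into a database can be sketched with pandas' DataFrame.to_sql. The table name air_quality and the helper load_table are hypothetical, and an in-memory SQLite connection stands in for PostgreSQL so the example runs without a server. The connection URL shown in the comment is a placeholder, and the psycopg2 driver in it is an assumption.

```python
import sqlite3

import pandas as pd


def load_table(df: pd.DataFrame, con, table: str = "air_quality") -> int:
    """Write df to `table`, replacing any existing copy, and return the row count."""
    df.to_sql(table, con, if_exists="replace", index=False)
    result = pd.read_sql_query(f"SELECT COUNT(*) AS n FROM {table}", con)
    return int(result["n"].iloc[0])


clean = pd.DataFrame({
    "country": ["Italy", "Brazil"],
    "city": ["Rome", "Recife"],
    "aqi_value": [51, 41],
})

# In-memory SQLite stands in for PostgreSQL here.
# For PostgreSQL, pass a SQLAlchemy engine instead, for example:
# sqlalchemy.create_engine("postgresql+psycopg2://user:password@localhost:5432/dbname")
con = sqlite3.connect(":memory:")
rows_loaded = load_table(clean, con)
print(rows_loaded)
```

Using if_exists="replace" keeps reruns of the notebook idempotent, at the cost of dropping whatever was in the table before.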


⚙️ Data Pipeline Overview (ETLV)

This project follows a complete ETLV (Extract, Transform, Load, Visualize) workflow that connects multiple tools for end-to-end data analysis.

  1. Extract

  2. Transform and Cleaning

    • Cleaned and standardized raw data using Python (Pandas) in data_processing.ipynb.

    • Tasks performed:

      • Handle missing values and rename columns
      • Filter invalid values (to avoid meaningless or corrupted data)
      • Prepare structured data for analysis

    No further transformation was required since the dataset already contained all necessary columns.

  3. Load

    • Exported transformed data to an Excel file (clean_air_quality.xlsx) and to PostgreSQL.

  4. Visualize

    • Built interactive dashboards in Power BI using the cleaned dataset.
    • Dashboards highlight trends, pollutant impacts, and geographic air quality differences.

🔄 Workflow Summary

                        Raw CSV
                           ↓
                     Python (Cleaning)
                     ↓               ↓
[1] PostgreSQL (SQL Analysis)       [2] Excel (.xlsx)
                   ↓                    ↓
                  Power BI (Visualization)

📊 Power BI Dashboards

The project contains two interactive dashboards, designed for multi-dimensional analysis that highlights key air quality metrics.

1. Overview Dashboard

Purpose: Provide a global-level summary of air quality distribution.

Key Visuals:

  • Country & City & Status Filters: Dynamic filtering by geography and AQI status.

  • KPI Cards:

    • Country count
    • City count
    • Average AQI
  • Area Chart: AQI distribution by value range.

  • Map Visualization: Global AQI levels by region.

  • Treemap: Distribution of AQI categories (Good, Moderate, Unhealthy, etc.).
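
The KPI cards and the AQI distribution chart can be cross-checked outside Power BI with a few lines of pandas. This is a sketch under assumptions: the column names Country, City, and AQI Value, the function names, and the bin width of 50 are all illustrative and may not match the dashboard's own measures.

```python
import pandas as pd


def overview_kpis(df: pd.DataFrame) -> dict:
    """Compute the three KPI card values from the cleaned dataset."""
    return {
        "countries": int(df["Country"].nunique()),
        "cities": int(df["City"].nunique()),
        "average_aqi": round(float(df["AQI Value"].mean()), 2),
    }


def aqi_range_counts(df: pd.DataFrame, width: int = 50) -> pd.Series:
    """Count cities per fixed-width AQI range (the bin width is an assumption)."""
    upper = int(df["AQI Value"].max()) // width * width + width
    bins = list(range(0, upper + 1, width))
    ranges = pd.cut(df["AQI Value"], bins=bins, right=False)
    return ranges.value_counts().sort_index()


sample = pd.DataFrame({
    "Country": ["Italy", "Brazil", "India", "India"],
    "City": ["Rome", "Recife", "Delhi", "Kanpur"],
    "AQI Value": [51, 41, 158, 230],
})
kpis = overview_kpis(sample)
counts = aqi_range_counts(sample)
print(kpis)
print(counts)
```

The sample rows are made up for the example; run the functions on the cleaned dataset to compare against the dashboard.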


2. Pollutant Impact Dashboard

Purpose: Analyze air quality by pollutant types and their relative contributions.

Key Visuals:

  • Country & City & Pollutants Filters: Dynamic filtering by geography and each pollutant.

  • KPI Cards:

    • Countries and Cities recorded
    • Average PM2.5, Ozone, NO₂, CO concentrations
    • Active pollutants count
  • Pie Chart: Pollutant share by type.

  • Treemap: Block size and color indicate average concentration, highlighting the major contributors to air quality.
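
The per-pollutant averages behind the KPI cards can be reproduced in pandas. The README does not state how "pollutant share" is defined, so the share below (each pollutant's average divided by the sum of all averages) is only one possible definition and may not match the dashboard's figures. Column names and the sample values are hypothetical.

```python
import pandas as pd

# Assumed column names for the four pollutant value fields.
POLLUTANT_COLUMNS = ["PM2.5 Value", "Ozone Value", "NO2 Value", "CO Value"]


def pollutant_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Average each pollutant and express it as a share of the summed averages."""
    means = df[POLLUTANT_COLUMNS].mean()
    summary = pd.DataFrame({
        "average": means,
        # Assumed definition of share; the dashboard may compute it differently.
        "share_pct": means / means.sum() * 100,
    })
    return summary.sort_values("average", ascending=False).round(2)


sample = pd.DataFrame({
    "PM2.5 Value": [60, 80],
    "Ozone Value": [10, 30],
    "NO2 Value": [6, 10],
    "CO Value": [1, 3],
})
summary = pollutant_summary(sample)
print(summary)
```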


📦 Output Files


Key Insights

🟦 Key Insight 1 – Global Air Quality Stability

  • The global average AQI is 72.34, which falls within the Moderate range.
  • Most countries maintain relatively low AQI levels, indicating overall stable and acceptable air quality worldwide.
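
To see why an average of 72.34 counts as Moderate, the mapping from AQI value to category can be written out. The bands below follow the US EPA AQI scale, which is an assumption here: the dataset's own AQI category column may use slightly different labels or boundaries. The function name aqi_category is hypothetical.

```python
# Upper bounds and labels following the US EPA AQI scale (assumed to match the dataset).
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(value: float) -> str:
    """Return the category label for an AQI value."""
    if value < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if value <= upper:
            return label
    # Anything above the last band falls into the top category.
    return "Hazardous"


print(aqi_category(72.34))  # Moderate
```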

🟥 Key Insight 2 – AQI Distribution Patterns

  • Most countries have AQI values below 100, concentrated in the lower range.
  • Only a few countries exceed AQI 200, meaning severe pollution events are geographically limited rather than globally widespread.

🟨 Key Insight 3 – Pollutant Composition

  • PM2.5 dominates, contributing 63.9% of total air contamination.
  • Ozone (33.3%) is the second largest contributor.
  • NO₂ and CO have minor shares, showing that fine particulate matter and ozone are key global air quality concerns.

🟩 Key Insight 4 – Pollutant Severity

  • PM2.5 has the highest average concentration (68.88 µg/m³), nearly double that of Ozone (35.23 µg/m³).
  • This highlights serious health risks from fine particles, especially in urban and industrial regions.

🟪 Key Insight 5 – Global Coverage & Data Scope

  • Dataset includes 175 countries and over 23,000 cities, ensuring broad global coverage.
  • Such scale enhances the reliability of insights on worldwide air quality trends.

→ Overall, while global air quality appears moderately stable, the dominance of PM2.5 and Ozone indicates that ongoing monitoring and pollution control remain essential to sustain healthy atmospheric conditions.


🛠️ Tools & Technologies

| Category | Tools | Description |
|---|---|---|
| Visualization | Power BI | Data visualization and dashboard building |
| Programming | Python | Data preprocessing and scripting |
| Library | Pandas, NumPy | Data cleaning, manipulation, and analysis |
| Data Formats | Excel, CSV | Data storage and export formats |
| Query Language | SQL (PostgreSQL, MySQL) | Data querying and analysis |

🔮 Future Enhancements

  • Integrate real-time air quality data from public APIs to enable live dashboard updates.
  • Automate the ETL process using Python scripts and schedule with Apache Airflow or Cron.
  • Deploy the dashboard on Power BI Service or Streamlit for public accessibility.

👨‍💻 Author

Le Nguyen Bao Khang [Khngzxz]

Data Analyst | Skilled in Python, SQL & Power BI


📎 Notes

  • Screenshots in this documentation correspond to the Power BI dashboard views:

    Overview Dashboard: [screenshot: dashboard/overview.png]

    Global Air Quality by Pollutant and Country (Pollutant Impact Dashboard): [screenshot: dashboard/pollutant.png]

  • Ensure the file paths are correct when connecting clean_air_quality.xlsx to Power BI.


© 2025 Le Nguyen Bao Khang – All rights reserved
