🌍 Global Air Quality Dashboard – (Python-Power BI Project)

📘 Overview

This project analyzes and visualizes global air quality, using Python for data preprocessing and Power BI for dashboards. It combines raw pollutant data, a preprocessing notebook, SQL analysis, and interactive dashboards to surface global air pollution trends.

📂 Project Structure

├── data/
│   ├── global_air_pollution_data.csv    # Original raw data (pollutant readings, AQI, etc.)
│   └── clean_air_quality.xlsx           # Cleaned dataset ready for Power BI import
│
├── notebooks/
│   └── data_processing.ipynb            # Jupyter notebook for data cleaning and transformation
│
├── dashboard/
│   ├── Project.pbix                     # Power BI dashboard file
│   ├── overview.png                     # Screenshot of the Overview dashboard
│   └── pollutant.png                    # Screenshot of the Pollutant Impact dashboard
│
├── sql_scripts/
│   ├── air_pollutant_share_by_type.sql    
│   ├── countries_and_city_larger_zero.sql
│   ├── global_AQI_value_distribution.sql
│   └── pollutants_with_the_greatest_impact_on_global_average_AQI.sql
│
└── README.md                            # Project documentation

⚙️ Setup Environment

Before running the project, make sure you have the following installed:

Python (>=3.10 recommended) – Download
Jupyter Notebook:
- Option 1 (Recommended): Open the project folder in VS Code, then open the notebook data_processing.ipynb.
- Option 2 (Optional): Jupyter Notebook (web interface via CMD/Terminal)
  - Open CMD/Terminal and run: pip install jupyter
  - After installation, move to the project folder: cd path/to/Air-Quality-Analysis
  - Start the notebook: jupyter notebook notebooks/data_processing.ipynb
  - Run each cell with Shift + Enter to preprocess the data
Required Python libraries:
- All libraries are listed in requirements.txt, including
  - pandas
  - sqlalchemy
  - openpyxl
- They will be automatically installed when running the notebook
- Or you can install them manually using: pip install -r requirements.txt
Power BI Desktop (for dashboards) – Download
(Optional) PostgreSQL: Needed only if you want to run SQL scripts in sql_scripts/.

📝 How to Use

I. Terminal

Clone the repository: git clone https://github.com/KANH12/Air-Quality-Analysis.git
Navigate to the project directory: cd Air-Quality-Analysis
Check the raw data
- Ensure the file global_air_pollution_data.csv exists.
- This is the input dataset for all processing.
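
The check above can also be scripted. The following is a minimal sketch, assuming the CSV sits under data/ as shown in the project structure; the helper name check_raw_data is hypothetical and is not part of the repository.

```python
import csv
from pathlib import Path

# Path taken from the project structure; adjust it if your layout differs.
RAW_CSV = Path("data") / "global_air_pollution_data.csv"


def check_raw_data(path: Path = RAW_CSV) -> list[str]:
    """Return the CSV header row, or raise if the file is missing or empty."""
    if not path.is_file():
        raise FileNotFoundError(f"Raw dataset not found: {path.resolve()}")
    with path.open(newline="", encoding="utf-8") as handle:
        # Read only the first row so large files are not loaded into memory.
        header = next(csv.reader(handle), None)
    if not header:
        raise ValueError(f"Raw dataset is empty: {path}")
    return header
```

Calling check_raw_data() from the repository root returns the column names, which is a quick way to confirm the file is readable before opening the notebook.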

II. Open the Notebook

🗺️ Open the notebook file in one of the following ways:

Option A: Jupyter Notebook (Web Interface):
- jupyter notebook notebooks/data_processing.ipynb → This will open the notebook in your default web browser.
- Run each cell (Shift + Enter) to preprocess and clean the data.
Option B: Visual Studio Code
- Open the folder in VS Code
- Open notebooks/data_processing.ipynb
- Run the notebook using the “Run All” button or Shift + Enter per cell.

🧩 Data Sources

The dataset global_air_pollution_data.csv includes pollutant data (PM2.5, Ozone, NO₂, CO) and computed AQI for major global cities.
Data fields include:
- Country, City
- Pollutant
  - PM2.5 value, PM2.5 category
  - Ozone value, Ozone category
  - NO₂ value, NO₂ category
  - CO value, CO category
- AQI value and AQI category

⛮ Data Processing

Performed in data_processing.ipynb using Python libraries:

Data Cleaning
- Renamed columns to standardized and readable names.
- Checked the City column for duplicate values.
- Handled missing values by removing the affected rows, in particular those with a null Country field.
  - Records with null Country values were removed because, although other columns (including City) had data, each city appeared only once in the raw dataset. Without national reference data or repeated city entries, it was impossible to determine the corresponding country, so these records were excluded.
- Filtered out invalid or inconsistent data points to ensure data quality.
Data Transformation – No additional transformation was applied as each city record was unique.
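
The cleaning steps above could look roughly like the sketch below. It is not the notebook's actual code: the raw header names in RENAME_MAP and the validity rule (AQI present and non-negative) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical mapping from raw headers to readable names; the real
# headers in global_air_pollution_data.csv may differ.
RENAME_MAP = {
    "country_name": "Country",
    "city_name": "City",
    "aqi_value": "AQI Value",
}


def clean_air_quality(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, drop rows without a country, and filter invalid AQI."""
    df = raw.rename(columns=RENAME_MAP)

    # Report repeated city names so they can be inspected.
    duplicate_cities = int(df["City"].duplicated().sum())
    print(f"Duplicate city rows: {duplicate_cities}")

    # Rows with no country cannot be attributed to a nation, so drop them.
    df = df.dropna(subset=["Country"])

    # Assumed validity rule: AQI must be present and non-negative.
    df = df[df["AQI Value"].notna() & (df["AQI Value"] >= 0)]

    return df.reset_index(drop=True)
```

The function returns a new DataFrame and leaves the input untouched, so the raw data stays available for comparison.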
Output
- Exports the cleaned dataset to clean_air_quality.xlsx.
- Loads the same dataset into PostgreSQL for SQL-based analysis.
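
A minimal sketch of these two output steps, assuming pandas with openpyxl for the Excel export. The function names, the table name air_quality, and the connection string in the comment are placeholders, not values taken from the notebook.

```python
import pandas as pd


def export_to_excel(df: pd.DataFrame, path: str = "data/clean_air_quality.xlsx") -> None:
    """Write the cleaned data to an .xlsx file (pandas uses openpyxl for this)."""
    df.to_excel(path, index=False)


def load_to_database(df: pd.DataFrame, connection, table_name: str = "air_quality") -> int:
    """Write the cleaned data to a SQL table and return the number of rows sent.

    `connection` can be a SQLAlchemy engine or connection; pandas also
    accepts a sqlite3 connection, which is handy for a local dry run.
    """
    df.to_sql(table_name, connection, if_exists="replace", index=False)
    return len(df)


# For PostgreSQL, one would typically build an engine first, for example:
#   from sqlalchemy import create_engine
#   engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/dbname")
# The credentials, host, database name, and psycopg2 driver are placeholders.
```

Using if_exists="replace" makes reruns of the notebook idempotent, at the cost of dropping the table each time.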

🗄️ Database Integration (PostgreSQL)

The project integrates with PostgreSQL to execute analytical queries for deeper air quality exploration.

Folder sql_scripts/ contains queries for data exploration and analysis:

air_pollutant_share_by_type.sql → Compares pollutant proportions by type
countries_and_city_larger_zero.sql → Filters valid countries/cities
global_AQI_value_distribution.sql → Analyzes global AQI range distributions
pollutants_with_the_greatest_impact_on_global_average_AQI.sql → Identifies major pollution drivers

💡 All SQL scripts operate on the cleaned dataset loaded into PostgreSQL from the ETL pipeline.
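
As an illustration of the kind of query involved, the sketch below counts cities per AQI range. It is not a copy of global_AQI_value_distribution.sql: the table name, column name, and range boundaries are assumptions, and an in-memory SQLite connection stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical table and column names; the ranges are illustrative only.
AQI_DISTRIBUTION_SQL = """
SELECT
    CASE
        WHEN aqi_value <= 50  THEN '0-50'
        WHEN aqi_value <= 100 THEN '51-100'
        WHEN aqi_value <= 200 THEN '101-200'
        ELSE '201+'
    END AS aqi_range,
    COUNT(*) AS city_count
FROM air_quality
GROUP BY aqi_range
ORDER BY MIN(aqi_value)
"""


def aqi_distribution(connection: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (range label, city count) rows ordered from low to high AQI."""
    return connection.execute(AQI_DISTRIBUTION_SQL).fetchall()
```

The query text uses only standard CASE and GROUP BY constructs, so it should carry over to PostgreSQL with at most minor changes.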

⚙️ Data Pipeline Overview (ETLV)

This project follows a complete ETLV (Extract – Transform – Load – Visualize) workflow that connects multiple tools for end-to-end data analysis.

Extract
- Collected global_air_pollution_data.csv (CSV format) from the Global Air Quality dataset.
- The dataset includes pollutant readings (PM2.5, NO₂, CO, O₃), AQI values, and geographic metadata.
Transform and Cleaning
- Cleaned and standardized raw data using Python (Pandas) in data_processing.ipynb.
- Tasks performed:
  - Handle missing values and rename columns
  - Filter invalid values (to avoid meaningless or corrupted data)
  - Prepare structured data for analysis
- No further transformation was required since the dataset already contained all necessary columns.
Load
- Exported transformed data to:
  - clean_air_quality.xlsx → used in Power BI for visualization
  - PostgreSQL → used for intermediate SQL analysis (queries in /sql_scripts/)
Visualize
- Built interactive dashboards in Power BI using the cleaned dataset.
- Dashboards highlight trends, pollutant impacts, and geographic air quality differences.

🔄 Workflow Summary

                        Raw CSV 
                           ↓
                     Python (Cleaning)
                     ↓               ↓
[1] PostgreSQL (SQL Analysis)       [2] Excel (.xlsx)
                   ↓                    ↓
                  Power BI (Visualization)

📊 Power BI Dashboards

The project contains two interactive dashboards, designed for multi-dimensional analysis that highlights key air quality metrics.

1. Overview Dashboard

Purpose: Provide a global-level summary of air quality distribution.

Key Visuals:

Country & City & Status Filters: Dynamic filtering by geography and AQI status.
KPI Cards:
- Country count
- City count
- Average AQI
Area Chart: AQI distribution by value range.
Map Visualization: Global AQI levels by region.
Treemap: Distribution of AQI categories (Good, Moderate, Unhealthy, etc.).
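
The KPI cards above are computed inside Power BI, but the same figures can be cross-checked in pandas. This is a sketch under the assumption that the cleaned data has columns named Country, City, and AQI Value; the actual column names in clean_air_quality.xlsx may differ.

```python
import pandas as pd


def overview_kpis(df: pd.DataFrame) -> dict:
    """Compute the three Overview KPI values from the cleaned dataset."""
    return {
        # Distinct countries and cities, matching the two count cards.
        "country_count": int(df["Country"].nunique()),
        "city_count": int(df["City"].nunique()),
        # Mean AQI across all rows, rounded as a dashboard card would show it.
        "average_aqi": round(float(df["AQI Value"].mean()), 2),
    }
```

Comparing this output with the dashboard cards is a quick sanity check that the Power BI import picked up every row.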

2. Pollutant Impact Dashboard

Purpose: Analyze air quality by pollutant types and their relative contributions.

Key Visuals:

Country & City & Pollutants Filters: Dynamic filtering by geography and each pollutant.
KPI Cards:
- Countries and Cities recorded
- Average PM2.5, Ozone, NO₂, CO concentrations
- Active pollutants count
Pie Chart: Pollutant share by type.
Treemap: Block size and color indicate average concentration, highlighting the major contributors to air quality.

📦 Output Files

Excel: clean_air_quality.xlsx
SQL: scripts in sql_scripts/, run on PostgreSQL
Power BI: dashboard/Project.pbix

🌍 Key Insights

🟦 Key Insight 1 – Global Air Quality Stability

The global average AQI is 72.34, which falls within the Moderate range.
Most countries maintain relatively low AQI levels, indicating overall stable and acceptable air quality worldwide.
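
To show how an AQI value maps to a category, here is a small sketch based on the standard US EPA AQI bands. It assumes the dataset's AQI category column follows the same bands, which this README does not state explicitly.

```python
# Upper bound of each US EPA AQI band, paired with its category name.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi: float) -> str:
    """Return the category name for an AQI value (Hazardous above 300)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper_bound, name in AQI_BANDS:
        # Fractional values (such as averages) fall into the first band
        # whose upper bound they do not exceed.
        if aqi <= upper_bound:
            return name
    return "Hazardous"
```

With these bands, aqi_category(72.34) returns "Moderate", which matches the classification of the global average above.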

🟥 Key Insight 2 – AQI Distribution Patterns

Most countries have AQI values below 100, concentrated in the lower range.
Only a few countries exceed AQI 200, meaning severe pollution events are geographically limited rather than globally widespread.

🟨 Key Insight 3 – Pollutant Composition

PM2.5 dominates, contributing 63.9% of total air contamination.
Ozone (33.3%) is the second largest contributor.
NO₂ and CO have minor shares, showing that fine particulate matter and ozone are key global air quality concerns.

🟩 Key Insight 4 – Pollutant Severity

PM2.5 has the highest average concentration (68.88 µg/m³), nearly double that of Ozone (35.23 µg/m³).
This highlights serious health risks from fine particles, especially in urban and industrial regions.

🟪 Key Insight 5 – Global Coverage & Data Scope

The dataset includes 175 countries and over 23,000 cities, ensuring broad global coverage.
Such scale enhances the reliability of insights on worldwide air quality trends.

→ Overall, while global air quality appears moderately stable, the dominance of PM2.5 and Ozone indicates that ongoing monitoring and pollution control remain essential to sustain healthy atmospheric conditions.

🛠️ Tools & Technologies

Category	Tools	Description
Visualization	Power BI	Data visualization and dashboard building
Programming	Python	Data preprocessing and scripting
Library	Pandas, NumPy	Data cleaning, manipulation, and analysis
Data Formats	Excel, CSV	Data storage and export formats
Query Language	SQL (PostgreSQL, MySQL)	Data querying and analysis

🔮 Future Enhancements

Integrate real-time air quality data from public APIs to enable live dashboard updates.
Automate the ETL process using Python scripts and schedule with Apache Airflow or Cron.
Deploy the dashboard on Power BI Service or Streamlit for public accessibility.

👨‍💻 Author

Le Nguyen Bao Khang [Khngzxz]

Data Analyst | Skilled in Python, SQL & Power BI

📧 baokhang1608@gmail.com
🔗 GitHub | LinkedIn

📎 Notes

Screenshots in this documentation correspond to the Power BI dashboard views:

- Overview Dashboard (dashboard/overview.png)
- Global Air Quality by Pollutant and Country (dashboard/pollutant.png)

Ensure the file paths are correct when connecting clean_air_quality.xlsx to Power BI.
