Data-Engineering-Project

Project Description

This project is split into four milestones that together build an ETL pipeline for a fintech dataset. The main objectives are:

Data Cleaning, Transformation & Feature Engineering using Pandas and PySpark
Loading Data into a Postgres Database
Stream Processing using Kafka & Zookeeper
Visualizing Data in a Dashboard and Creating a DAG in Airflow

Milestone 1

Objective:

Perform exploratory data analysis (EDA) with visualization and extract additional data.
Perform data cleaning by tidying column names and handling inconsistent data, missing data, and outliers.
Engineer new features and apply encoding and normalization.
Create a lookup table whose values can later be used to reverse all imputed values to their original values.
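
The lookup-table idea can be illustrated with a minimal pandas sketch. It is not the project's actual code: the sample data, the column names (`emp_title`, `loan_amount`), the fill value `"Unknown"`, and the helper functions are all assumptions made for illustration. Each imputation is logged as one row (column, original value, imputed value), and that log is enough to undo the change later.

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "emp_title": ["Engineer", None, "Teacher", None],
    "loan_amount": [1000.0, 2500.0, 4000.0, 1500.0],
})


def impute_constant(frame, column, fill_value):
    """Fill missing values in `column`; return the new frame and lookup rows."""
    rows = []
    if frame[column].isna().any():
        # Record what was replaced so the change can be reversed later.
        rows.append({"column": column, "original": None, "imputed": fill_value})
    filled = frame.assign(**{column: frame[column].fillna(fill_value)})
    return filled, rows


def reverse_imputation(frame, lookup_table):
    """Replace every imputed value with the original recorded in the lookup table."""
    restored = frame.copy()
    for row in lookup_table.itertuples(index=False):
        mask = restored[row.column] == row.imputed
        restored.loc[mask, row.column] = row.original
    return restored


cleaned, lookup_rows = impute_constant(df, "emp_title", "Unknown")
lookup = pd.DataFrame(lookup_rows, columns=["column", "original", "imputed"])
restored = reverse_imputation(cleaned, lookup)
```

Note that this reversal is only unambiguous when the imputed value does not also occur naturally in the column, which is why a distinct marker such as `"Unknown"` is used here.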

Diagram

Milestone 2

Objective:

Utilize Docker to create a container that performs the tasks implemented in Milestone 1.
Save the cleaned dataset to a Postgres database.
Receive a data stream using Kafka & Zookeeper, process each message, then save it to the database.
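
The receive, process, and save loop described above can be sketched independently of any particular Kafka client. The sketch below assumes each message is a JSON-encoded byte string and that the producer sends a sentinel message (`"EOF"` here, a hypothetical choice) when the stream ends. In a real setup `messages` would come from a Kafka consumer and `save_row` would insert into Postgres; here both are replaced by plain Python stand-ins so the logic can be run on its own.

```python
import json

END_OF_STREAM = "EOF"  # hypothetical sentinel; the real producer may signal completion differently


def tidy_keys(record):
    """Normalize keys the same way the batch cleaning tidies column names."""
    return {key.strip().lower().replace(" ", "_"): value for key, value in record.items()}


def process_stream(messages, clean_row, save_row):
    """Decode each message, clean it, and pass it to `save_row`; return the number saved."""
    saved = 0
    for raw in messages:
        text = raw.decode("utf-8")
        if text == END_OF_STREAM:
            break  # stop consuming once the producer signals the end
        save_row(clean_row(json.loads(text)))
        saved += 1
    return saved


# Stand-ins for the Kafka consumer (a list of bytes) and the database (a list).
stream = [
    b'{"Loan Amount": 1000, "Grade": "A"}',
    b'{"Loan Amount": 2500, "Grade": "C"}',
    b"EOF",
    b'{"Loan Amount": 1, "Grade": "B"}',
]
sink = []
count = process_stream(stream, tidy_keys, sink.append)
```

Passing the cleaning and saving steps in as functions keeps the loop testable without a running broker or database.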

Diagram

Milestone 3

Objective:

This milestone focuses on getting hands-on experience with PySpark by implementing the following:

Load the dataset
Perform some simple cleaning
- Rename columns
- Detect missing values
- Handle missing values
- Check missing values
Perform some analysis on the dataset
Add new columns with feature engineering
Encode categorical columns
Create a lookup table for encoding only
Save the cleaned dataset and lookup table
Save the output into a Postgres database

Milestone 4

Objective:

Create an ETL pipeline using Airflow
Create a dashboard for the output data that gives insight into the following 5 questions:
- What is the trend of loan issuance over the months for each year?
- What is the percentage distribution of loan grades in the dataset?
- What is the distribution of loan amounts across different grades?
- Which states have the highest average loan amount?
- How does the loan amount relate to annual income across states?
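
Several of these questions reduce to simple aggregations that a dashboard can plot directly. The pandas sketch below covers three of them: monthly issuance per year, grade percentages, and average loan amount by state. It uses made-up rows and assumed column names (`issue_date`, `state`, `grade`, `loan_amount`), so it should be read as an illustration rather than the project's dashboard code.

```python
import pandas as pd

# Hypothetical rows; the column names are assumptions, not the project's schema.
loans = pd.DataFrame({
    "issue_date": pd.to_datetime(["2018-01-15", "2018-01-20", "2018-02-03", "2019-01-09"]),
    "state": ["CA", "NY", "CA", "NY"],
    "grade": ["A", "B", "A", "C"],
    "loan_amount": [1000.0, 3000.0, 2000.0, 5000.0],
})

# Trend of loan issuance: number of loans per (year, month).
issued_per_month = (
    loans.groupby([
        loans["issue_date"].dt.year.rename("year"),
        loans["issue_date"].dt.month.rename("month"),
    ])
    .size()
    .rename("loans_issued")
    .reset_index()
)

# Percentage distribution of loan grades.
grade_share = loans["grade"].value_counts(normalize=True).mul(100).round(1)

# States ranked by average loan amount, highest first.
avg_by_state = loans.groupby("state")["loan_amount"].mean().sort_values(ascending=False)
```

Each result is a small table that maps naturally onto a chart: a line per year for `issued_per_month`, a pie or bar chart for `grade_share`, and a ranked bar chart for `avg_by_state`.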

Diagram

Video

Showcase.mp4

