The goal of this project is to build an ETL pipeline for a database hosted on Amazon Redshift. The pipeline loads data from Amazon S3 into staging tables on Redshift and executes SQL statements to create analytics tables from these staging tables.
In the first step, we define all required SQL queries in the `sql_queries.py` file. These queries include:
- Creating fact and dimension tables
- Loading data from S3 into staging tables
- Inserting data from the staging tables into the final tables
- Retrieving data ready for analysis
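The queries above can be sketched as plain SQL strings grouped into lists, which the later scripts iterate over. This is a minimal illustration; the table names, columns, bucket path, and IAM role ARN below are placeholders, not values from the actual project:

```python
# Hypothetical excerpt of sql_queries.py. All identifiers (staging_events,
# songplays, the S3 path, the IAM role) are illustrative placeholders.

staging_events_table_create = """
CREATE TABLE IF NOT EXISTS staging_events (
    event_id  BIGINT IDENTITY(0, 1),
    user_id   INT,
    song      VARCHAR,
    ts        BIGINT
);
"""

# Redshift's COPY command bulk-loads S3 objects into a staging table.
staging_events_copy = """
COPY staging_events
FROM 's3://example-bucket/log_data'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""

# Final tables are filled with INSERT ... SELECT from the staging tables.
songplay_table_insert = """
INSERT INTO songplays (user_id, song, start_time)
SELECT user_id, song, TIMESTAMP 'epoch' + ts / 1000 * INTERVAL '1 second'
FROM staging_events;
"""

# Grouped lists let create_tables.py and etl.py loop over the queries.
create_table_queries = [staging_events_table_create]
copy_table_queries = [staging_events_copy]
insert_table_queries = [songplay_table_insert]
```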
Next, we write a Python script, `create_tables.py`, that executes the table-creation queries.
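A minimal sketch of what `create_tables.py` might look like, assuming the query lists come from `sql_queries.py` and `dwh.cfg` has a `[CLUSTER]` section (a single inline statement stands in for the real queries here):

```python
import configparser

# Stand-ins for the lists imported from sql_queries.py in the real script.
drop_table_queries = ["DROP TABLE IF EXISTS staging_events;"]
create_table_queries = [
    "CREATE TABLE IF NOT EXISTS staging_events (event_id BIGINT, user_id INT);"
]

def run_queries(conn, queries):
    """Execute each SQL statement on an open DB-API connection."""
    cur = conn.cursor()
    for query in queries:
        cur.execute(query)
        conn.commit()

def main():
    """Entry point: connect to Redshift using dwh.cfg and (re)create tables."""
    import psycopg2  # imported lazily so the sketch can be read without it
    config = configparser.ConfigParser()
    config.read("dwh.cfg")
    # Assumes a [CLUSTER] section listing HOST, DB_NAME, DB_USER,
    # DB_PASSWORD, and DB_PORT in that order.
    conn = psycopg2.connect(
        "host={} dbname={} user={} password={} port={}".format(
            *config["CLUSTER"].values()
        )
    )
    run_queries(conn, drop_table_queries)    # start from a clean slate
    run_queries(conn, create_table_queries)
    conn.close()
```

Dropping tables before creating them makes the script safe to re-run during development.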
We then write `etl.py`, which loads data from Amazon S3 into the staging tables on Redshift and inserts the loaded data into the final tables.
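The two phases of `etl.py` could be sketched as follows; the COPY and INSERT statements would normally be imported from `sql_queries.py`, and the bucket, IAM role, and table names below are made up for illustration:

```python
# Hypothetical sketch of etl.py; all identifiers are placeholders.
copy_table_queries = ["""
COPY staging_events FROM 's3://example-bucket/log_data'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS JSON 'auto';
"""]
insert_table_queries = ["""
INSERT INTO songplays (user_id, song)
SELECT user_id, song FROM staging_events;
"""]

def load_staging_tables(conn):
    """COPY raw S3 data into the Redshift staging tables."""
    cur = conn.cursor()
    for query in copy_table_queries:
        cur.execute(query)
        conn.commit()

def insert_tables(conn):
    """Populate the final tables from the staging tables."""
    cur = conn.cursor()
    for query in insert_table_queries:
        cur.execute(query)
        conn.commit()

def main():
    """Entry point: connect using dwh.cfg, then run both ETL phases."""
    import configparser, psycopg2  # lazy imports, as in create_tables.py
    config = configparser.ConfigParser()
    config.read("dwh.cfg")
    conn = psycopg2.connect(
        "host={} dbname={} user={} password={} port={}".format(
            *config["CLUSTER"].values()
        )
    )
    load_staging_tables(conn)
    insert_tables(conn)
    conn.close()
```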
We also write `analysis.py`, which retrieves data from the final tables for analysis.
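One plausible shape for `analysis.py` is a dictionary of named analytics queries run against the final tables. The query and table names below are illustrative placeholders, not the project's actual analytics queries:

```python
# Hypothetical sketch of analysis.py.
analysis_queries = {
    "songplays_per_user": (
        "SELECT user_id, COUNT(*) FROM songplays GROUP BY user_id;"
    ),
}

def run_analysis(conn):
    """Run each analytics query and return its rows, keyed by query name."""
    cur = conn.cursor()
    results = {}
    for name, query in analysis_queries.items():
        cur.execute(query)
        results[name] = cur.fetchall()
    return results

def main():
    """Entry point: connect using dwh.cfg and print each query's rows."""
    import configparser, psycopg2  # lazy imports, as in the other scripts
    config = configparser.ConfigParser()
    config.read("dwh.cfg")
    conn = psycopg2.connect(
        "host={} dbname={} user={} password={} port={}".format(
            *config["CLUSTER"].values()
        )
    )
    for name, rows in run_analysis(conn).items():
        print(name)
        for row in rows:
            print(row)
    conn.close()
```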
Finally, we write a Jupyter Notebook, `project_exe.ipynb`, that executes all of the previous scripts step by step.
- Ensure that all required Python packages are installed (e.g., `psycopg2`, `pandas`, `boto3`).
- Configure the `dwh.cfg` file with your Amazon Redshift cluster and AWS credentials.
- Run the `create_tables.py` script to create the necessary tables in Redshift.
- Run the `etl.py` script to load data from S3 into staging tables and insert it into the final tables.
- Run the `analysis.py` script to retrieve data from the final tables for analysis.
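The exact layout of `dwh.cfg` is not shown here; a plausible structure, with every section name, key, and value being a placeholder you would replace with your own cluster details, might look like:

```ini
[CLUSTER]
HOST=example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME=dev
DB_USER=awsuser
DB_PASSWORD=your-password
DB_PORT=5439

[IAM_ROLE]
ARN=arn:aws:iam::123456789012:role/example-redshift-role

[S3]
LOG_DATA=s3://example-bucket/log_data
SONG_DATA=s3://example-bucket/song_data
```

Since this file contains credentials, it should not be committed to version control.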
- `sql_queries.py`: Contains SQL queries for creating tables, loading data, and inserting data into final tables.
- `create_tables.py`: Python script to create tables in Redshift.
- `etl.py`: Python script to load data from S3 and insert it into final tables.
- `analysis.py`: Python script to retrieve data from final tables for analysis.
- `project_exe.ipynb`: Jupyter Notebook to execute the scripts step by step.
- `dwh.cfg`: Configuration file containing Redshift cluster and AWS credentials.
- `README.md`: Documentation explaining the project, how to run the scripts, and the files in the repository.