Course Overview:
This course explores the challenges of designing, building and maintaining data processing pipelines. It focuses on concepts, techniques and technologies for gathering, validating, transporting, transforming, enhancing, storing, integrating and maintaining the diverse data sets common to modern enterprises.
Objectives:
- Understand how distributed data pipelines are designed and implemented
- Analyze ethical issues related to the gathering, processing and storing of data
- Identify and use common best practices for gathering and validating data
- Develop software to check and maintain the validity and quality of data
- Explain how software should be designed for transport of data in a distributed system
- Design and implement data transformation software
- Develop data enhancement modules using appropriate technologies
- Recognize opportunities for integration of diverse data sets
- Consider diverse technologies for data storage and maintenance
Project:
Design, build, test and monitor a small-scale data pipeline