Fundamentals of Data Engineering
Overview
- Introduction to data engineering
- Data engineering principles
- Data engineering lifecycle
- Data engineering infrastructure
- Data infrastructure, including cloud infrastructure services, such as those provided by Amazon, Google, and Microsoft
- Modern data architecture
- Data infrastructure strategy
- Building data pipelines
- Data pipeline patterns and types of data pipelines
- Building batch data pipelines with tools such as Apache NiFi and Airflow
- Building streaming data pipelines with tools such as Apache Kafka or Amazon Kinesis
- Integrating batch and streaming data pipelines (i.e., mini-batch data streams)
- Managing data pipelines
- Orchestrating data pipelines with orchestration tools, such as Apache Airflow
- Handling changes in source systems and broken data pipelines
- Monitoring and measuring pipeline performance
- Provisioning data for downstream data consumers
- Cleaning and transforming data
- Data validation
- Serving data for downstream data consumers
- Managing data security and privacy
- Data governance
Lectures, seminars, demonstrations, and hands-on exercises/projects
Assessment will be based on course objectives and will be carried out in accordance with the Douglas College Evaluation Policy.
Assessment | Weight |
---|---|
Labs | 0-10% |
Project(s) | 15-25% |
Midterm Examination* | 30-35% |
Final Examination* | 30-40% |
Total | 100% |
Some of these assessments may involve group work.
At the end of this course, the successful student will be able to:
• Explain data engineering concepts and processes.
• Depict and describe the data engineering life cycle.
• Evaluate trade-offs among data engineering techniques and design alternatives within the context of specific data engineering application domains.
• Select, install, and/or configure suitable data engineering infrastructure (e.g., cloud infrastructure services, such as those provided by Amazon, Google, and Microsoft) and tools for various data engineering use cases.
• Build working data pipelines to ingest data from various sources.
• Manage data pipelines for optimal performance.
• Clean, transform, and validate messy ingested data.
• Securely make data available for downstream data consumers.
- Reis, Joe and Housley, Matt. Fundamentals of Data Engineering: Plan and Build Robust Data Systems, O’Reilly Media. Latest edition.
- Crickard, Paul. Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python, Packt Publishing. Latest edition.
- Custom courseware, class notes provided by the instructor, and online resources or other textbooks as approved by the department.
Requisites
Course Transfers
These are for current course guidelines only. For a full list of archived courses please see https://www.bctransferguide.ca
Institution | Transfer details for CSIS 3600 | |
---|---|---|
There are no applicable transfer credits for this course. |