Lecture/Seminar: 4 hours/week
Lecture, seminars, demonstrations, and hands-on exercises/projects
- Introduction to data engineering
- Data engineering principles
- Data engineering lifecycle
- Data engineering infrastructure
- Data infrastructure, including cloud infrastructure services such as those provided by Amazon, Google, and Microsoft
- Modern data architecture
- Data infrastructure strategy
- Building data pipelines
- Data pipeline patterns and types of data pipelines
- Building batch data pipelines with tools such as Apache NiFi and Airflow
- Building streaming data pipelines with tools such as Apache Kafka or Amazon Kinesis
- Integrating batch and streaming data pipelines (e.g., mini-batch data streams)
- Managing data pipelines
- Orchestrating data pipelines with orchestration tools, such as Apache Airflow
- Handling changes in source systems and broken data pipelines
- Monitoring and measuring pipeline performance
- Provisioning data for downstream data consumers
- Cleaning and transforming data
- Data validation
- Serving data for downstream data consumers
- Managing data security and privacy
- Data governance
At the end of this course, the successful student will be able to:
• Explain data engineering concepts and processes.
• Depict and describe the data engineering lifecycle.
• Evaluate trade-offs among data engineering techniques and design alternatives within the context of specific data engineering application domains.
• Select, install, and/or configure suitable data engineering infrastructure (e.g., cloud infrastructure services, such as those provided by Amazon, Google, and Microsoft) and tools for various data engineering use cases.
• Build working data pipelines to ingest data from various sources.
• Manage data pipelines for optimal performance.
• Clean, transform, and validate messy ingested data.
• Securely make data available for downstream data consumers.
Assessment will be based on course objectives and will be carried out in accordance with the Douglas College Evaluation Policy.
Labs | 0-10%
Project(s) | 15-25%
Midterm Examination* | 30-35%
Final Examination* | 30-40%
Total | 100%
Some of these assessments may involve group work.
- Reis, Joe, and Matt Housley. Fundamentals of Data Engineering: Plan and Build Robust Data Systems, O’Reilly Media. Latest edition.
- Crickard, Paul. Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python, Packt Publishing. Latest edition.
- Custom courseware, class notes provided by the instructor, and online resources or other textbooks as approved by the department.