Fundamentals of Data Engineering

Curriculum guideline

Effective Date:
Course
Discontinued
No
Course code
CSIS 3600
Descriptive
Fundamentals of Data Engineering
Department
Computing Studies & Information Systems
Faculty
Commerce & Business Administration
Credits
3.00
Start date
End term
Not Specified
PLAR
No
Semester length
15 Weeks
Max class size
35
Course designation
None
Industry designation
APICS
Contact hours

Lecture/Seminar: 4 hours/week

Method(s) of instruction
Lecture
Seminar
Learning activities

Lecture, seminars, demonstrations, and hands-on exercises/projects

Course description
This course covers data engineering concepts and the systems, processes, practices, and tools used in data engineering. Students will learn how to select and configure suitable infrastructure for various data engineering use cases, and how to develop, implement, and manage pipelines that ingest data from different sources. Students will also learn how to securely provision ingested data to downstream data consumers. Throughout the course, contemporary data engineering tools will be used for hands-on class demonstrations, exercises, and projects.
Course content
  1. Introduction to data engineering
    • Data engineering principles
    • Data engineering lifecycle
  2. Data engineering infrastructure
    • Data infrastructure, including cloud infrastructure services, such as those
      provided by Amazon, Google, and Microsoft
    • Modern data architecture
    • Data infrastructure strategy
  3. Building data pipelines
    • Data pipeline patterns and types of data pipelines
    • Building batch data pipelines with tools such as Apache NiFi and Apache Airflow
    • Building streaming data pipelines with tools such as Apache Kafka or Amazon Kinesis
    • Integrating batch and streaming data pipelines (e.g., micro-batch processing of data streams)
  4. Managing data pipelines
    • Orchestrating data pipelines with tools such as Apache Airflow
    • Handling changes in source systems and broken data pipelines
    • Monitoring and measuring pipeline performance
  5. Provisioning data for downstream data consumers
    • Cleaning and transforming data
    • Data validation
    • Serving data for downstream data consumers
    • Managing data security and privacy
    • Data governance
Learning outcomes

At the end of this course, the successful student will be able to:

• Explain data engineering concepts and processes.
• Depict and describe the data engineering lifecycle.
• Evaluate trade-offs among data engineering techniques and design alternatives within the context of specific data engineering application domains.
• Select, install, and/or configure suitable data engineering infrastructure (e.g., cloud infrastructure services, such as those provided by Amazon, Google, and Microsoft) and tools for various data engineering use cases.
• Build working data pipelines to ingest data from various sources.
• Manage data pipelines for optimal performance.
• Clean, transform, and validate messy ingested data.
• Securely make data available for downstream data consumers.

Means of assessment

Assessment will be based on course objectives and will be carried out in accordance with the Douglas College Evaluation Policy.

Labs: 0-10%
Project(s): 15-25%
Midterm Examination*: 30-35%
Final Examination*: 30-40%
Total: 100%

Some of these assessments may involve group work.

* Practical, hands-on computer exam

To pass the course, students must achieve an overall course grade of at least 50% and also a grade of at least 50% on the combined weighted examination components (including quizzes, tests, and exams).
Textbook materials
  • Reis, Joe, and Matt Housley. Fundamentals of Data Engineering: Plan and Build Robust Data Systems. O'Reilly Media. Latest edition.
  • Crickard, Paul. Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python. Packt Publishing. Latest edition.
  • Custom courseware, class notes provided by the instructor, and online resources or other textbooks as approved by the department.
Prerequisites

Minimum grade of C in CSIS 2175 and CSIS 2300

Note: CSIS 3560 and CSIS 4270 are recommended.