Functional data cleaning: The Why

Learning Objectives

Our fundamental rules for data cleaning are:

  1. Start with an empty DataFrame

  2. Touch every variable just once

  3. Touch with a pure function
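
A minimal sketch of how the three rules might look in pandas. The raw column names (`AGE`, `SEX`), the sample values, and the helper names (`clean_age`, `clean_gender`, `clean_data`) are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import pandas as pd


def clean_age(raw_age: pd.Series) -> pd.Series:
    """Pure function: parse age strings into a nullable integer column."""
    # Unparseable entries such as "n/a" become missing values.
    return pd.to_numeric(raw_age, errors="coerce").astype("Int64")


def clean_gender(raw_gender: pd.Series) -> pd.Series:
    """Pure function: normalise labels and store them as a categorical."""
    labels = raw_gender.str.strip().str.lower()
    return labels.astype(pd.CategoricalDtype(categories=["female", "male"]))


def clean_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Build the clean DataFrame without modifying `raw`."""
    # Rule 1: start with an empty DataFrame (sharing only the raw index).
    clean = pd.DataFrame(index=raw.index)
    # Rules 2 and 3: each variable is assigned exactly once,
    # and each assignment comes from a pure function of the raw data.
    clean["age"] = clean_age(raw["AGE"])
    clean["gender"] = clean_gender(raw["SEX"])
    return clean


# Hypothetical raw data (assumption, for illustration only).
raw = pd.DataFrame(
    {
        "AGE": ["34", "n/a", "51"],
        "SEX": [" Female", "male ", "FEMALE"],
    }
)
raw_before = raw.copy()
clean = clean_data(raw)
```

Because every clean column is defined in a single place, reading `clean_data` gives the high-level overview, while each helper holds the details for one variable. The raw DataFrame is only read, never written to.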

After working through this topic, you should be able to explain how these rules help achieve the following five goals:

  1. Code must be readable at different levels of abstraction

  2. The pipeline always starts from the original data, which is never modified

  3. It must be possible to find out quickly what a variable contains

  4. There must be no state in any column’s contents

  5. Data management and analysis should be separated

Materials

Download the slides.

Quiz