Functional data cleaning: The Why
Learning Objectives
Our fundamental rules for data cleaning are:
Start with an empty DataFrame
Touch every variable just once
Touch with a pure function
After working through this topic, you should be able to explain how these rules help achieve the following five goals:
Code must be readable at different levels of abstraction
The pipeline always starts from the original data, which is never modified
It must be possible to find out quickly what a variable contains
There must not be state in any column’s contents
Data management and analysis should be separated
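To make the three rules concrete, here is a minimal sketch in pandas. The raw data, the column names (`Q1_age`, `Q2_sex`), the missing-value code `-99`, and the function names are all hypothetical and chosen only for illustration; this is one possible way to follow the rules, not a prescribed implementation.

```python
import pandas as pd

# Hypothetical raw data; column names and codes are made up for illustration.
raw = pd.DataFrame(
    {
        "Q1_age": ["34", "51", "-99", "27"],
        "Q2_sex": [1, 2, 2, 1],
    }
)


def clean_age(sr: pd.Series) -> pd.Series:
    """Return age as a nullable integer; negative codes such as -99 become missing."""
    numeric = pd.to_numeric(sr, errors="coerce")
    return numeric.where(numeric >= 0).astype("Int64")


def clean_sex(sr: pd.Series) -> pd.Series:
    """Return sex as a categorical with readable labels."""
    return sr.map({1: "male", 2: "female"}).astype("category")


def clean_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Build the cleaned data from scratch; `raw` is only read, never modified."""
    # Rule 1: start with an empty DataFrame (sharing the index of the raw data).
    df = pd.DataFrame(index=raw.index)
    # Rules 2 and 3: each variable is assigned exactly once, by a pure function.
    df["age"] = clean_age(raw["Q1_age"])
    df["sex"] = clean_sex(raw["Q2_sex"])
    return df


clean = clean_data(raw)
```

In this sketch, `clean_data` reads like a table of contents for the cleaned dataset, while the details live in the small functions. Because each column is created in exactly one place and `raw` is never changed, finding out what `age` contains only requires reading `clean_age`.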
Materials
Video:
Download the slides.