Imperative data cleaning
You can skip this topic if you have not done data cleaning before. Its only purpose is to connect your prior experience to what we suggest in the “functional data cleaning” section.
Learning Objectives
After working through this topic, you should be able to:
Explain what we mean by imperative data cleaning
Relate our explanation of imperative data cleaning to your own experience
Recognize common patterns in imperative data management code
Identify problems with the imperative approach
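To make these patterns concrete, here is a minimal sketch of what imperative cleaning code often looks like. It assumes pandas and an entirely made-up survey dataset; the column names `q1`, `q2`, `age`, `tmp_sum`, and `all_agree` are hypothetical and not taken from the course materials.

```python
import pandas as pd

# Hypothetical raw survey data; all column names and values are invented.
df = pd.DataFrame(
    {
        "q1": ["agree", "disagree", "agree"],
        "q2": ["disagree", "disagree", "agree"],
        "age": ["34", "-99", "51"],
    }
)

# --- Clean the survey items ---
# A for loop handles several similar variables, overwriting each in place.
for col in ["q1", "q2"]:
    df[col] = df[col].map({"agree": 1, "disagree": 0})

# --- Clean age ---
# The same column is modified twice. Between the two lines it is numeric
# but still contains the missing-value code -99 (an invalid state).
df["age"] = pd.to_numeric(df["age"])
df["age"] = df["age"].where(df["age"] >= 0)

# --- Derived variable ---
# The helper column stays in the DataFrame unless it is dropped explicitly.
df["tmp_sum"] = df["q1"] + df["q2"]
df["all_agree"] = df["tmp_sum"] == 2
```

The result depends on execution order: if the derived-variable lines ran before the loop, the string columns would be concatenated rather than summed and `all_agree` would silently be wrong. Because every step mutates the shared `df`, no single step can be reused or tested on its own without first reproducing the state that precedes it.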
Materials
Video:
Download the slides.
Quiz
from epp_topics.quiz_utilities import display_quiz

# Each question maps its answer options to True (correct) or False (incorrect).
content = {
    "What are characteristics of imperative data cleaning?": {
        "Variables are modified in place repeatedly": True,
        "Each variable is created exactly once": False,
        "The code uses many functions": False,
        "Helper variables clutter the namespace": True,
        "The DataFrame remains unchanged throughout": False,
    },
    "Which patterns are common in imperative data management code?": {
        "Using for loops to process multiple similar variables": True,
        "Comments to structure the code": True,
        "Each step creates a new DataFrame": False,
        "Pure functions that return new data": False,
        "Reusing the same variable names for different states": True,
    },
    "What are problems with the imperative approach to data cleaning?": {
        "Invalid intermediate states of the DataFrame": True,
        "Difficult to reuse parts of the code": True,
        "Hard to execute code in the wrong order": False,
        "Code is too fast": False,
        "No way to test individual operations easily": True,
    },
}

display_quiz(content)