Data management: Definitions and example
Learning Objectives
After working through this topic, you should be able to:
Explain our definitions of data management and data cleaning
Recognise the key patterns in the example data that we will use in subsequent videos
Materials
Video:
Download the slides.
Quiz
from epp_topics.quiz_utilities import display_quiz

# Each question maps its answer options to whether they are correct.
content = {
    "What do we mean by data cleaning, as opposed to data management?": {
        "Data cleaning is a strict subset of data management": True,
        "Data management is a strict subset of data cleaning": False,
        "Data cleaning and data management are the same thing": False,
        "Data cleaning is only about handling missing values": False,
        "Data cleaning is about bringing complicated data structures into a simple,"
        " standardised form": False,
        "Data cleaning is all about the internal structure of variables (i.e., columns"
        " of a DataFrame)": True,
        "Data cleaning involves renaming variables, handling missing data, and setting"
        " proper data types": True,
    },
    "What are key features of the example data?": {
        "Numerical codes for missing values": True,
        "Data types do not match the variable content": True,
        "Missing values are encoded in PyArrow format": False,
        "Categorical data types are used where appropriate": False,
        "The data require complex machine learning algorithms": False,
    },
}
display_quiz(content)
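The cleaning steps named in the quiz can be sketched with pandas. Cleaning works on one variable at a time: rename it, replace numerical missing-value codes with proper missing values, and set a data type that matches the content. This is a minimal sketch, not the course's own code. The raw column names `v1` and `v2`, the codes `-77` and `-99`, the new names `n_children` and `smoker`, and the helper `clean_data` are all hypothetical stand-ins, not taken from the actual example data.

```python
import pandas as pd

# Hypothetical missing-value codes; the codes in the actual example data may differ.
MISSING_CODES = [-77, -99]


def clean_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean each variable separately: rename, replace codes by NA, set dtypes.

    The raw column names v1 and v2 are hypothetical.
    """
    clean = pd.DataFrame(index=raw.index)

    # Integer-valued variable stored with numeric missing codes: masking the
    # codes yields float64 with NaN, which then becomes a nullable integer.
    n_children = raw["v1"]
    clean["n_children"] = n_children.mask(n_children.isin(MISSING_CODES)).astype(
        pd.Int64Dtype()
    )

    # Categorical variable stored as plain strings, with the codes as strings.
    smoker = raw["v2"]
    string_codes = [str(code) for code in MISSING_CODES]
    clean["smoker"] = smoker.mask(smoker.isin(string_codes)).astype(
        pd.CategoricalDtype(categories=["no", "yes"])
    )
    return clean


raw = pd.DataFrame({"v1": [1, 2, -77, 3], "v2": ["no", "yes", "-99", "yes"]})
clean = clean_data(raw)
print(clean.dtypes)
print(clean)
```

Building a fresh DataFrame column by column keeps each cleaning decision explicit. It also means that raw columns nobody has looked at yet never leak into the cleaned data.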