Data management: Definitions and example


Learning Objectives

After working through this topic, you should be able to:

  • Explain our definitions of data management and data cleaning

  • Recognise the key features of the example data that we will use in subsequent videos

Materials

Video:

Download the slides.

Quiz

from epp_topics.quiz_utilities import display_quiz

content = {
    "What do we mean by data cleaning, as opposed to data management?": {
        "Data cleaning is a strict subset of data management": True,
        "Data management is a strict subset of data cleaning": False,
        "Data cleaning and data management are the same thing": False,
        "Data cleaning is only about handling missing values": False,
        "Data cleaning is about bringing complicated data structures into a simple,"
        " standardised form": False,
        "Data cleaning is all about the internal structure of variables (i.e., columns"
        " of a DataFrame)": True,
        "Data cleaning involves renaming variables, handling missing data, and setting"
        " proper data types": True,
    },
    "What are key features of the example data?": {
        "Numerical codes for missing values": True,
        "Data types do not match the variable content": True,
        "Missing values are encoded in the PyArrow format": False,
        "Categorical data types are used where appropriate": False,
        "Using complex machine learning algorithms": False,
    },
}

display_quiz(content)
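The quiz treats data cleaning as work on the internal structure of each variable: renaming it, handling missing values, and setting a proper data type. Below is a minimal pandas sketch of such a cleaning step. It assumes a small, made-up data set, so the raw column names `v001`, `v002`, `v003`, the missing-value codes `-99` and `-77`, and the function name `clean_data` are all hypothetical and are not taken from the course's example data. The raw data shows the two problems mentioned in the quiz: numerical codes stand in for missing values, and numbers are stored as strings.

```python
import pandas as pd

# Hypothetical raw data (NOT the course's example data): cryptic column names,
# numerical codes for missing values, and numbers stored as strings.
raw = pd.DataFrame(
    {
        "v001": ["23", "41", "-99", "35"],
        "v002": ["female", "male", "male", "-77"],
        "v003": ["1200.5", "-99", "980.0", "1500.25"],
    }
)

# Assumed missing-value codes; real data sets document their own codes.
MISSING_CODES = ["-99", "-77"]


def _drop_codes(sr: pd.Series) -> pd.Series:
    """Replace missing-value codes by NaN without modifying the input."""
    return sr.where(~sr.isin(MISSING_CODES))


def clean_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean each variable: rename it, encode missing values, set a proper dtype."""
    df = pd.DataFrame(index=raw.index)
    # Whole numbers with missing values: nullable integer dtype.
    df["age"] = pd.to_numeric(_drop_codes(raw["v001"])).astype("Int64")
    # A small, fixed set of values: categorical dtype.
    df["gender"] = _drop_codes(raw["v002"]).astype(
        pd.CategoricalDtype(["female", "male"])
    )
    # Decimal numbers with missing values: nullable float dtype.
    df["income"] = pd.to_numeric(_drop_codes(raw["v003"])).astype("Float64")
    return df


clean = clean_data(raw)
print(clean.dtypes)
```

The clean DataFrame is built one column at a time, and the raw data is not changed. Each line deals with one variable only, which keeps this step on the cleaning side of the distinction. Merging or reshaping several tables would be data management in the broader sense.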