# Setting and renaming columns and indices

## Learning Objectives

After working through this topic, you should be able to:

- Explain why the index is important
- Rename columns in a DataFrame
- Set a single-level index or a MultiIndex

## Materials

Video:

<iframe
  src="https://electure.uni-bonn.de/paella7/ui/watch.html?id=c251d939-af8f-41fc-98c2-542543caf849"
  width="640"
  height="360"
  frameborder="0"
  allowfullscreen
></iframe>

Download the [slides](pandas-columns_and_indices.pdf).
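
As a companion to the video, here is a minimal sketch of the operations named in the learning objectives. The DataFrame, its column names (`country`, `year`, `gdp_pc`), and its values are made up for illustration and do not come from the slides.

```python
import pandas as pd

# Hypothetical country x year panel; all names and values are illustrative.
df = pd.DataFrame(
    {
        "country": ["DE", "DE", "FR", "FR"],
        "year": [2020, 2021, 2020, 2021],
        "gdp_pc": [46.2, 51.2, 39.0, 43.7],
    }
)

# Rename a column; columns not mentioned in the mapping are left untouched.
df = df.rename(columns={"gdp_pc": "gdp_per_capita"})

# Single index: country alone is meaningful but not unique in a panel.
by_country = df.set_index("country")

# MultiIndex: country and year together identify each row.
panel = df.set_index(["country", "year"])

print(by_country.index.is_unique)  # False
print(panel.index.is_unique)  # True
print(panel.loc[("FR", 2021), "gdp_per_capita"])  # 43.7
```

Checking `index.is_unique` after `set_index` is one quick way to see whether the chosen index identifies each row.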


## Quiz

In [None]:
content = [
    {
        "question": "Using a meaningful index...",
        "type": "many_choice",
        "answers": [
            {
                "answer": "makes combining datasets safer.",
                "correct": True,
                "feedback": (
                    "You are less likely to introduce bugs from misaligned data: "
                    "pandas aligns on index labels, and a meaningful index raises the "
                    "hurdle for calling `.reset_index()`."
                ),
            },
            {
                "answer": (
                    "means that you will never have duplicate values in the index."
                ),
                "correct": False,
                "feedback": (
                    "In a country x year panel, setting the country name as the only "
                    "index would give a meaningful index with duplicated values. "
                    "To avoid duplicates, set the index to both country and year."
                ),
            },
            {
                "answer": "means that we are never going to use floats.",
                "correct": False,
                "feedback": (
                    "Using income as an index might be meaningful, but it is a bad "
                    "idea: floating-point imprecision makes exact lookups unreliable, "
                    "and you will hardly be able to avoid duplicates."
                ),
            },
            {
                "answer": ("means that we will not end up with MultiIndex objects."),
                "correct": False,
                "feedback": (
                    "Not at all. MultiIndex objects come up as soon as you have more "
                    "than one dimension in the rows (e.g., panel data)."
                ),
            },
        ],
    },
    {
        "question": "What will the output of the following code be?",
        "code": """>>> df.columns
Index(['a', 'b', 'c'], dtype='string')

>>> df = df.rename(columns={"b": "B", "x": "u"})
>>> df.columns
""",
        "type": "multiple_choice",
        "answers": [
            {
                "code": "Index(['a', 'B', 'c'], dtype='string')",
                "correct": True,
            },
            {
                "code": "KeyError: 'x'",
                "correct": False,
                "feedback": (
                    "The rename method will only consider the subset of keys in the "
                    "dictionary that are also present among the columns."
                ),
            },
            {
                "code": (
                    "KeyError: 'a' and 'c' not found among dictionary keys {'b', 'x'}"
                ),
                "correct": False,
                "feedback": (
                    "The rename method does not require that all columns are present "
                    "among the dictionary keys."
                ),
            },
            {
                "code": ("Index(['a', 'B', 'c', 'u'], dtype='string')"),
                "correct": False,
                "feedback": (
                    "How would you expect this to work? For example, how would you "
                    "expect the column 'u' to be filled? Creating columns this way "
                    "could also cause memory problems if you accidentally passed a "
                    "large dictionary while `df` has many rows."
                ),
            },
        ],
    },
]

from jupyterquiz import display_quiz

display_quiz(content, colors="fdsp")
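
The second quiz question can be checked directly. The sketch below uses a made-up three-column DataFrame. `errors="raise"` is an optional argument of `DataFrame.rename` that turns silently ignored keys into a `KeyError`, which may help catch typos in column names.

```python
import pandas as pd

# Illustrative DataFrame with the same column labels as in the quiz.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Keys that are not column labels ("x" here) are ignored by default.
renamed = df.rename(columns={"b": "B", "x": "u"})
print(list(renamed.columns))  # ['a', 'B', 'c']

# With errors="raise", pandas reports the unknown key instead of ignoring it.
try:
    df.rename(columns={"b": "B", "x": "u"}, errors="raise")
except KeyError as err:
    print(f"KeyError: {err}")
```

Whether to use the strict variant is a judgment call: the default is convenient when one mapping is applied to several datasets, while `errors="raise"` is safer when every key is expected to match.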