adding why wide is bad

UBC-DSCI · ttimbers · Sep 27, 2021 · Jun 27, 2021 · Jun 28, 2021 · Jun 28, 2021
commit 8c4e76c1b2b72ee988aea37a0b006dc135b18c69
diff --git a/02-wrangling.Rmd b/02-wrangling.Rmd
@@ -265,7 +265,7 @@ lang_long <- read_csv("data/region_lang_top5_cities_long.csv")
 lang_long
 ```
 
-What is wrong with this format above? In this example, each observation should be a language in a region. However, in the messy data set above, each observation is split across multiple rows. One where the count for `most_at_home` is recorded and the other where the count for `most_at_work` is recorded. Suppose we wanted to visualize the relationship between the number of Canadians reporting their primary language at home and work. It would be difficult to do that with the data in its current format.
+What is wrong with the format above? In this example, each observation should be a language in a region. However, in the messy data set above, each observation is split across multiple rows: one where the count for `most_at_home` is recorded and another where the count for `most_at_work` is recorded. Suppose we wanted to visualize the relationship between the number of Canadians reporting their primary language at home and at work. It would be difficult to do that with the data in its current format, since these two variables are stored in the same column.
 
 We can see how we would like to transform the data from long to wide with `pivot_wider` in Figure \@ref(fig:img-pivot-wider-table).  
 ```{r img-pivot-wider-table, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Going from long to wide with pivot_wider", out.width="1100", fig.retina = 2}
@@ -781,7 +781,7 @@ useful functions, we recommend you checkout [this chapter](http://stat545.com/bl
 ## Using `purrr`'s `map*` functions to iterate
 
 Where should you turn when you discover the next step in your data wrangling/cleaning process requires you to apply a function to 
-each column in a data frame? For example, if you wanted to know the maximum value of each column in a data frame? Well, you could use `summarize` 
+each column in a data frame? The process of repeatedly applying a function to different columns or different data sets is called **iteration**. For example, suppose you wanted to know the maximum value of each column in a data frame. You could use `summarize` 
 as discussed above. However, this becomes inconvenient when you have many columns, as `summarize` requires you to type out a column name and a data 
 transformation for each summary statistic you want to calculate.
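
The contrast drawn in the second hunk can be illustrated with a minimal sketch. It assumes the `dplyr` and `purrr` packages are installed; the data frame `toy` and its columns `a`, `b`, and `c` are hypothetical and do not come from the chapter's data. It compares naming every column in `summarize` with iterating over the columns using `map_dbl`.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data frame (not from the chapter's data sets)
toy <- data.frame(
  a = c(1, 5, 3),
  b = c(10, 2, 7),
  c = c(0.5, 0.1, 0.9)
)

# With summarize, each column name and transformation is typed out by hand
max_by_hand <- summarize(toy, a = max(a), b = max(b), c = max(c))

# With map_dbl, max is applied to every column in turn;
# the result is a named numeric vector with one entry per column
col_max <- map_dbl(toy, max)
col_max
```

With only three columns the two approaches are about equally long, but the `map_dbl` call stays the same length no matter how many columns the data frame has.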