worked on tidying from wider to longer wording and removed loading data from the canlang package as I don't think we want to add that complexity...
UBC-DSCI · ttimbers · Sep 27, 2021 · Jun 27, 2021 · Jun 28, 2021 · Jun 28, 2021
commit 55516393525d3f41d91c7a3c2e7ef6b1904a4422
diff --git a/data/region_data.csv b/data/region_data.csv
@@ -0,0 +1,36 @@
+region,households,area,population,dwellings
+Belleville,43002,1354.65121,103472,45050
+Lethbridge,45696,3046.69699,117394,48317
+Thunder Bay,52545,2618.26318,121621,57146
+Peterborough,50533,1636.98336,121721,55662
+Saint John,52872,3793.42158,126202,58398
+Brantford,52530,1086.27106,134203,54419
+Moncton,61769,2625.1211,144810,66699
+Guelph,59280,604.00365,151984,63324
+Trois-Rivières,72502,1052.80206,156042,77734
+Saguenay,72479,3078.79919,160980,77968
+Kingston,67915,2142.32855,161175,77173
+Greater Sudbury,70445,4372.1229,164689,76619
+Abbotsford - Mission,62631,651.99511,180518,65967
+Kelowna,81383,3144.90019,194882,88374
+Barrie,72534,967.67675,197059,76336
+St. John's,85015,850.46041,205955,92353
+Sherbrooke,95577,1506.36002,212105,106082
+Regina,94955,4408.86418,236481,101719
+Saskatoon,115283,6218.50503,295095,124766
+Windsor,132912,1032.38176,329144,140408
+Victoria,162716,704.4339,367770,172559
+Oshawa,138962,908.06142,379848,142462
+Halifax,173459,5963.13705,403390,187478
+St. Catharines - Niagara,168485,1425.34399,406074,180606
+London,206448,2677.86088,494069,220452
+Kitchener - Cambridge - Waterloo,200495,1106.65072,523894,210896
+Hamilton,293345,1404.6567,747545,306034
+Winnipeg,306550,5410.82907,778489,321484
+Québec,361891,3475.38576,800296,382308
+Edmonton,502143,9857.77908,1321426,537634
+Ottawa - Gatineau,535499,7168.96442,1323783,571146
+Calgary,519693,5241.70103,1392609,544870
+Vancouver,960894,3040.41532,2463431,1027613
+Montréal,1727310,4638.24059,4098927,1823281
+Toronto,2135909,6269.93132,5928040,2235145
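
A minimal sketch of how the new `data/region_data.csv` file is meant to be used by the `%in%` section changed later in this commit. It is not part of the commit: six rows are copied from the CSV above and inlined with `tribble` so the example runs without the file, and the names `five_cities` and `five_cities_pop` are illustrative assumptions.

```r
library(dplyr)
library(tibble)

# Six rows copied from data/region_data.csv (inlined so no file is needed).
region_data <- tribble(
  ~region,     ~households, ~area,      ~population, ~dwellings,
  "Halifax",    173459,     5963.13705,  403390,      187478,
  "Edmonton",   502143,     9857.77908, 1321426,      537634,
  "Calgary",    519693,     5241.70103, 1392609,      544870,
  "Vancouver",  960894,     3040.41532, 2463431,     1027613,
  "Montréal",  1727310,     4638.24059, 4098927,     1823281,
  "Toronto",   2135909,     6269.93132, 5928040,     2235145
)

# Hypothetical vector of the five cities used in the chapter.
five_cities <- c("Toronto", "Montréal", "Vancouver", "Calgary", "Edmonton")

# Keep only rows whose region appears in five_cities, then keep two columns.
five_cities_pop <- region_data %>%
  filter(region %in% five_cities) %>%
  select(region, population)

five_cities_pop
```

With the real file, the `tribble` call would be replaced by `read_csv("data/region_data.csv")`, as in the hunk for `wrangling.Rmd`.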
diff --git a/docs/_main_files/figure-html/02-plot-1.png b/docs/_main_files/figure-html/02-plot-1.png
diff --git a/docs/_main_files/figure-html/test-1.png b/docs/_main_files/figure-html/test-1.png
diff --git a/docs/img/dataframe/dataframe.001.jpeg b/docs/img/dataframe/dataframe.001.jpeg
diff --git a/docs/img/obs_and_var/obs_and_var.001.jpeg b/docs/img/obs_and_var/obs_and_var.001.jpeg
diff --git a/docs/img/pivot_longer_with_table.jpeg b/docs/img/pivot_longer_with_table.jpeg
diff --git a/docs/img/separate_function.jpeg b/docs/img/separate_function.jpeg
diff --git a/docs/img/vec_vs_list/vec_vs_list.001.jpeg b/docs/img/vec_vs_list/vec_vs_list.001.jpeg
diff --git a/docs/img/vector/vector.001.jpeg b/docs/img/vector/vector.001.jpeg
diff --git a/docs/reference-keys.txt b/docs/reference-keys.txt
@@ -432,3 +432,14 @@ r-and-the-irkernel
 r-packages
 latex
 moving-files-to-your-computer
+fig:img-separate
+tab:summary-functions-table
+what-is-a-list
+separate
+using-select-helpers-to-extract-columns
+filter-and
+aggregating-data-with-group_by-summarize
+iterating-over-columns-of-a-data-frame
+using-summarize-and-across-to-iterate
+iterating-over-rows-in-a-data-frame-with-rowwise
+going-from-wide-to-long-using-pivot_longer
diff --git a/docs/search_index.json b/docs/search_index.json
diff --git a/docs/wrangling.html b/docs/wrangling.html
diff --git a/wrangling.Rmd b/wrangling.Rmd
@@ -105,7 +105,7 @@ You can create vectors in R using the concatenate `c()` function. To create the
 vector `region` as shown in Figure \@ref(fig:02-vector) we write:
 
 ``` {r}
-year <- c("Toronto", "Montreak", "Vancouver", "Calgary", "Ottawa")
+year <- c("Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa")
 year
 ```
 
@@ -221,7 +221,7 @@ Tidy data satisfy the following three criteria [@wickham2014tidy]:
   - each row is a single observation,
   - each column is a single variable, and
   - each value is a single cell (i.e., its row and column position in the data
-    frame is not shared with another value)
+    frame is not shared with another value).
 
 In Figure \@ref(fig:02-tidy-image), we have a tidy data set that satisfies these 
 three criteria. 
@@ -242,44 +242,65 @@ upfront. Luckily there are many well-designed `tidyverse` data
 cleaning/wrangling tools to help you easily tidy your data. Let's explore them
 below!
 
-### Going from wide to long (or tidy!) using `pivot_longer`
+### Going from wide to long using `pivot_longer`
 
-One common step to get data into a tidy format is to combine columns that are 
-stored in separate columns but are really part of the same variable.
-Data is often stored this way because this format is usually more intuitive for
+One task that is commonly performed to get data into a tidy format 
+is to combine values that are stored in separate columns, 
+but are really part of the same variable, into one column.
+Data is often stored this way because this format is sometimes more intuitive for
 human readability and understanding, and humans create data sets.
+In Figure \@ref(fig:02-wide-to-long), 
+the table on the left is in an untidy, "wide" format because the year values 
+(2006, 2011, 2016) are stored as column names. 
+As a consequence, 
+the values for population for the various cities 
+over these years are also split across several columns. 
+For humans, this table is easy to read, which is why you will often
+find data stored in this wide format. 
+However, this format is difficult to work with 
+when performing data visualization 
+or statistical analysis using R.
+
+For example, if we wanted to
+find the latest year, it would be challenging because the 
+year values are stored as column names instead of as values in a single column.
+So before we could apply a function to find the latest year 
+(for example, by using `max`), 
+we would have to first extract the column names to get them as a vector
+and then apply a function to extract the latest year.
+The problem only gets worse if you would like to find the value for the 
+population for a given region for the latest year.
+Both of these tasks are greatly simplified once the data is tidied.
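
A minimal sketch of the workaround described above, to show why the wide format is awkward. It is not part of the commit: `pop_wide` is a hypothetical table with made-up population values, and the region names "A" and "B" are placeholders.

```r
library(tibble)

# Hypothetical wide table: the year values are stored as column names.
pop_wide <- tribble(
  ~region, ~`2006`, ~`2011`, ~`2016`,
  "A",     100,     110,     120,
  "B",     200,     190,     210
)

# Step 1: pull the year values out of the column names as a vector.
year_cols <- setdiff(names(pop_wide), "region")

# Step 2: only now can we apply max to find the latest year.
latest_year <- max(as.numeric(year_cols))

# Step 3: index back into the table by column name to get one region's
# population for that year.
latest_pop_a <- pop_wide[[as.character(latest_year)]][pop_wide$region == "A"]

latest_year
latest_pop_a
```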
 
-For example, in Figure \@ref(fig:02-wide-to-long), the table on the left is in an
-untidy, "wide" format because the year values (2006, 2011, 2016) are listed as
-the column headers. For humans, this table is easy to read, which is why you will often
-find data stored in this wide format. However, for R, to do any visualization or
-analysis this format is difficult to work with. For example, if we wanted to
-find the maximum year it's hard to do when the year values are not in their own
-column (since R often applies functions, such as `max` column-wise). 
 Another problem with data in this format is that we don't know what the
 numbers under each year actually represent. Do those numbers represent
-population size? Land area? It's not clear. We can reshape this data set to a
-"long" format by creating a column called "year" and a column called
+population size? Land area? It's not clear. 
+To solve both of these problems, 
+we can reshape this data set to a tidy data format 
+by creating a column called "year" and a column called
 "population," which is the table on the right of Figure \@ref(fig:02-wide-to-long).
+Note that this transformation makes the data "longer".
 
 ``` {r 02-wide-to-long, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Going from wide to long data", fig.retina = 2, out.width = "1150"}
 knitr::include_graphics("img/wide_to_long.jpeg")
 ```
 
-The function `pivot_longer` combines columns, and often makes the data frame longer
-and narrower. To learn how to use `pivot_longer`, we will work with the
+The function `pivot_longer` combines columns, 
+and is usually used when tidying data 
+when we need to make the data frame longer and narrower. 
+To learn how to use `pivot_longer`, we will work through an example with the
 `region_lang_top5_cities_wide.csv` data set. This data set contains the
 counts of how many Canadians cited each language as their mother tongue for five
 major Canadian cities (Toronto, Montréal, Vancouver, Calgary and Edmonton) from
-the 2016 Canadian census. We will load the `tidyverse` package so we can use our
-wrangling functions and the `canlang` package since it contains the
-`region_lang` and `region_data` data sets that we will use later in the chapter.
+the 2016 Canadian census. 
+To get started, 
+we will load the `tidyverse` package so we can access our data reading 
+and wrangling functions in R.
 
 Our data set is stored in an untidy format, as shown below:
 
 ``` {r 02-tidyverse, warning=FALSE, message=FALSE}
 library(tidyverse)
-library(canlang)
 lang_wide <- read_csv("data/region_lang_top5_cities_wide.csv")
 lang_wide
 ```
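
A minimal sketch of the reshaping step that the revised prose describes, mirroring the year/population layout of the wide-to-long figure rather than `lang_wide` itself. It is not part of the commit: `pop_wide` is a hypothetical table with made-up values, and the column range passed to `cols` is an assumption specific to this toy data.

```r
library(tidyr)
library(dplyr)
library(tibble)

# Hypothetical wide table: one column per year.
pop_wide <- tribble(
  ~region, ~`2006`, ~`2011`, ~`2016`,
  "A",     100,     110,     120,
  "B",     200,     190,     210
)

# Move the year column names into a "year" column and the cell values
# into a "population" column; names_to yields character, so convert it.
pop_long <- pop_wide %>%
  pivot_longer(
    cols = `2006`:`2016`,
    names_to = "year",
    values_to = "population"
  ) %>%
  mutate(year = as.numeric(year))

# Once the data is long, the latest year and its population are one filter away.
latest_a <- filter(pop_long, region == "A", year == max(year))

pop_long
latest_a
```

The result has one row per region and year, so functions such as `max` can be applied directly to the `year` column.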
@@ -647,13 +668,17 @@ filter(official_langs, region == "Calgary" | region == "Edmonton")
 
 ### Using `filter` to extract rows with `%in%`
 
-Suppose we want to see the populations of our five cities. The `region_data`
-data set from the `canlang` package contains statistics for number of
-households, land area, population and number of dwellings for different regions
-according to the 2016 Canadian census.
+Suppose we want to see the populations of our five cities. Let's read in the
+`region_data.csv` file that comes from the 2016 Canadian census, 
+as it contains statistics for number of households, land area, population 
+and number of dwellings for different regions.
 
-``` {r}
-region_data
+```{r, include = FALSE}
+write_csv(canlang::region_data, "data/region_data.csv")
+```
+
+``` {r message = FALSE}
+region_data <- read_csv("data/region_data.csv")
 ```
 
 To get the population of our five cities we can filter the data set using the